DialStack Media WebSocket Protocol 2.0.0 documentation

Support: DialStack API Support
Email support: info@dialstack.ai

Real-time audio streaming protocol over WebSocket.

Overview

DialStack connects to a platform-provided WSS URL, sends a begin handshake, exchanges audio frames, and sends an end message before closing. The same wire protocol is used for every session; the enabled features depend on which REST API created it.

Features

Outbound audio

Audio from DialStack to the platform is always present. Each audio message carries a timestamp (milliseconds from session start) and a base64-encoded μ-law payload.

Inbound audio (optional)

The platform may send audio messages back to DialStack to be played to the caller. When this is not enabled, DialStack does not read from the WebSocket.

Channel tagging (optional)

Outbound audio messages may carry a channel field (caller or callee) when both sides of the call are streamed on the same connection. When the field is absent, the stream is a single mixed channel.

Session identity

The begin message carries exactly one of voice_app_id or listener_id, identifying the resource that created the session. Consumers can branch on which field is present to select per-resource behavior.
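As a minimal sketch of that branching, assuming the begin message has already been parsed from JSON into a dict. The function name `session_kind` and the labels it returns are illustrative and not part of the protocol.

```python
def session_kind(begin: dict) -> str:
    """Return "voice_app" or "listener" depending on which ID field is present.

    Hypothetical helper: the labels are for illustration only.
    """
    has_voice_app = "voice_app_id" in begin
    has_listener = "listener_id" in begin
    # The protocol promises exactly one of the two fields; anything else is malformed.
    if has_voice_app == has_listener:
        raise ValueError("begin must carry exactly one of voice_app_id or listener_id")
    return "voice_app" if has_voice_app else "listener"
```

A consumer could call this once per connection and select its per-resource handler from the result.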

Graceful end

Before closing the WebSocket, DialStack sends an end message with a reason (call_ended, deleted, error) so the consumer can distinguish normal termination from errors.

Connection Flow

1. Platform provides a WSS URL when creating the session
2. DialStack connects and sends begin with session metadata and audio format
3. Audio flows via audio messages
4. DialStack sends end with a reason
5. DialStack closes the WebSocket
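On the platform side, the flow above amounts to a small state machine over the incoming text frames. The sketch below is transport-agnostic and only tracks message ordering. The class name `SessionTracker` and its attributes are hypothetical, and ignoring unknown event types is an assumed policy rather than documented behavior.

```python
import json


class SessionTracker:
    """Tracks the begin -> audio -> end ordering for one connection (hypothetical helper)."""

    def __init__(self) -> None:
        self.state = "awaiting_begin"
        self.audio_messages = 0
        self.end_reason = None

    def on_text(self, raw: str) -> None:
        """Feed one JSON text frame received from DialStack."""
        msg = json.loads(raw)
        event = msg.get("event")
        if self.state == "awaiting_begin":
            # begin is the first message after the connection is established.
            if event != "begin":
                raise ValueError(f"expected begin, got {event!r}")
            self.state = "streaming"
        elif self.state == "streaming":
            if event == "audio":
                self.audio_messages += 1
            elif event == "end":
                self.end_reason = msg["reason"]
                self.state = "ended"
            # Other event types are ignored here (assumption, for forward compatibility).
        else:
            raise ValueError("message received after end")
```

A WebSocket server would create one tracker per accepted connection and pass each received text frame to `on_text`.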

Audio Format

Encoding: μ-law (G.711)
Sample rate: 8000 Hz
Channels: 1 (mono)
Chunk size: ~20 ms (160 bytes before base64 encoding)
Bandwidth: ~8 KB/second per channel (raw μ-law, before base64 encoding and JSON framing)
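The chunk size and bandwidth figures follow directly from the format parameters. The short calculation below reproduces them and also shows the size of a chunk after base64 encoding. The constant names are illustrative, and the one-byte-per-sample figure is the standard G.711 sample width.

```python
import base64

SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1  # G.711 μ-law stores each sample in 8 bits
CHUNK_MS = 20

# Raw audio bytes in one ~20 ms chunk and in one second of a single channel.
chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS // 1000
raw_bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

# Base64 turns every 3 bytes into 4 characters (with padding), so the payload grows.
encoded_chars_per_chunk = len(base64.b64encode(bytes(chunk_bytes)))
chunks_per_second = 1000 // CHUNK_MS
encoded_chars_per_second = encoded_chars_per_chunk * chunks_per_second

print(chunk_bytes, raw_bytes_per_second, encoded_chars_per_chunk, encoded_chars_per_second)
```

The encoded figure excludes the JSON envelope around each payload, so actual traffic on the wire is somewhat higher.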

Connection Behavior

DialStack initiates the TLS WebSocket connection
Ping/pong keepalive every 30 seconds
If the connection drops during an active call, DialStack attempts reconnection (up to 3 times)
Closing the WebSocket stops the audio session; the effect on the underlying call depends on the session type

Related Documentation

REST API Reference: endpoints that create sessions
Download AsyncAPI Spec: raw YAML specification

Servers

`platform` Server

URL: wss://ai.platform.example.com/voice
Protocol: wss

Platform-provided WebSocket URL, supplied when the session is created via the REST API. DialStack connects to this URL and streams audio.

Operations

SEND Begin Operation

Session start notification

Operation ID: sendBegin

Media stream channel. DialStack connects to the platform's WebSocket URL and exchanges JSON messages for session control and audio data. Outbound audio is always streamed; inbound audio is accepted when the session was created with that feature enabled.

Sent by DialStack immediately after the WebSocket connection is established. Contains session metadata (exactly one of voice_app_id or listener_id), optional pass-through metadata, and the audio format specification.

Message Session Begin `begin`

First message after WebSocket connection, containing session metadata

Message ID: begin
Content type: application/json

Payload

Name	Type	Description	Value	Constraints	Notes
(root)	object	Session start message with call metadata and audio format. Exactly one of `voice_app_id` or `listener_id` is present.	-	-	additional properties are allowed
event	string	Event type identifier	const (`"begin"`)	-	required
call_id	string	Unique identifier for this call	examples (`"call_01h2xcejqtf2nbrexx3vqjhp45"`)	-	required
account_id	string	Account identifier	examples (`"acct_01h2xcejqtf2nbrexx3vqjhp41"`)	-	required
audio_format	object	Audio encoding specification	-	-	required, additional properties are allowed
audio_format.encoding	string	Audio encoding (μ-law G.711)	const (`"audio/x-mulaw"`)	-	required
audio_format.sample_rate	integer	Sample rate in Hz	const (`8000`)	-	required
audio_format.channels	integer	Number of audio channels (mono)	const (`1`)	-	required
voice_app_id	string	Identifies the voice app resource that created the session. Mutually exclusive with `listener_id`: exactly one of the two is present on every `begin`.	examples (`"voiceapp_01h2xcejqtf2nbrexx3vqjhp42"`)	-	-
listener_id	string	Identifies the listener resource that created the session. Mutually exclusive with `voice_app_id`: exactly one of the two is present on every `begin`.	examples (`"lstn_01h2xcejqtf2nbrexx3vqjhp50"`)	-	-

Examples of payload (generated)

{
  "event": "begin",
  "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
  "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
  "audio_format": {
    "encoding": "audio/x-mulaw",
    "sample_rate": 8000,
    "channels": 1
  },
  "voice_app_id": "voiceapp_01h2xcejqtf2nbrexx3vqjhp42"
}
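A consumer may want to verify the required fields and the constant audio format values from the payload table before accepting audio. The following is a sketch only: `parse_begin` and `EXPECTED_FORMAT` are hypothetical names, and rejecting a mismatched format is an assumed policy. Extra properties are left alone because the schema allows them.

```python
import json

# Constant values taken from the audio_format rows of the payload table.
EXPECTED_FORMAT = {"encoding": "audio/x-mulaw", "sample_rate": 8000, "channels": 1}


def parse_begin(raw: str) -> dict:
    """Parse a begin message and check its required fields (hypothetical helper)."""
    msg = json.loads(raw)
    if msg.get("event") != "begin":
        raise ValueError("first message must have event 'begin'")
    for field in ("call_id", "account_id", "audio_format"):
        if field not in msg:
            raise ValueError(f"missing required field: {field}")
    audio_format = msg["audio_format"]
    for key, expected in EXPECTED_FORMAT.items():
        if audio_format.get(key) != expected:
            raise ValueError(f"unexpected audio_format.{key}: {audio_format.get(key)!r}")
    return msg
```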

SEND Audio Operation

Audio to platform

Operation ID: sendAudio

Audio sent by DialStack to the platform. Each message carries a timestamp (milliseconds from session start). When multi-leg audio is streamed on the same connection, messages also carry a channel tag (caller or callee); otherwise the tag is omitted.

Message Audio from DialStack `audio`

Audio sent from DialStack to the platform. May carry a per-chunk channel tag when multi-leg audio is streamed on the same connection.

Message ID: audioFromDialStack
Content type: application/json

Payload

Name	Type	Description	Value	Constraints	Notes
(root)	object	Audio from DialStack to the platform. Carries an optional `channel` tag when multi-leg audio is streamed on the same connection.	-	-	additional properties are allowed
event	string	Event type identifier	const (`"audio"`)	-	required
timestamp	integer	Milliseconds from session start	examples (`0`, `20`, `40`)	-	required
channel	string	Which party's audio this chunk contains. Present only when multi-leg audio is streamed on the same connection; otherwise omitted (the stream is a single mixed channel).	allowed (`"caller"`, `"callee"`)	-	-
payload	string	Base64-encoded μ-law audio data (~160 bytes per 20ms chunk)	-	format (`byte`)	required

Examples of payload (generated)

{
  "event": "audio",
  "timestamp": 0,
  "channel": "caller",
  "payload": "string"
}
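To illustrate how a consumer might use the optional `channel` field, the sketch below decodes each payload and appends it to a per-channel buffer. The function name `collect_audio` and the `"mixed"` label for untagged audio are assumptions made for this example.

```python
import base64
import json
from collections import defaultdict


def collect_audio(raw_messages):
    """Group decoded μ-law bytes by channel; untagged audio is stored under "mixed"."""
    buffers = defaultdict(bytearray)
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("event") != "audio":
            continue  # begin and end messages carry no audio
        # channel is present only when both legs share the connection.
        channel = msg.get("channel", "mixed")
        buffers[channel] += base64.b64decode(msg["payload"])
    return {name: bytes(buf) for name, buf in buffers.items()}
```

Keeping the caller and callee legs in separate buffers lets downstream processing, such as transcription, attribute audio to the right party.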

RECEIVE Audio Operation

Audio to caller

Operation ID: receiveAudio

Audio to play to the caller, sent by the platform to DialStack. Accepted only when the session was created with inbound audio enabled; otherwise DialStack does not read from the WebSocket.

Message Audio to DialStack `audio`

Audio to play to the caller, sent from the platform to DialStack. Accepted only when the session was created with inbound audio enabled.

Message ID: audioToDialStack
Content type: application/json

Payload

Name	Type	Description	Value	Constraints	Notes
(root)	object	Audio data to play to caller (Platform → DialStack). Accepted only when the session was created with inbound audio enabled.	-	-	additional properties are allowed
event	string	Event type identifier	const (`"audio"`)	-	required
payload	string	Base64-encoded μ-law audio data	-	format (`byte`)	required

Examples of payload (generated)

{
  "event": "audio",
  "payload": "string"
}
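A platform that already has μ-law audio in memory could wrap it in messages of this shape as follows. This is a sketch: the function name `build_audio_messages` is hypothetical, and the 160-byte chunking mirrors the outbound chunk size as an assumption, since no chunk size is stated for inbound audio.

```python
import base64
import json

CHUNK_BYTES = 160  # 20 ms of 8 kHz mono μ-law (assumed chunk size for inbound audio)


def build_audio_messages(mulaw: bytes, chunk_bytes: int = CHUNK_BYTES) -> list:
    """Split raw μ-law bytes into JSON audio messages ready to send as text frames."""
    messages = []
    for start in range(0, len(mulaw), chunk_bytes):
        chunk = mulaw[start:start + chunk_bytes]
        messages.append(json.dumps({
            "event": "audio",
            "payload": base64.b64encode(chunk).decode("ascii"),
        }))
    return messages
```

Each returned string would be sent as one WebSocket text frame, and only on sessions created with inbound audio enabled.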

SEND End Operation

Session end

Operation ID: sendEnd

Sent by DialStack before closing the WebSocket. The reason field distinguishes a normal hangup from an explicit deletion or an error.

Message Session End `end`

Sent before DialStack closes the WebSocket, with a reason

Message ID: end
Content type: application/json

Payload

Name	Type	Description	Value	Constraints	Notes
(root)	object	Sent by DialStack before closing the WebSocket.	-	-	additional properties are allowed
event	string	Event type identifier	const (`"end"`)	-	required
reason	string	Why the session ended. `call_ended`: the call was hung up; `deleted`: the session was closed via the API; `error`: an internal error occurred.	allowed (`"call_ended"`, `"deleted"`, `"error"`)	-	required

Examples of payload (generated)

{
  "event": "end",
  "reason": "call_ended"
}
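A consumer can use the reason to decide whether a session ended normally. The sketch below is one possible policy: `classify_end` is a hypothetical name, and treating unrecognized reasons as errors is an assumption chosen so that unexpected values get attention.

```python
import json

KNOWN_REASONS = {"call_ended", "deleted", "error"}


def classify_end(raw: str) -> tuple:
    """Return (reason, is_error) for an end message (hypothetical helper)."""
    msg = json.loads(raw)
    if msg.get("event") != "end":
        raise ValueError("not an end message")
    reason = msg["reason"]
    # call_ended and deleted are normal terminations; error is not.
    # Reasons outside the documented set are flagged as errors (assumed policy).
    is_error = reason == "error" or reason not in KNOWN_REASONS
    return reason, is_error
```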
