DialStack Media WebSocket Protocol 2.0.0 documentation
- Support: DialStack API Support
- Email support: info@dialstack.ai
Real-time audio streaming protocol over WebSocket.
Overview
DialStack connects to a platform-provided WSS URL, sends a begin
handshake, exchanges audio frames, and sends an end message before
closing. The same wire protocol is used for every session; the enabled
features depend on which REST API created it.
Features
Outbound audio
Audio from DialStack to the platform is always present. Each audio
message carries a timestamp (milliseconds from session start) and a
base64-encoded μ-law payload.
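A consumer might unpack one outbound audio message roughly as sketched below. The function name `parse_audio_message` and its return shape are illustrative assumptions; only the `event`, `timestamp`, and `payload` fields come from this specification.

```python
import base64
import json


def parse_audio_message(raw: str) -> tuple[int, bytes]:
    """Return (timestamp_ms, mulaw_bytes) for one outbound audio message."""
    msg = json.loads(raw)
    if msg.get("event") != "audio":
        raise ValueError(f"not an audio message: {msg.get('event')!r}")
    # validate=True rejects characters outside the base64 alphabet.
    audio = base64.b64decode(msg["payload"], validate=True)
    return msg["timestamp"], audio


# A sample message built locally: one 20 ms frame of mu-law silence (0xFF).
frame = json.dumps({
    "event": "audio",
    "timestamp": 40,
    "payload": base64.b64encode(b"\xff" * 160).decode("ascii"),
})
timestamp_ms, audio = parse_audio_message(frame)
```

The decoded bytes are still μ-law samples, one byte per sample, so a 20 ms frame decodes to 160 bytes.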
Inbound audio (optional)
The platform may send audio messages back to DialStack to be played to
the caller. When this is not enabled, DialStack does not read from the
WebSocket.
Channel tagging (optional)
Outbound audio messages may carry a channel field (caller or
callee) when both sides of the call are streamed on the same
connection. When the field is absent, the stream is a single mixed
channel.
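One way a consumer could separate the two legs is to group messages by the channel field and fall back to a single bucket when the field is absent. This is a sketch only: the `MIXED` label and both function names are hypothetical and not defined by the protocol.

```python
from collections import defaultdict

MIXED = "mixed"  # hypothetical label for streams that carry no channel field


def channel_of(msg: dict) -> str:
    """Return 'caller', 'callee', or MIXED for one outbound audio message."""
    channel = msg.get("channel")
    if channel is None:
        return MIXED
    if channel in ("caller", "callee"):
        return channel
    raise ValueError(f"unexpected channel: {channel!r}")


def timestamps_by_channel(messages: list[dict]) -> dict[str, list[int]]:
    """Group message timestamps per channel, preserving arrival order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for msg in messages:
        groups[channel_of(msg)].append(msg["timestamp"])
    return dict(groups)


grouped = timestamps_by_channel([
    {"event": "audio", "timestamp": 0, "channel": "caller", "payload": ""},
    {"event": "audio", "timestamp": 0, "channel": "callee", "payload": ""},
    {"event": "audio", "timestamp": 20, "channel": "caller", "payload": ""},
])
```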
Session identity
The begin message carries exactly one of voice_app_id or
listener_id, identifying the resource that created the session.
Consumers can branch on which field is present to select per-resource
behavior.
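Branching on the identity field could look like the following sketch. The function name and the `"voice_app"` / `"listener"` labels are assumptions chosen for illustration; the field names are the ones this specification defines.

```python
def session_resource(begin: dict) -> tuple[str, str]:
    """Return (kind, resource_id) for a begin message."""
    present = [key for key in ("voice_app_id", "listener_id") if key in begin]
    if len(present) != 1:
        # The protocol promises exactly one; anything else is treated as malformed.
        raise ValueError(f"expected exactly one identity field, got {present}")
    key = present[0]
    kind = "voice_app" if key == "voice_app_id" else "listener"
    return kind, begin[key]


kind, resource_id = session_resource({
    "event": "begin",
    "listener_id": "lstn_01h2xcejqtf2nbrexx3vqjhp50",
})
```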
Graceful end
Before closing the WebSocket, DialStack sends an end message with a
reason (call_ended, deleted, error) so the consumer can
distinguish normal termination from errors.
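A consumer might map the reason to an error flag as sketched below. Treating `deleted` as a non-error outcome, and treating any unrecognized reason as an error, are assumptions of this sketch rather than rules stated by the protocol.

```python
import json

NON_ERROR_REASONS = {"call_ended", "deleted"}  # assumption: both are expected endings


def end_is_error(raw: str) -> bool:
    """Return True when an end message signals an abnormal termination."""
    msg = json.loads(raw)
    if msg.get("event") != "end":
        raise ValueError(f"not an end message: {msg.get('event')!r}")
    # Unknown reasons are treated as errors so they are not silently ignored.
    return msg["reason"] not in NON_ERROR_REASONS


hangup_is_error = end_is_error('{"event": "end", "reason": "call_ended"}')
failure_is_error = end_is_error('{"event": "end", "reason": "error"}')
```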
Connection Flow
- Platform provides a WSS URL when creating the session
- DialStack connects and sends begin with session metadata and audio format
- Audio flows via audio messages
- DialStack sends end with a reason
- DialStack closes the WebSocket
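The message order in this flow can be checked on the consumer side with a small state tracker. The sketch below is transport-agnostic: it takes raw JSON strings from whatever WebSocket server the platform runs. The class name and state labels are hypothetical.

```python
import json


class SessionTracker:
    """Tracks where a session is in the begin -> audio -> end sequence."""

    def __init__(self) -> None:
        self.state = "awaiting_begin"
        self.audio_frames = 0
        self.end_reason = None

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        event = msg.get("event")
        if self.state == "awaiting_begin" and event == "begin":
            self.state = "streaming"
        elif self.state == "streaming" and event == "audio":
            self.audio_frames += 1
        elif self.state == "streaming" and event == "end":
            self.end_reason = msg["reason"]
            self.state = "ended"
        else:
            raise ValueError(f"unexpected {event!r} while {self.state}")


tracker = SessionTracker()
for raw in (
    '{"event": "begin", "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45"}',
    '{"event": "audio", "timestamp": 0, "payload": "/w=="}',
    '{"event": "audio", "timestamp": 20, "payload": "/w=="}',
    '{"event": "end", "reason": "call_ended"}',
):
    tracker.handle(raw)
```

Rejecting out-of-order messages is a design choice of this sketch; a more lenient consumer could log and continue instead.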
Audio Format
- Encoding: μ-law (G.711)
- Sample rate: 8000 Hz
- Channels: 1 (mono)
- Chunk size: ~20ms (160 bytes before base64 encoding)
- Bandwidth: ~8 KB/second per channel of raw audio (8000 one-byte samples per second), before base64 and JSON overhead
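The chunk size and bandwidth figures follow from the format parameters. The arithmetic below is a worked check; the one-byte-per-sample figure is standard for G.711 μ-law, and the base64 size excludes the surrounding JSON, which this document does not quantify.

```python
import math

SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1   # G.711 mu-law uses 8 bits per sample
FRAME_MS = 20

# Raw mu-law bytes in one 20 ms chunk.
frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS // 1000

# Chunks per second and the raw audio byte rate per channel.
frames_per_second = 1000 // FRAME_MS
raw_bytes_per_second = frame_bytes * frames_per_second

# Base64 emits 4 characters for every 3 input bytes, padded up.
base64_chars_per_frame = 4 * math.ceil(frame_bytes / 3)
base64_chars_per_second = base64_chars_per_frame * frames_per_second
```

Here `frame_bytes` is 160, `raw_bytes_per_second` is 8000, and each payload string is 216 characters, so the base64 text alone comes to 10800 characters per second per channel.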
Connection Behavior
- DialStack initiates the TLS WebSocket connection
- Ping/pong keepalive every 30 seconds
- If the connection drops during an active call, DialStack attempts reconnection (up to 3 times)
- Closing the WebSocket stops the audio session; the effect on the underlying call depends on the session type
Related Documentation
- REST API Reference — endpoints that create sessions
- Download AsyncAPI Spec — raw YAML specification
Servers
platform Server
- URL: wss://ai.platform.example.com/voice
- Protocol: wss
Platform-provided WebSocket URL, supplied when the session is created via the REST API. DialStack connects to this URL and streams audio.
Operations
SEND Begin Operation
Session start notification
- Operation ID: sendBegin
Media stream channel. DialStack connects to the platform's WebSocket URL and exchanges JSON messages for session control and audio data. Outbound audio is always streamed; inbound audio is accepted when the session was created with that feature enabled.
Sent by DialStack immediately after the WebSocket connection is
established. Contains session metadata (exactly one of voice_app_id
or listener_id), optional pass-through metadata, and the audio
format specification.
Message Session Begin begin
First message after WebSocket connection, containing session metadata
- Message ID: begin
- Content type: application/json
Payload
| Name | Type | Description | Value | Constraints | Notes |
|---|---|---|---|---|---|
| (root) | object | Session start message with call metadata and audio format. Exactly one of voice_app_id or listener_id is present. | - | - | additional properties are allowed |
| event | string | Event type identifier | const ("begin") | - | required |
| call_id | string | Unique identifier for this call | examples ("call_01h2xcejqtf2nbrexx3vqjhp45") | - | required |
| account_id | string | Account identifier | examples ("acct_01h2xcejqtf2nbrexx3vqjhp41") | - | required |
| audio_format | object | Audio encoding specification | - | - | required, additional properties are allowed |
| audio_format.encoding | string | Audio encoding (μ-law G.711) | const ("audio/x-mulaw") | - | required |
| audio_format.sample_rate | integer | Sample rate in Hz | const (8000) | - | required |
| audio_format.channels | integer | Number of audio channels (mono) | const (1) | - | required |
| voice_app_id | string | Session identifier. Mutually exclusive with listener_id — exactly one of the two is present on every begin. | examples ("voiceapp_01h2xcejqtf2nbrexx3vqjhp42") | - | - |
| listener_id | string | Session identifier. Mutually exclusive with voice_app_id — exactly one of the two is present on every begin. | examples ("lstn_01h2xcejqtf2nbrexx3vqjhp50") | - | - |
Examples of payload (generated)
{
"event": "begin",
"call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
"account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
"audio_format": {
"encoding": "audio/x-mulaw",
"sample_rate": 8000,
"channels": 1
},
"voice_app_id": "voiceapp_01h2xcejqtf2nbrexx3vqjhp42"
}
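A consumer may want to check a begin message against the payload table before accepting audio. The sketch below collects problems instead of raising on the first one; the function name and message strings are hypothetical, while the required fields and constant values are taken from the table above.

```python
EXPECTED_FORMAT = {"encoding": "audio/x-mulaw", "sample_rate": 8000, "channels": 1}
REQUIRED = ("event", "call_id", "account_id", "audio_format")


def begin_problems(msg: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the message looks valid."""
    problems = [f"missing {name}" for name in REQUIRED if name not in msg]
    if msg.get("event") != "begin":
        problems.append("event is not 'begin'")
    fmt = msg.get("audio_format", {})
    for key, expected in EXPECTED_FORMAT.items():
        if fmt.get(key) != expected:
            problems.append(f"audio_format.{key} is not {expected!r}")
    # Equal presence means both or neither identity field was sent.
    if ("voice_app_id" in msg) == ("listener_id" in msg):
        problems.append("need exactly one of voice_app_id, listener_id")
    return problems


valid_begin = {
    "event": "begin",
    "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
    "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
    "audio_format": dict(EXPECTED_FORMAT),
    "voice_app_id": "voiceapp_01h2xcejqtf2nbrexx3vqjhp42",
}
problems = begin_problems(valid_begin)
```

Extra properties are ignored here because the schema allows additional properties on the root object.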
SEND Audio Operation
Audio to platform
- Operation ID: sendAudio
Audio sent by DialStack to the platform. Each message carries a
timestamp (milliseconds from session start). When multi-leg audio is
streamed on the same connection, messages also carry a channel tag
(caller or callee); otherwise the tag is omitted.
Message Audio from DialStack audio
Audio sent from DialStack to the platform. May carry a per-chunk
channel tag when multi-leg audio is streamed on the same connection.
- Message ID: audioFromDialStack
- Content type: application/json
Payload
| Name | Type | Description | Value | Constraints | Notes |
|---|---|---|---|---|---|
| (root) | object | Audio from DialStack to the platform. Carries an optional channel tag when multi-leg audio is streamed on the same connection. | - | - | additional properties are allowed |
| event | string | Event type identifier | const ("audio") | - | required |
| timestamp | integer | Milliseconds from session start | examples (0, 20, 40) | - | required |
| channel | string | Which party's audio this chunk contains. Present only when multi-leg audio is streamed on the same connection; otherwise omitted (the stream is a single mixed channel). | allowed ("caller", "callee") | - | - |
| payload | string | Base64-encoded μ-law audio data (~160 bytes per 20ms chunk) | - | format (byte) | required |
Examples of payload (generated)
{
"event": "audio",
"timestamp": 0,
"channel": "caller",
"payload": "string"
}
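Consumers that need linear PCM (for example, to feed a speech recognizer) have to expand the μ-law bytes after base64 decoding. The sketch below uses the standard G.711 μ-law expansion and produces 16-bit little-endian samples; the function names are illustrative, and the choice of little-endian output is an assumption, not something the protocol specifies.

```python
import base64
import struct

BIAS = 0x84  # 132, the standard G.711 mu-law bias


def mulaw_to_pcm16(value: int) -> int:
    """Expand one mu-law byte to a signed 16-bit linear sample."""
    value = ~value & 0xFF              # mu-law bytes are stored bit-inverted
    sign = value & 0x80
    exponent = (value >> 4) & 0x07
    mantissa = value & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def payload_to_pcm16(payload_b64: str) -> bytes:
    """Decode a base64 payload into 16-bit little-endian PCM bytes."""
    mulaw = base64.b64decode(payload_b64)
    samples = [mulaw_to_pcm16(b) for b in mulaw]
    return struct.pack(f"<{len(samples)}h", *samples)


# One 20 ms frame of mu-law silence expands to 320 bytes of zeros.
silence_b64 = base64.b64encode(b"\xff" * 160).decode("ascii")
pcm = payload_to_pcm16(silence_b64)
```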
RECEIVE Audio Operation
Audio to caller
- Operation ID: receiveAudio
Audio to play to the caller, sent by the platform to DialStack. Accepted only when the session was created with inbound audio enabled; otherwise DialStack does not read from the WebSocket.
Message Audio to DialStack audio
Audio to play to the caller, sent from the platform to DialStack. Accepted only when the session was created with inbound audio enabled.
- Message ID: audioToDialStack
- Content type: application/json
Payload
| Name | Type | Description | Value | Constraints | Notes |
|---|---|---|---|---|---|
| (root) | object | Audio data to play to caller (Platform → DialStack). Accepted only when the session was created with inbound audio enabled. | - | - | additional properties are allowed |
| event | string | Event type identifier | const ("audio") | - | required |
| payload | string | Base64-encoded μ-law audio data | - | format (byte) | required |
Examples of payload (generated)
{
"event": "audio",
"payload": "string"
}
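On the platform side, audio destined for the caller has to be wrapped in the same JSON envelope. The sketch below splits a μ-law buffer into messages. This section does not state a required chunk size or pacing for inbound audio, so the 160-byte frame size (mirroring the outbound chunking) is an assumption, and the function name is hypothetical.

```python
import base64
import json
from collections.abc import Iterator

FRAME_BYTES = 160  # assumption: mirror the ~20 ms outbound chunking


def inbound_audio_messages(mulaw: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[str]:
    """Yield JSON audio messages carrying successive slices of a mu-law buffer."""
    for start in range(0, len(mulaw), frame_bytes):
        chunk = mulaw[start:start + frame_bytes]
        yield json.dumps({
            "event": "audio",
            "payload": base64.b64encode(chunk).decode("ascii"),
        })


# 400 bytes of mu-law silence become two full frames and one 80-byte remainder.
messages = list(inbound_audio_messages(b"\xff" * 400))
```

Inbound messages carry no timestamp field, so any pacing would have to be handled by the sender's own clock.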
SEND End Operation
Session end
- Operation ID: sendEnd
Sent by DialStack before closing the WebSocket. The reason field
distinguishes a normal hangup from an explicit deletion or an error.
Message Session End end
Sent before DialStack closes the WebSocket, with a reason
- Message ID: end
- Content type: application/json
Payload
| Name | Type | Description | Value | Constraints | Notes |
|---|---|---|---|---|---|
| (root) | object | Sent by DialStack before closing the WebSocket. | - | - | additional properties are allowed |
| event | string | Event type identifier | const ("end") | - | required |
| reason | string | Why the session ended. call_ended: the call was hung up; deleted: the session was closed via the API; error: an internal error occurred. | allowed ("call_ended", "deleted", "error") | - | required |
Examples of payload (generated)
{
"event": "end",
"reason": "call_ended"
}