DialStack Media WebSocket Protocol 2.0.0 documentation

Real-time audio streaming protocol over WebSocket.

Overview

DialStack connects to a platform-provided WSS URL, sends a begin handshake, exchanges audio frames, and sends an end message before closing. The same wire protocol is used for every session; the enabled features depend on which REST API created it.

Features

Outbound audio

Audio from DialStack to the platform is always present. Each audio message carries a timestamp (milliseconds from session start) and a base64-encoded μ-law payload.
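As a minimal sketch of consuming one such message, the snippet below parses the JSON and decodes the payload using only the Python standard library. The function name `decode_audio_message` and the sample message are illustrative assumptions, not part of the protocol.

```python
import base64
import json


def decode_audio_message(raw: str) -> tuple[int, bytes]:
    """Parse one outbound audio message into (timestamp_ms, mulaw_bytes)."""
    msg = json.loads(raw)
    if msg.get("event") != "audio":
        raise ValueError(f"not an audio message: {msg.get('event')!r}")
    # validate=True rejects characters outside the base64 alphabet.
    audio = base64.b64decode(msg["payload"], validate=True)
    return msg["timestamp"], audio


# Hypothetical sample: 160 bytes of 0xFF (commonly used as mu-law silence) at t=20 ms.
sample = json.dumps({
    "event": "audio",
    "timestamp": 20,
    "payload": base64.b64encode(b"\xff" * 160).decode("ascii"),
})
timestamp_ms, audio = decode_audio_message(sample)
print(timestamp_ms, len(audio))
```

The decoded bytes are raw 8-bit μ-law samples, so one byte corresponds to one sample at 8000 Hz.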

Inbound audio (optional)

The platform may send audio messages back to DialStack to be played to the caller. When this is not enabled, DialStack does not read from the WebSocket.

Channel tagging (optional)

Outbound audio messages may carry a channel field (caller or callee) when both sides of the call are streamed on the same connection. When the field is absent, the stream is a single mixed channel.
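One way a consumer might separate the two legs is to accumulate decoded audio per channel value. This is only a sketch: `demux_by_channel`, the `_audio` helper, and the `"mixed"` key for untagged audio are assumptions made for illustration, and messages are shown as already-parsed dictionaries.

```python
import base64
from collections import defaultdict


def demux_by_channel(messages: list[dict]) -> dict[str, bytes]:
    """Concatenate decoded audio per channel, in arrival order."""
    buffers: dict[str, bytearray] = defaultdict(bytearray)
    for msg in messages:
        if msg.get("event") != "audio":
            continue
        # "mixed" is a local label for untagged audio, not a protocol value.
        key = msg.get("channel", "mixed")
        buffers[key] += base64.b64decode(msg["payload"])
    return {name: bytes(buf) for name, buf in buffers.items()}


def _audio(channel: str, data: bytes, timestamp: int) -> dict:
    """Helper for this example only: build a tagged audio message."""
    return {
        "event": "audio",
        "timestamp": timestamp,
        "channel": channel,
        "payload": base64.b64encode(data).decode("ascii"),
    }


streams = demux_by_channel([
    _audio("caller", b"\x01\x02", 0),
    _audio("callee", b"\x0a", 0),
    _audio("caller", b"\x03", 20),
])
print(sorted(streams))
```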

Session identity

The begin message carries exactly one of voice_app_id or listener_id, identifying the resource that created the session. Consumers can branch on which field is present to select per-resource behavior.
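Branching on the identity field could look like the following sketch. The function name `session_resource` and the returned labels `"voice_app"` and `"listener"` are hypothetical; only the two field names come from the protocol.

```python
def session_resource(begin: dict) -> tuple[str, str]:
    """Return (resource_kind, resource_id) for a parsed begin message."""
    has_voice_app = "voice_app_id" in begin
    has_listener = "listener_id" in begin
    # Exactly one of the two fields must be present.
    if has_voice_app == has_listener:
        raise ValueError("expected exactly one of voice_app_id or listener_id")
    if has_voice_app:
        return "voice_app", begin["voice_app_id"]
    return "listener", begin["listener_id"]


kind, resource_id = session_resource({
    "event": "begin",
    "listener_id": "lstn_01h2xcejqtf2nbrexx3vqjhp50",
})
print(kind, resource_id)
```

Rejecting a message that has both fields, or neither, surfaces a malformed handshake early instead of silently picking one behavior.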

Graceful end

Before closing the WebSocket, DialStack sends an end message with a reason (call_ended, deleted, error) so the consumer can distinguish normal termination from errors.

Connection Flow

  1. Platform provides a WSS URL when creating the session
  2. DialStack connects and sends begin with session metadata and audio format
  3. Audio flows via audio messages
  4. DialStack sends end with a reason
  5. DialStack closes the WebSocket
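From the platform's side, steps 2 to 4 above can be sketched as a small state machine that checks message order on a single connection. The class name `SessionTracker` and its state names are assumptions for illustration; reconnection behavior is not modeled here.

```python
import json


class SessionTracker:
    """Tracks the begin -> audio* -> end message order for one connection."""

    def __init__(self) -> None:
        self.state = "awaiting_begin"
        self.audio_count = 0
        self.end_reason = None

    def on_message(self, raw: str) -> None:
        msg = json.loads(raw)
        event = msg.get("event")
        if self.state == "awaiting_begin" and event == "begin":
            self.state = "streaming"
        elif self.state == "streaming" and event == "audio":
            self.audio_count += 1
        elif self.state == "streaming" and event == "end":
            self.state = "ended"
            self.end_reason = msg.get("reason")
        else:
            raise ValueError(f"unexpected {event!r} in state {self.state!r}")


tracker = SessionTracker()
for raw in (
    '{"event": "begin", "call_id": "call_x", "account_id": "acct_x"}',
    '{"event": "audio", "timestamp": 0, "payload": ""}',
    '{"event": "audio", "timestamp": 20, "payload": ""}',
    '{"event": "end", "reason": "call_ended"}',
):
    tracker.on_message(raw)
print(tracker.state, tracker.audio_count, tracker.end_reason)
```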

Audio Format

  • Encoding: μ-law (G.711)
  • Sample rate: 8000 Hz
  • Channels: 1 (mono)
  • Chunk size: ~20 ms (160 bytes before base64 encoding)
  • Bandwidth: ~8 KB/second per channel of raw audio; base64 encoding adds about a third, plus JSON framing
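The figures above follow from the format parameters. The arithmetic below is a sketch that assumes chunks of exactly 20 ms (the protocol states approximately 20 ms) and ignores JSON framing overhead.

```python
import base64
import math

SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1  # mu-law uses 8 bits per sample
CHUNK_MS = 20         # assumed exact for this calculation

raw_chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS // 1000
b64_chunk_chars = 4 * math.ceil(raw_chunk_bytes / 3)  # base64 maps 3 bytes to 4 chars
chunks_per_second = 1000 // CHUNK_MS
raw_bytes_per_second = raw_chunk_bytes * chunks_per_second
b64_chars_per_second = b64_chunk_chars * chunks_per_second

# Cross-check the base64 length against the standard library.
assert len(base64.b64encode(bytes(raw_chunk_bytes))) == b64_chunk_chars

print(raw_chunk_bytes, b64_chunk_chars, raw_bytes_per_second, b64_chars_per_second)
# prints: 160 216 8000 10800
```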

Connection Behavior

  • DialStack initiates the TLS WebSocket connection
  • Ping/pong keepalive every 30 seconds
  • If the connection drops during an active call, DialStack attempts reconnection (up to 3 times)
  • Closing the WebSocket stops the audio session; the effect on the underlying call depends on the session type

Servers

platform Server

  • URL: wss://ai.platform.example.com/voice
  • Protocol: wss

Platform-provided WebSocket URL, supplied when the session is created via the REST API. DialStack connects to this URL and streams audio.

Operations

SEND Begin Operation

Session start notification

  • Operation ID: sendBegin

Media stream channel. DialStack connects to the platform's WebSocket URL and exchanges JSON messages for session control and audio data. Outbound audio is always streamed; inbound audio is accepted when the session was created with that feature enabled.

Sent by DialStack immediately after the WebSocket connection is established. Contains session metadata (exactly one of voice_app_id or listener_id), optional pass-through metadata, and the audio format specification.

Message: Session Begin (begin)

First message after WebSocket connection, containing session metadata

Payload
| Name | Type | Description | Value | Constraints | Notes |
|---|---|---|---|---|---|
| (root) | object | Session start message with call metadata and audio format. Exactly one of voice_app_id or listener_id is present. | - | - | additional properties are allowed |
| event | string | Event type identifier | const ("begin") | - | required |
| call_id | string | Unique identifier for this call | examples ("call_01h2xcejqtf2nbrexx3vqjhp45") | - | required |
| account_id | string | Account identifier | examples ("acct_01h2xcejqtf2nbrexx3vqjhp41") | - | required |
| audio_format | object | Audio encoding specification | - | - | required, additional properties are allowed |
| audio_format.encoding | string | Audio encoding (μ-law G.711) | const ("audio/x-mulaw") | - | required |
| audio_format.sample_rate | integer | Sample rate in Hz | const (8000) | - | required |
| audio_format.channels | integer | Number of audio channels (mono) | const (1) | - | required |
| voice_app_id | string | Session identifier. Mutually exclusive with listener_id: exactly one of the two is present on every begin. | examples ("voiceapp_01h2xcejqtf2nbrexx3vqjhp42") | - | - |
| listener_id | string | Session identifier. Mutually exclusive with voice_app_id: exactly one of the two is present on every begin. | examples ("lstn_01h2xcejqtf2nbrexx3vqjhp50") | - | - |

Examples of payload (generated)

{
  "event": "begin",
  "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
  "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
  "audio_format": {
    "encoding": "audio/x-mulaw",
    "sample_rate": 8000,
    "channels": 1
  },
  "voice_app_id": "voiceapp_01h2xcejqtf2nbrexx3vqjhp42"
}
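Because every audio_format field is a constant, a consumer may want to confirm them once at the start of the session. The sketch below does that for a parsed begin message; `check_audio_format` and `EXPECTED_FORMAT` are illustrative names, and extra properties are ignored because the schema allows them.

```python
EXPECTED_FORMAT = {
    "encoding": "audio/x-mulaw",
    "sample_rate": 8000,
    "channels": 1,
}


def check_audio_format(begin: dict) -> None:
    """Raise ValueError if the begin message declares an unexpected audio format."""
    fmt = begin.get("audio_format")
    if not isinstance(fmt, dict):
        raise ValueError("begin message has no audio_format object")
    for key, expected in EXPECTED_FORMAT.items():
        actual = fmt.get(key)
        if actual != expected:
            raise ValueError(f"audio_format.{key}: expected {expected!r}, got {actual!r}")


begin = {
    "event": "begin",
    "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
    "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
    "audio_format": {"encoding": "audio/x-mulaw", "sample_rate": 8000, "channels": 1},
    "voice_app_id": "voiceapp_01h2xcejqtf2nbrexx3vqjhp42",
}
check_audio_format(begin)  # returns None when the format matches
```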

SEND Audio Operation

Audio to platform

  • Operation ID: sendAudio

Audio sent by DialStack to the platform. Each message carries a timestamp (milliseconds from session start). When multi-leg audio is streamed on the same connection, messages also carry a channel tag (caller or callee); otherwise the tag is omitted.

Message: Audio from DialStack (audio)

Audio sent from DialStack to the platform. May carry a per-chunk channel tag when multi-leg audio is streamed on the same connection.

Payload
| Name | Type | Description | Value | Constraints | Notes |
|---|---|---|---|---|---|
| (root) | object | Audio from DialStack to the platform. Carries an optional channel tag when multi-leg audio is streamed on the same connection. | - | - | additional properties are allowed |
| event | string | Event type identifier | const ("audio") | - | required |
| timestamp | integer | Milliseconds from session start | examples (0, 20, 40) | - | required |
| channel | string | Which party's audio this chunk contains. Present only when multi-leg audio is streamed on the same connection; otherwise omitted (the stream is a single mixed channel). | allowed ("caller", "callee") | - | - |
| payload | string | Base64-encoded μ-law audio data (~160 bytes per 20 ms chunk) | - | format (byte) | required |

Examples of payload (generated)

{
  "event": "audio",
  "timestamp": 0,
  "channel": "caller",
  "payload": "string"
}
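The timestamp field can be used to spot missing audio. The sketch below rests on two assumptions that the protocol text does not state: that a timestamp marks the start of its chunk, and that consecutive chunks on a channel are contiguous. `find_gaps` and the `"mixed"` key are illustrative names.

```python
import base64

BYTES_PER_MS = 8  # 8000 samples/s * 1 byte per mu-law sample / 1000 ms


def find_gaps(messages: list[dict]) -> list[tuple[int, int]]:
    """Return (expected_ms, actual_ms) pairs where a chunk starts later than expected."""
    gaps = []
    expected: dict[str, int] = {}
    for msg in messages:
        key = msg.get("channel", "mixed")  # track each channel separately
        ts = msg["timestamp"]
        if key in expected and ts > expected[key]:
            gaps.append((expected[key], ts))
        duration_ms = len(base64.b64decode(msg["payload"])) // BYTES_PER_MS
        expected[key] = ts + duration_ms
    return gaps


chunk = base64.b64encode(b"\xff" * 160).decode("ascii")  # one 20 ms chunk
gaps = find_gaps([
    {"event": "audio", "timestamp": 0, "payload": chunk},
    {"event": "audio", "timestamp": 20, "payload": chunk},
    {"event": "audio", "timestamp": 60, "payload": chunk},  # 40 ms was expected
])
print(gaps)
```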

RECEIVE Audio Operation

Audio to caller

  • Operation ID: receiveAudio

Audio to play to the caller, sent by the platform to DialStack. Accepted only when the session was created with inbound audio enabled; otherwise DialStack does not read from the WebSocket.

Message: Audio to DialStack (audio)

Audio to play to the caller, sent from the platform to DialStack. Accepted only when the session was created with inbound audio enabled.

Payload
| Name | Type | Description | Value | Constraints | Notes |
|---|---|---|---|---|---|
| (root) | object | Audio data to play to caller (Platform → DialStack). Accepted only when the session was created with inbound audio enabled. | - | - | additional properties are allowed |
| event | string | Event type identifier | const ("audio") | - | required |
| payload | string | Base64-encoded μ-law audio data | - | format (byte) | required |

Examples of payload (generated)

{
  "event": "audio",
  "payload": "string"
}
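A platform that already holds μ-law audio could produce inbound messages along the lines of the sketch below. The 160-byte frame size mirrors the outbound chunk size and is an assumption, since the protocol does not state a required size for inbound chunks; `inbound_audio_messages` is an illustrative name.

```python
import base64
import json
from collections.abc import Iterator

FRAME_BYTES = 160  # assumed: 20 ms at 8000 Hz, matching outbound chunks


def inbound_audio_messages(mulaw: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[str]:
    """Yield JSON audio messages, one per frame of raw mu-law bytes."""
    for start in range(0, len(mulaw), frame_bytes):
        frame = mulaw[start:start + frame_bytes]
        yield json.dumps({
            "event": "audio",
            "payload": base64.b64encode(frame).decode("ascii"),
        })


# 400 bytes of audio become two full frames and one 80-byte remainder.
messages = list(inbound_audio_messages(b"\xff" * 400))
print(len(messages))
```

Each yielded string would then be sent as one WebSocket text message by whatever client library the platform uses.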

SEND End Operation

Session end

  • Operation ID: sendEnd

Sent by DialStack before closing the WebSocket. The reason field distinguishes a normal hangup from an explicit deletion or an error.

Message: Session End (end)

Sent before DialStack closes the WebSocket, with a reason

Payload
| Name | Type | Description | Value | Constraints | Notes |
|---|---|---|---|---|---|
| (root) | object | Sent by DialStack before closing the WebSocket. | - | - | additional properties are allowed |
| event | string | Event type identifier | const ("end") | - | required |
| reason | string | Why the session ended. call_ended: the call was hung up; deleted: the session was closed via the API; error: an internal error occurred. | allowed ("call_ended", "deleted", "error") | - | required |

Examples of payload (generated)

{
  "event": "end",
  "reason": "call_ended"
}
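A consumer might validate the reason and decide whether the termination deserves attention, as in this sketch. The names `parse_end` and `is_abnormal` are hypothetical, and treating only `error` as abnormal is one possible policy rather than something the protocol mandates.

```python
import json

ALLOWED_REASONS = {"call_ended", "deleted", "error"}


def parse_end(raw: str) -> str:
    """Return the reason from an end message, rejecting unknown values."""
    msg = json.loads(raw)
    if msg.get("event") != "end":
        raise ValueError(f"not an end message: {msg.get('event')!r}")
    reason = msg.get("reason")
    if reason not in ALLOWED_REASONS:
        raise ValueError(f"unknown end reason: {reason!r}")
    return reason


def is_abnormal(reason: str) -> bool:
    """Assumed policy: only 'error' counts as an abnormal termination."""
    return reason == "error"


reason = parse_end('{"event": "end", "reason": "call_ended"}')
print(reason, is_abnormal(reason))
```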