Voice Apps

Build programmable voice applications with webhooks and real-time audio streaming.

Overview

Voice apps let you handle calls programmatically. DialStack notifies your server via webhook, and you decide what happens next. Voice apps support two modes:

Call Control — Your server takes ownership of the call. Connect bidirectional audio for AI voice assistants, transfer calls to extensions, or build IVR systems.

Call Listening — Stream real-time audio from calls without affecting them. Use this for live monitoring, real-time transcription, or analytics.

Both modes start with a webhook notification to your server. The webhook's event field tells you which mode triggered it.

Installation

Install the DialStack SDK for Node.js:

npm install @dialstack/sdk

Initialize the client with your API key:

import { DialStack } from '@dialstack/sdk/server';

const dialstack = new DialStack(process.env.DIALSTACK_API_KEY);

Creating a Voice App

const voiceApp = await dialstack.voiceApps.create(
  { name: 'AI Receptionist', url: 'https://your-server.example.com/voice/webhook' },
  { dialstackAccount: 'acct_01h2xcejqtf2nbrexx3vqjhp41' }
);

Response:

{
  "id": "va_01h2xcejqtf2nbrexx3vqjhp49",
  "name": "AI Receptionist",
  "url": "https://your-server.example.com/voice/webhook",
  "status": "active",
  "secret": "whsec_abc123def456...",
  "created_at": "2025-10-18T10:00:00Z",
  "updated_at": "2025-10-18T10:00:00Z"
}

Important: Save the secret value from this response. You'll need it to verify webhook signatures.

Webhook Notifications

When a call reaches your voice app, DialStack sends an HTTP POST to your webhook URL. The same voice app can receive both event types — the event field tells you which one.

Webhook Events

| Event | Description | Trigger |
| --- | --- | --- |
| call.received | A call has been routed to this voice app for handling | Voice app is the call destination (extension or dial plan) |
| call.notify | A call is passing through a Voice App (Notify) node in a dial plan | Voice App (Notify) node in a dial plan references this voice app |

For call.received, your server takes control of the call — use the Update Call API to attach audio, transfer, etc. For call.notify, the call continues routing normally — use the Listeners API to stream audio if desired.

Webhook Payload

POST /voice/webhook HTTP/1.1
Host: your-server.example.com
Content-Type: application/json
X-DialStack-Signature: t=1697634600,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd

{
  "event": "call.received",
  "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
  "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
  "voice_app_id": "va_01h2xcejqtf2nbrexx3vqjhp49",
  "from_number": "+14155551234",
  "from_name": "John Smith",
  "to_number": "+14155559876"
}

Both call.received and call.notify use the same payload shape. The event field is the only difference.

Verifying Signatures

Verify webhook signatures using the voice app's secret to ensure requests are from DialStack:

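// Pass the raw, unparsed request body (req.rawBody here), not the parsed JSON.
// constructEvent throws if the signature does not match.
const event = dialstack.webhooks.constructEvent(
  req.rawBody,
  req.headers['x-dialstack-signature'],
  process.env.VOICE_APP_SECRET
);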

// event contains: event, call_id, account_id, voice_app_id, from_number, from_name, to_number
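
The X-DialStack-Signature header in the sample request has the form t=<timestamp>,v1=<hex digest>. constructEvent handles this header for you. The sketch below only illustrates the layout, for example when logging a request that failed verification. parseSignatureHeader is a hypothetical helper, not part of the SDK, and accepting more than one v1 entry is an assumption (the sample shows exactly one).

```javascript
// Hypothetical helper (not part of the DialStack SDK): splits a header such as
// "t=1697634600,v1=5257a8..." into its timestamp and signature parts.
function parseSignatureHeader(header) {
  const result = { timestamp: null, signatures: [] };
  for (const part of String(header).split(',')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue; // skip malformed segments
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === 't') result.timestamp = Number(value);
    else if (key === 'v1') result.signatures.push(value);
  }
  return result;
}

const parsed = parseSignatureHeader(
  't=1697634600,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd'
);
console.log(parsed.timestamp); // 1697634600
console.log(parsed.signatures.length); // 1
```

This does not verify anything by itself. Keep using constructEvent with the voice app's secret for the actual check.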

Webhook Response

Return 200 OK to acknowledge receipt. The response body is ignored.

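app.post('/voice/webhook', (req, res) => {
  // Signature verification is omitted here for brevity (see Verifying Signatures)
  const { event, call_id } = req.body;

  // Acknowledge immediately
  res.sendStatus(200);

  // Handle based on event type
  // (handleCallControl and handleCallNotify are placeholders for your own handlers)
  if (event === 'call.received') {
    handleCallControl(call_id);
  } else if (event === 'call.notify') {
    handleCallNotify(call_id);
  }
});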

Voice App Dial Plan Nodes

Voice apps can be used in dial plans in two modes, selected by the mode field on the voice_app node:

  • Control mode — shipping today. Appears as the Voice App node in the editor.
  • Notify mode — planned (see Coming soon below). In the target design, the editor will split these into two distinct palette entries (Voice App (Control) and Voice App (Notify)); today there is a single Voice App node that operates in control mode.

Voice App (Control)

Routes the call to the voice app. Your server receives a call.received webhook and takes ownership of the call — attaching audio, transferring, etc.

{
  "id": "ai_receptionist",
  "type": "voice_app",
  "config": {
    "voice_app_id": "va_01h2xcejqtf2nbrexx3vqjhp49",
    "mode": "control",
    "next": "voicemail"
  }
}

Voice App (Notify)

Coming soon

Voice App (Notify) mode is still being implemented and is not yet available. The surface documented below reflects the target design; specifics may change before release.

Sends a fire-and-forget notification to the voice app as the call passes through, without interrupting call routing. This is useful for triggering external actions (real-time transcription, call analytics, CRM logging) alongside normal call handling.

┌──────────┐     ┌──────────────┐     ┌──────────┐     ┌─────────┐
│ Schedule │────▶│  Voice App   │────▶│   Dial   │────▶│Voicemail│
│   Node   │     │   (Notify)   │     │   User   │     │         │
└──────────┘     └──────┬───────┘     └──────────┘     └─────────┘
                        │
                        │ POST (fire-and-forget)
                        ▼
                 ┌─────────────┐
                 │ Your Server │
                 └─────────────┘

The Voice App (Notify) node:

  • Sends an HTTP POST to the voice app's URL with "event": "call.notify"
  • Immediately continues to the next node — it does not wait for a response
  • Does not answer or interrupt the call
  • Uses the same signature verification as call.received webhooks

Example node definition:

{
  "id": "notify_transcription",
  "type": "voice_app",
  "config": {
    "voice_app_id": "va_01h2xcejqtf2nbrexx3vqjhp49",
    "mode": "notify",
    "next": "dial_reception"
  }
}
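
The control and notify node definitions differ only in id, mode, and next. If you generate dial plans in code, a small builder can keep the shape consistent. This is a sketch: voiceAppNode and VOICE_APP_MODES are hypothetical names, not part of the SDK, and the notify value follows the target design described above, which may change before release.

```javascript
// Hypothetical helper: builds a voice_app dial plan node in the shape shown above.
const VOICE_APP_MODES = ['control', 'notify'];

function voiceAppNode({ id, voiceAppId, mode, next }) {
  if (!VOICE_APP_MODES.includes(mode)) {
    throw new Error(`Unknown voice app mode: ${mode}`);
  }
  return {
    id,
    type: 'voice_app',
    config: { voice_app_id: voiceAppId, mode, next },
  };
}

const node = voiceAppNode({
  id: 'ai_receptionist',
  voiceAppId: 'va_01h2xcejqtf2nbrexx3vqjhp49',
  mode: 'control',
  next: 'voicemail',
});
console.log(JSON.stringify(node, null, 2));
```

The builder always writes mode explicitly, so it does not rely on any server-side default.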

Call Control

When your voice app receives a call.received webhook, your server takes ownership of the call and controls it via the Update Call API.

┌─────────┐              ┌───────────┐             ┌─────────────┐
│ Caller  │              │ DialStack │             │ Your Server │
└────┬────┘              └─────┬─────┘             └──────┬──────┘
     │                         │                          │
     │  1. Call arrives        │                          │
     │────────────────────────▶│                          │
     │                         │                          │
     │                         │  2. Webhook POST         │
     │                         │─────────────────────────▶│
     │                         │                          │
     │                         │  3. POST /v1/calls/{id}  │
     │                         │◀─────────────────────────│
     │                         │     (attach audio)       │
     │                         │                          │
     │  4. Bidirectional audio │                          │
     │◀───────────────────────▶│◀────────────────────────▶│
     │       (WebSocket)       │       (WebSocket)        │
     │                         │                          │

Actions

Use the Update Call API to send actions. Actions are processed sequentially.

Attach Audio Stream

Connect bidirectional audio to your WebSocket server (see WebSocket API for the message protocol):

await dialstack.calls.update(
  callId,
  { actions: [{ type: 'attach', url: 'wss://your-server.example.com/voice/stream' }] },
  { dialstackAccount: accountId }
);

The attach action blocks until the WebSocket disconnects, then processing continues with the next action.

Transfer to Extension

Transfer the caller to an extension:

await dialstack.calls.update(
  callId,
  { actions: [{ type: 'transfer', extension: '100' }] },
  { dialstackAccount: accountId }
);

If the transfer target answers or the caller hangs up, processing stops. If the transfer fails (no answer, busy), processing continues with the next action.

Combining Actions

Chain actions for fallback behavior:

{
  "actions": [
    { "type": "attach", "url": "wss://ai.example.com/voice" },
    { "type": "transfer", "extension": "100" }
  ]
}

This connects to your AI voice assistant first. When the WebSocket disconnects (e.g., AI hands off), the call transfers to extension 100.
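
To make the chaining rules concrete, here is a toy model of the semantics described above: attach runs until the WebSocket disconnects and then falls through, while transfer ends processing if it is answered or the caller hangs up and falls through if it fails. simulateActions and the outcome labels are hypothetical names used only for illustration. This is not how DialStack is implemented.

```javascript
// Toy model of the documented action semantics; not DialStack's implementation.
// `outcomes[i]` is a hypothetical label for what happened to actions[i]:
//   attach:   'disconnected'
//   transfer: 'answered' | 'caller_hangup' | 'no_answer' | 'busy'
function simulateActions(actions, outcomes) {
  const log = [];
  for (let i = 0; i < actions.length; i++) {
    const action = actions[i];
    const outcome = outcomes[i];
    log.push(`${action.type}: ${outcome}`);
    const transferEnded =
      action.type === 'transfer' && (outcome === 'answered' || outcome === 'caller_hangup');
    if (transferEnded) break; // remaining actions never run
    // attach ended (WebSocket closed) or transfer failed: continue with the next action
  }
  return log;
}

const actions = [
  { type: 'attach', url: 'wss://ai.example.com/voice' },
  { type: 'transfer', extension: '100' },
];
// The AI session ends, then extension 100 answers and processing stops.
console.log(simulateActions(actions, ['disconnected', 'answered']));
```

Adding a second transfer after the first gives a fallback target that is only tried when the first transfer is busy or unanswered.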

Replacing Actions

Sending a new update replaces all pending actions immediately. The current action is interrupted, and processing starts from the first action in the new list.

// AI decides to transfer the call
await dialstack.calls.update(
  callId,
  { actions: [{ type: 'transfer', extension: '100' }] },
  { dialstackAccount: accountId }
);
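
The replace behavior can be pictured as a queue whose contents and position are both reset by each update. The class below is a toy model under that reading. ActionQueue is a hypothetical name, not an SDK type, and the real interruption happens inside DialStack rather than in your code.

```javascript
// Toy model of "a new update replaces all pending actions"; not an SDK class.
class ActionQueue {
  constructor(actions = []) {
    this.actions = [...actions];
    this.index = 0; // position of the action currently executing
  }

  get current() {
    return this.actions[this.index] ?? null;
  }

  // The current action finished (e.g. the WebSocket closed); move to the next one.
  advance() {
    this.index += 1;
    return this.current;
  }

  // A new update arrived: discard the running and pending actions,
  // then start from the first action of the new list.
  replace(actions) {
    this.actions = [...actions];
    this.index = 0;
    return this.current;
  }
}

const queue = new ActionQueue([
  { type: 'attach', url: 'wss://ai.example.com/voice' },
  { type: 'transfer', extension: '100' },
]);
console.log(queue.current.type); // attach
queue.replace([{ type: 'transfer', extension: '200' }]);
console.log(queue.current.extension); // 200
```

Note that the original fallback to extension 100 is gone after the replace. If you still want it, include it again in the new action list.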

WebSocket Audio Streaming

When DialStack executes an attach action, it connects to your WebSocket URL and streams audio bidirectionally. For the complete protocol specification, see the WebSocket API reference.

Audio Format

| Property | Value |
| --- | --- |
| Encoding | μ-law (G.711) |
| Sample rate | 8000 Hz |
| Channels | 1 (mono) |
| Chunk size | ~20ms (160 bytes before base64) |
| Bandwidth | ~8 KB/second |
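
The chunk size and bandwidth figures follow from the format itself: μ-law stores one sample per byte, so 20 ms at 8000 Hz is 160 bytes. The arithmetic below reproduces those numbers and also shows the size of a chunk after base64 encoding. The constant names are illustrative only.

```javascript
// Frame-size arithmetic for 8 kHz mono mu-law (one byte per sample).
const SAMPLE_RATE_HZ = 8000;
const BYTES_PER_SAMPLE = 1; // G.711 mu-law encodes each sample in 8 bits
const FRAME_MS = 20;

const bytesPerFrame = (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS) / 1000;
const framesPerSecond = 1000 / FRAME_MS;
const rawBytesPerSecond = bytesPerFrame * framesPerSecond;
// Base64 encodes every 3 bytes as 4 characters, padding the final group.
const base64CharsPerFrame = 4 * Math.ceil(bytesPerFrame / 3);

console.log(bytesPerFrame); // 160
console.log(framesPerSecond); // 50
console.log(rawBytesPerSecond); // 8000
console.log(base64CharsPerFrame); // 216
```

The 8 KB/second figure is therefore the raw audio rate. Base64 and the JSON envelope add overhead on the wire.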

Messages from DialStack

Begin — Sent when connection is established:

{
  "event": "begin",
  "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
  "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
  "audio_format": {
    "encoding": "audio/x-mulaw",
    "sample_rate": 8000,
    "channels": 1
  }
}

Audio — Caller's audio (sent continuously):

{
  "event": "audio",
  "timestamp": 1234,
  "payload": "base64-encoded-mulaw-audio"
}
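
Many speech-to-text and audio-processing pipelines expect 16-bit linear PCM rather than μ-law. The sketch below decodes an audio message's payload using the standard G.711 μ-law expansion. mulawToPcm16 and decodeAudioPayload are hypothetical helper names, not SDK functions, and whether you need this step depends on what your downstream service accepts.

```javascript
// Standard G.711 mu-law expansion: one 8-bit code -> one signed 16-bit PCM sample.
const MULAW_BIAS = 0x84;

function mulawToPcm16(code) {
  const u = ~code & 0xff; // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Decodes the base64 `payload` of an audio message into PCM samples.
function decodeAudioPayload(payloadBase64) {
  const bytes = Buffer.from(payloadBase64, 'base64');
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = mulawToPcm16(bytes[i]);
  }
  return pcm;
}

// 0xff is silence; 0x80 and 0x00 are the largest positive and negative codes.
const payload = Buffer.from([0xff, 0x80, 0x00]).toString('base64');
console.log(Array.from(decodeAudioPayload(payload))); // [ 0, 32124, -32124 ]
```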

Messages to DialStack

Audio — Audio to play to the caller:

{
  "event": "audio",
  "payload": "base64-encoded-mulaw-audio"
}
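
If you work with a raw WebSocket instead of the SDK, you need to wrap outgoing audio in this message shape yourself. The sketch below splits a μ-law Buffer into 20 ms frames and serializes each one. toAudioMessages is a hypothetical helper, and using 160-byte frames for outgoing audio is an assumption that mirrors the inbound chunk size. Check the WebSocket API reference for any actual requirement.

```javascript
// Splits a mu-law Buffer into frames and wraps each one in the outbound
// audio message shape shown above.
// Assumption: 160-byte (20 ms) frames, mirroring what DialStack sends.
function toAudioMessages(mulawBuffer, frameBytes = 160) {
  const messages = [];
  for (let offset = 0; offset < mulawBuffer.length; offset += frameBytes) {
    const frame = mulawBuffer.subarray(offset, offset + frameBytes);
    messages.push(JSON.stringify({ event: 'audio', payload: frame.toString('base64') }));
  }
  return messages;
}

// 0xff is mu-law silence; 400 bytes is 50 ms of audio at 8 kHz.
const silence = Buffer.alloc(400, 0xff);
const messages = toAudioMessages(silence);
console.log(messages.length); // 3 (160 + 160 + 80 bytes)
// With a raw `ws` connection you would then send each one: ws.send(message)
```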

Ending the Session

Either side can close the WebSocket to end the audio session. When closed, DialStack continues processing with the next action (if any).

Using MediaStream (SDK)

The SDK provides a MediaStream class that handles WebSocket message parsing and provides a clean event-based API:

import { MediaStream } from '@dialstack/sdk/server';
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws) => {
  const stream = new MediaStream(ws);

  stream.on('begin', (event) => {
    console.log('Call started:', event.call_id);
    console.log('Audio format:', event.audio_format);

    // Send greeting audio (greetingAudioBase64 is a placeholder for your own base64 μ-law audio)
    stream.sendAudio(greetingAudioBase64);
  });

  stream.on('audio', (event) => {
    // event.payload contains base64-encoded μ-law audio
    // event.timestamp contains the audio timestamp

    // Process with your AI pipeline and respond
    // (processAudio is a placeholder that returns base64 μ-law audio)
    const responseAudio = processAudio(event.payload);
    stream.sendAudio(responseAudio);

    // Alternatively, send a raw Buffer (auto base64-encoded) instead of a base64 string:
    // stream.sendAudioBuffer(audioBuffer);
  });

  stream.on('close', (event) => {
    console.log('Call ended:', event.code, event.reason);
  });

  stream.on('error', (event) => {
    console.error('Stream error:', event.error);
  });
});

Complete Example: AI Voice Assistant

import express from 'express';
import { WebSocketServer } from 'ws';
import { DialStack, MediaStream } from '@dialstack/sdk/server';

const app = express();
app.use(
  express.json({
    // Keep the raw body so the webhook signature can be verified
    verify: (req, res, buf) => {
      req.rawBody = buf;
    },
  })
);

const dialstack = new DialStack(process.env.DIALSTACK_API_KEY);
const VOICE_APP_SECRET = process.env.VOICE_APP_SECRET;

// Webhook endpoint
app.post('/voice/webhook', async (req, res) => {
  let event;
  try {
    event = dialstack.webhooks.constructEvent(
      req.rawBody,
      req.headers['x-dialstack-signature'],
      VOICE_APP_SECRET
    );
  } catch (err) {
    return res.sendStatus(401);
  }

  const { call_id, account_id, from_number, from_name } = event;
  console.log(`Incoming call from ${from_name || from_number}`);

  // Acknowledge immediately, then take control of the call
  res.sendStatus(200);

  try {
    // Attach audio stream with fallback transfer
    await dialstack.calls.update(
      call_id,
      {
        actions: [
          { type: 'attach', url: 'wss://your-server.example.com/voice/stream' },
          { type: 'transfer', extension: '100' },
        ],
      },
      { dialstackAccount: account_id }
    );
  } catch (err) {
    // The response has already been sent, so log the failure instead of
    // leaving the rejected promise unhandled
    console.error(`Failed to update call ${call_id}:`, err);
  }
});

// WebSocket server for audio streaming
const wss = new WebSocketServer({ noServer: true });

wss.on('connection', (ws) => {
  const stream = new MediaStream(ws);

  stream.on('begin', (event) => {
    console.log(`Audio stream started for call ${stream.callId}`);
    // generateGreetingAudio is a placeholder for your own audio source
    stream.sendAudio(generateGreetingAudio());
  });

  stream.on('audio', (event) => {
    const audioBuffer = Buffer.from(event.payload, 'base64');
    // processAudioWithAI is a placeholder for your own AI pipeline
    processAudioWithAI(audioBuffer, (responseAudio) => {
      stream.sendAudio(responseAudio);
    });
  });

  stream.on('close', () => {
    console.log(`Audio stream ended for call ${stream.callId}`);
  });
});

// Route WebSocket upgrade requests for /voice/stream to the audio server
const server = app.listen(3000);
server.on('upgrade', (request, socket, head) => {
  if (request.url === '/voice/stream') {
    wss.handleUpgrade(request, socket, head, (ws) => {
      wss.emit('connection', ws, request);
    });
  } else {
    socket.destroy();
  }
});

Listeners

When your voice app receives a call.notify webhook, you can create a listener to stream real-time audio from the call. Audio flows one way only — from DialStack to your server; the listener is passive and does not inject audio or alter the call. Neither party hears a tone or indication that a listener is attached, so you are responsible for obtaining appropriate consent from the parties on the call in accordance with applicable law (recording-consent requirements vary by jurisdiction).

┌────────┐                ┌───────────┐              ┌─────────────┐
│ Caller │                │ DialStack │              │ Your Server │
└───┬────┘                └─────┬─────┘              └──────┬──────┘
    │                           │                           │
    │  Normal two-party call    │                           │
    │◀─────────────────────────▶│                           │
    │                           │                           │
    │                           │  1. Webhook (call.notify) │
    │                           │──────────────────────────▶│
    │                           │                           │
    │                           │  2. POST /v1/calls/{id}/  │
    │                           │     listeners             │
    │                           │◀──────────────────────────│
    │                           │                           │
    │  Call continues normally  │  3. Audio (one-way WSS)   │
    │◀─────────────────────────▶│──────────────────────────▶│
    │                           │                           │

  1. A Voice App (Notify) node notifies your server that a call has started
  2. Your server creates a listener on that call
  3. DialStack opens a WebSocket to your server and streams audio

Creating a Listener

const listener = await dialstack.calls.createListener(
  callId,
  {
    url: 'wss://your-server.example.com/audio',
    channel: 'both',
    metadata: { agent_id: 'user_123', queue: 'support' },
  },
  { dialstackAccount: accountId }
);

Channel Selection

| Channel | Audio received |
| --- | --- |
| caller | Audio from the party that initiated the call |
| callee | Audio from the party that received the call |
| both | Both channels, delivered as separate tagged messages |

Stopping a Listener

Listeners stop automatically when the call ends. To stop early:

await dialstack.calls.deleteListener(callId, listenerId, {
  dialstackAccount: accountId,
});

Listener WebSocket Protocol

The listener WebSocket protocol extends the voice app protocol with channel tagging and an end message. See the WebSocket API for the full specification.

Begin — Sent when connection is established:

{
  "event": "begin",
  "listener_id": "lstn_01h2xcejqtf2nbrexx3vqjhp50",
  "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
  "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
  "channel": "both",
  "metadata": { "agent_id": "user_123", "queue": "support" },
  "audio_format": {
    "encoding": "audio/x-mulaw",
    "sample_rate": 8000,
    "channels": 1
  }
}

The listener_id field distinguishes listener sessions from voice app sessions, allowing the same server to handle both.
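
If one WebSocket endpoint serves both kinds of session, you can branch on the first message. The sketch below classifies a parsed begin message by the presence of listener_id. sessionKind is a hypothetical helper name, not part of the SDK.

```javascript
// Returns 'listener' when the begin message carries a listener_id,
// otherwise 'voice_app'. Hypothetical helper based on the begin shapes above.
function sessionKind(beginMessage) {
  if (beginMessage.event !== 'begin') {
    throw new Error(`Expected a begin message, got: ${beginMessage.event}`);
  }
  return typeof beginMessage.listener_id === 'string' ? 'listener' : 'voice_app';
}

console.log(sessionKind({ event: 'begin', call_id: 'call_01h2xcejqtf2nbrexx3vqjhp45' }));
// voice_app
console.log(
  sessionKind({
    event: 'begin',
    call_id: 'call_01h2xcejqtf2nbrexx3vqjhp45',
    listener_id: 'lstn_01h2xcejqtf2nbrexx3vqjhp50',
  })
);
// listener
```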

Audio — Call audio, tagged by channel:

{
  "event": "audio",
  "channel": "caller",
  "timestamp": 1234,
  "payload": "base64-encoded-mulaw-audio"
}
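
With channel set to both, caller and callee audio arrive interleaved on the same connection, so a consumer usually separates them before transcription. The sketch below collects decoded bytes per channel. ChannelDemux is a hypothetical helper, not an SDK class, and it buffers everything in memory, which is only reasonable for short calls or tests.

```javascript
// Collects listener audio per channel. The message shape follows the listener
// protocol in this guide; the class itself is a hypothetical helper.
class ChannelDemux {
  constructor() {
    this.chunks = new Map([
      ['caller', []],
      ['callee', []],
    ]);
  }

  // Accepts a parsed message; anything that is not channel-tagged audio is ignored.
  push(message) {
    if (message.event !== 'audio') return;
    const bucket = this.chunks.get(message.channel);
    if (!bucket) return;
    bucket.push(Buffer.from(message.payload, 'base64'));
  }

  // Returns all mu-law bytes received so far for one channel.
  audioFor(channel) {
    return Buffer.concat(this.chunks.get(channel) ?? []);
  }
}

const demux = new ChannelDemux();
demux.push({ event: 'audio', channel: 'caller', timestamp: 0, payload: '/w==' }); // one 0xff byte
demux.push({ event: 'audio', channel: 'callee', timestamp: 0, payload: '/w==' });
console.log(demux.audioFor('caller').length); // 1
```

In a streaming pipeline you would forward each chunk to a per-channel transcription session instead of accumulating it.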

End — Sent when the listener stops:

{
  "event": "end",
  "listener_id": "lstn_01h2xcejqtf2nbrexx3vqjhp50",
  "reason": "call_ended"
}

Reasons: call_ended, deleted (stopped via API), error.

Complete Example: Real-Time Transcription

import express from 'express';
import { WebSocketServer } from 'ws';
import { DialStack } from '@dialstack/sdk/server';

const app = express();
app.use(
  express.json({
    // Keep the raw body so the webhook signature can be verified
    verify: (req, res, buf) => {
      req.rawBody = buf;
    },
  })
);

const dialstack = new DialStack(process.env.DIALSTACK_API_KEY);
const VOICE_APP_SECRET = process.env.VOICE_APP_SECRET;

// Webhook endpoint: receives call.notify from the Voice App (Notify) dial plan node
app.post('/voice/webhook', async (req, res) => {
  let event;
  try {
    event = dialstack.webhooks.constructEvent(
      req.rawBody,
      req.headers['x-dialstack-signature'],
      VOICE_APP_SECRET
    );
  } catch (err) {
    return res.sendStatus(401);
  }

  res.sendStatus(200);

  if (event.event === 'call.notify') {
    try {
      // Create a listener to stream audio for transcription
      await dialstack.calls.createListener(
        event.call_id,
        {
          url: 'wss://your-server.example.com/audio',
          channel: 'both',
          metadata: { from: event.from_number, to: event.to_number },
        },
        { dialstackAccount: event.account_id }
      );
    } catch (err) {
      // The response has already been sent, so log the failure instead of
      // leaving the rejected promise unhandled
      console.error(`Failed to create listener for call ${event.call_id}:`, err);
    }
  }
});

// WebSocket server for receiving listener audio
const wss = new WebSocketServer({ noServer: true });

wss.on('connection', (ws) => {
  let listenerId;

  ws.on('message', (data) => {
    // ws delivers messages as Buffers; convert to a string before parsing
    const message = JSON.parse(data.toString());

    switch (message.event) {
      case 'begin':
        listenerId = message.listener_id;
        console.log(`Listening to call ${message.call_id} (${message.channel})`);
        break;

      case 'audio':
        // Send to your speech-to-text service (transcribe is a placeholder)
        transcribe(message.payload, message.channel);
        break;

      case 'end':
        console.log(`Listener ${listenerId} stopped: ${message.reason}`);
        break;
    }
  });
});

// Route WebSocket upgrade requests for /audio to the listener server
const server = app.listen(3000);
server.on('upgrade', (request, socket, head) => {
  if (request.url === '/audio') {
    wss.handleUpgrade(request, socket, head, (ws) => {
      wss.emit('connection', ws, request);
    });
  } else {
    socket.destroy();
  }
});

API Reference

  • Voice Apps — Create and manage voice apps
  • Update Call — Control active calls with actions
  • Listeners — Stream real-time audio from active calls
  • WebSocket API — Audio streaming protocol (voice apps and listeners)