Voice Apps

Build programmable voice applications with webhooks and real-time audio streaming.

Overview

Voice apps let you handle calls programmatically. DialStack notifies your server via webhook, and you decide what happens next. Voice apps support two modes:

Call Control — Your server takes ownership of the call. Connect bidirectional audio for AI voice assistants, transfer calls to extensions, or build IVR systems.

Call Listening — Stream real-time audio from calls without affecting them. Use this for live monitoring, real-time transcription, or analytics.

Both modes start with a webhook notification to your server. The webhook's event field tells you which mode triggered it.

Installation

Install the DialStack SDK for Node.js:

npm install @dialstack/sdk

Initialize the client with your API key:

import { DialStack } from '@dialstack/sdk/server';

const dialstack = new DialStack(process.env.DIALSTACK_API_KEY);

Creating a Voice App

SDK:

const voiceApp = await dialstack.voiceApps.create(
  { name: 'AI Receptionist', url: 'https://your-server.example.com/voice/webhook' },
  { dialstackAccount: 'acct_01h2xcejqtf2nbrexx3vqjhp41' }
);

cURL:

curl -X POST https://api.dialstack.ai/v1/voice-apps \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "DialStack-Account: acct_01h2xcejqtf2nbrexx3vqjhp41" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "AI Receptionist",
    "url": "https://your-server.example.com/voice/webhook"
  }'

Response:

{
  "id": "va_01h2xcejqtf2nbrexx3vqjhp49",
  "name": "AI Receptionist",
  "url": "https://your-server.example.com/voice/webhook",
  "status": "active",
  "secret": "whsec_abc123def456...",
  "created_at": "2025-10-18T10:00:00Z",
  "updated_at": "2025-10-18T10:00:00Z"
}

Important: Save the secret value. It is the key you will use to verify webhook signatures.

Webhook Notifications

When a call reaches your voice app, DialStack sends an HTTP POST to your webhook URL. The same voice app can receive both event types — the event field tells you which one.

Webhook Events

Event	Description	Trigger
`call.received`	A call has been routed to this voice app for handling	Voice app is the call destination (extension or dial plan)
`call.notify`	A call is passing through a Voice App (Notify) node in a dial plan	Voice App (Notify) node in a dial plan references this voice app

For call.received, your server takes control of the call — use the Update Call API to attach audio, transfer, etc. For call.notify, the call continues routing normally — use the Listeners API to stream audio if desired.

Webhook Payload

POST /voice/webhook HTTP/1.1
Host: your-server.example.com
Content-Type: application/json
X-DialStack-Signature: t=1697634600,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd

{
  "event": "call.received",
  "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
  "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
  "voice_app_id": "va_01h2xcejqtf2nbrexx3vqjhp49",
  "from_number": "+14155551234",
  "from_name": "John Smith",
  "to_number": "+14155559876"
}

Both call.received and call.notify use the same payload shape. The event field is the only difference.
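Because the two events share one payload shape, a handler can branch on the event field alone. The following is a minimal sketch of that dispatch; `classifyWebhook` and its return labels are hypothetical names introduced for illustration and are not part of the DialStack SDK.

```javascript
// Hypothetical helper: maps a webhook payload to the mode it represents.
function classifyWebhook(payload) {
  if (!payload || typeof payload.call_id !== 'string') {
    throw new TypeError('payload.call_id is required');
  }
  switch (payload.event) {
    case 'call.received':
      return 'control'; // your server takes ownership of the call
    case 'call.notify':
      return 'listen'; // the call keeps routing; attach a listener if desired
    default:
      return 'ignore'; // skipping unknown events is an assumption of this sketch
  }
}

const sample = { event: 'call.received', call_id: 'call_01h2xcejqtf2nbrexx3vqjhp45' };
console.log(classifyWebhook(sample)); // control
```

Returning a label rather than calling handlers directly keeps the branching logic easy to unit test.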

Verifying Signatures

Verify webhook signatures using the voice app's secret to ensure requests are from DialStack:

SDK:

const event = dialstack.webhooks.constructEvent(
  req.rawBody,
  req.headers['x-dialstack-signature'],
  process.env.VOICE_APP_SECRET
);

// event contains: event, call_id, account_id, voice_app_id, from_number, from_name, to_number

Manual:

const crypto = require('crypto');

function verifySignature(payload, signature, secret) {
  // Header format: t=<unix seconds>,v1=<hex HMAC-SHA256>
  if (typeof signature !== 'string') return false;
  const [tPart = '', v1Part = ''] = signature.split(',');
  const timestamp = tPart.split('=')[1];
  const expectedSig = v1Part.split('=')[1];
  if (!timestamp || !expectedSig) return false;

  // Reject old timestamps (replay protection)
  const age = Date.now() / 1000 - parseInt(timestamp, 10);
  if (Number.isNaN(age) || age > 300) {
    // 5 minutes
    return false;
  }

  // Compute expected signature
  const signedPayload = `${timestamp}.${payload}`;
  const computedSig = crypto.createHmac('sha256', secret).update(signedPayload).digest('hex');

  // Constant-time comparison (timingSafeEqual throws if the lengths differ)
  const expected = Buffer.from(expectedSig);
  const computed = Buffer.from(computedSig);
  return expected.length === computed.length && crypto.timingSafeEqual(expected, computed);
}
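To exercise a verifier in unit tests without waiting for a real call, you can generate a header yourself. The sketch below assumes the scheme implied by the manual verifier above (HMAC-SHA256 over `<timestamp>.<payload>`, hex encoded, sent as `t=...,v1=...`); `signPayload` is a hypothetical test helper, not an SDK function.

```javascript
const crypto = require('crypto');

// Hypothetical test helper: builds a header in the t=<ts>,v1=<hex> format.
function signPayload(payload, secret, timestamp = Math.floor(Date.now() / 1000)) {
  const digest = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${payload}`)
    .digest('hex');
  return `t=${timestamp},v1=${digest}`;
}

const body = JSON.stringify({ event: 'call.received', call_id: 'call_test' });
const header = signPayload(body, 'whsec_test_secret');
// Pass `body` and `header` to your verifier; a different secret or an
// altered body should make verification fail.
```

Sign the exact string you later pass as the raw body, since re-serializing the JSON can change the bytes and therefore the signature.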

Webhook Response

Return 200 OK to acknowledge receipt. The response body is ignored.

app.post('/voice/webhook', (req, res) => {
  const { event, call_id } = req.body;

  // Acknowledge immediately
  res.sendStatus(200);

  // Handle based on event type
  if (event === 'call.received') {
    handleCallControl(call_id);
  } else if (event === 'call.notify') {
    handleCallNotify(call_id);
  }
});

Voice App Dial Plan Nodes

Voice apps can be used in dial plans in two modes, selected by the mode field on the voice_app node:

Control mode: shipping today. Appears as the Voice App node in the editor.
Notify mode: planned (see Coming soon below). In the target design, the editor will offer two distinct palette entries, Voice App (Control) and Voice App (Notify); today there is a single Voice App node that operates in control mode.

Voice App (Control)

Routes the call to the voice app. Your server receives a call.received webhook and takes ownership of the call — attaching audio, transferring, etc.

{
  "id": "ai_receptionist",
  "type": "voice_app",
  "config": {
    "voice_app_id": "va_01h2xcejqtf2nbrexx3vqjhp49",
    "mode": "control",
    "next": "voicemail"
  }
}

Voice App (Notify)

Coming soon

Voice App (Notify) mode is still being implemented and is not yet available. The surface documented below reflects the target design; specifics may change before release.

Sends a fire-and-forget notification to the voice app as the call passes through, without interrupting call routing. This is useful for triggering external actions (real-time transcription, call analytics, CRM logging) alongside normal call handling.

┌──────────┐     ┌──────────────┐     ┌──────────┐     ┌─────────┐
│ Schedule │────▶│  Voice App   │────▶│   Dial   │────▶│Voicemail│
│  Node    │     │  (Notify)    │     │   User   │     │         │
└──────────┘     └──────┬───────┘     └──────────┘     └─────────┘
                        │
                        │ POST (fire-and-forget)
                        ▼
                  ┌─────────────┐
                  │ Your Server │
                  └─────────────┘

The Voice App (Notify) node:

Sends an HTTP POST to the voice app's URL with "event": "call.notify"
Immediately continues to the next node — it does not wait for a response
Does not answer or interrupt the call
Uses the same signature verification as call.received webhooks

{
  "id": "notify_transcription",
  "type": "voice_app",
  "config": {
    "voice_app_id": "va_01h2xcejqtf2nbrexx3vqjhp49",
    "mode": "notify",
    "next": "dial_reception"
  }
}

Call Control

When your voice app receives a call.received webhook, your server takes ownership of the call and controls it via the Update Call API.

┌─────────┐              ┌───────────┐             ┌─────────────┐
│  Caller │              │ DialStack │             │ Your Server │
└────┬────┘              └─────┬─────┘             └──────┬──────┘
     │                         │                          │
     │  1. Call arrives        │                          │
     │────────────────────────▶│                          │
     │                         │                          │
     │                         │  2. Webhook POST         │
     │                         │─────────────────────────▶│
     │                         │                          │
     │                         │  3. POST /v1/calls/{id}  │
     │                         │◀─────────────────────────│
     │                         │     (attach audio)       │
     │                         │                          │
     │  4. Bidirectional audio │                          │
     │◀───────────────────────▶│◀────────────────────────▶│
     │       (WebSocket)       │       (WebSocket)        │
     │                         │                          │

Actions

Use the Update Call API to send actions. Actions are processed sequentially.

Attach Audio Stream

Connect bidirectional audio to your WebSocket server (see WebSocket API for the message protocol):

SDK:

await dialstack.calls.update(
  callId,
  { actions: [{ type: 'attach', url: 'wss://your-server.example.com/voice/stream' }] },
  { dialstackAccount: accountId }
);

cURL:

curl -X POST https://api.dialstack.ai/v1/calls/call_01h2xcejqtf2nbrexx3vqjhp45 \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "DialStack-Account: acct_01h2xcejqtf2nbrexx3vqjhp41" \
  -H "Content-Type: application/json" \
  -d '{
    "actions": [
      {"type": "attach", "url": "wss://your-server.example.com/voice/stream"}
    ]
  }'

The attach action blocks until the WebSocket disconnects, then processing continues with the next action.

Transfer

Transfer the caller to an extension, an E.164 phone number, or a SIP address:

SDK:

await dialstack.calls.update(
  callId,
  { actions: [{ type: 'transfer', target: '100' }] },
  { dialstackAccount: accountId }
);

cURL:

curl -X POST https://api.dialstack.ai/v1/calls/call_01h2xcejqtf2nbrexx3vqjhp45 \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "DialStack-Account: acct_01h2xcejqtf2nbrexx3vqjhp41" \
  -H "Content-Type: application/json" \
  -d '{
    "actions": [
      {"type": "transfer", "target": "100"}
    ]
  }'

If the transfer target answers or the caller hangs up, processing stops. If the transfer fails (no answer, busy), processing continues with the next action.

Transfer to a SIP address (e.g. an AI voice agent)

The target can also be a sip: URI, which routes the live call to an external SIP endpoint such as an AI voice agent. Your server provides the full address — for example, after registering the call with your agent provider and receiving an identifier to dial:

await dialstack.calls.update(
  callId,
  { actions: [{ type: 'transfer', target: 'sip:agent-7f3c@sip.example-ai.com' }] },
  { dialstackAccount: accountId }
);

The audio is connected over plain RTP. You can append ;transport=tcp or ;transport=udp to the URI to select the signaling transport.
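As an illustration of the URI format, the sketch below assembles a transfer target with the optional transport parameter. `buildSipTarget` is a hypothetical helper; the user and host values are the placeholders from the example above.

```javascript
// Hypothetical helper: appends the optional ;transport= parameter to a sip: URI.
function buildSipTarget(user, host, transport) {
  const uri = `sip:${user}@${host}`;
  if (transport === undefined) return uri;
  if (transport !== 'tcp' && transport !== 'udp') {
    throw new RangeError(`unsupported transport: ${transport}`);
  }
  return `${uri};transport=${transport}`;
}

console.log(buildSipTarget('agent-7f3c', 'sip.example-ai.com', 'tcp'));
// sip:agent-7f3c@sip.example-ai.com;transport=tcp
```

The resulting string can be used as the `target` of a transfer action.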

Combining Actions

Chain actions for fallback behavior:

{
  "actions": [
    { "type": "attach", "url": "wss://ai.example.com/voice" },
    { "type": "transfer", "target": "100" }
  ]
}

This connects to your AI voice assistant first. When the WebSocket disconnects (e.g., AI hands off), the call transfers to extension 100.
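The fallback rules can be easier to reason about with a toy model. The sketch below simulates the sequencing described in this section (attach continues after disconnect; a transfer stops processing when answered or when the caller hangs up, and continues when it fails). It is not DialStack's implementation, and `simulateActions` and its `transferOutcome` callback are hypothetical.

```javascript
// Toy model of sequential action processing, for reasoning about fallbacks.
// `transferOutcome(target)` returns 'answered', 'hangup', or 'failed'.
function simulateActions(actions, transferOutcome) {
  const log = [];
  for (const action of actions) {
    if (action.type === 'attach') {
      // attach blocks until the WebSocket disconnects, then processing continues
      log.push(`attach ${action.url} -> disconnected`);
    } else if (action.type === 'transfer') {
      const outcome = transferOutcome(action.target);
      log.push(`transfer ${action.target} -> ${outcome}`);
      if (outcome === 'answered' || outcome === 'hangup') break; // processing stops
    } else {
      throw new TypeError(`unknown action type: ${action.type}`);
    }
  }
  return log;
}

const steps = simulateActions(
  [
    { type: 'attach', url: 'wss://ai.example.com/voice' },
    { type: 'transfer', target: '100' },
    { type: 'transfer', target: '200' },
  ],
  (target) => (target === '100' ? 'failed' : 'answered')
);
console.log(steps.length); // 3
```

In this run extension 100 does not answer, so the second transfer to 200 acts as a further fallback.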

Replacing Actions

Sending a new update replaces all pending actions immediately. The current action is interrupted, and processing starts from the first action in the new list.

SDK:

// AI decides to transfer the call
await dialstack.calls.update(
  callId,
  { actions: [{ type: 'transfer', target: '100' }] },
  { dialstackAccount: accountId }
);

fetch:

// AI decides to transfer the call
await fetch(`https://api.dialstack.ai/v1/calls/${callId}`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${apiKey}`,
    'DialStack-Account': accountId,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    actions: [{ type: 'transfer', target: '100' }],
  }),
});

WebSocket Audio Streaming

When DialStack executes an attach action, it connects to your WebSocket URL and streams audio bidirectionally. For the complete protocol specification, see the WebSocket API reference.

Audio Format

Property	Value
Encoding	μ-law (G.711)
Sample rate	8000 Hz
Channels	1 (mono)
Chunk size	~20ms (160 bytes before base64)
Bandwidth	~8 KB/second
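The figures in the table follow from the sample rate and the one-byte-per-sample μ-law encoding. The short calculation below reproduces them and also shows the size of a chunk after base64 encoding; the constant names are illustrative.

```javascript
const SAMPLE_RATE_HZ = 8000; // from the table above
const BYTES_PER_SAMPLE = 1; // G.711 μ-law uses 8 bits per sample
const CHUNK_MS = 20;

// 8000 samples/s * 1 byte * 0.020 s
const bytesPerChunk = (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS) / 1000; // 160
const chunksPerSecond = 1000 / CHUNK_MS; // 50
const rawBytesPerSecond = bytesPerChunk * chunksPerSecond; // 8000

// base64 emits 4 characters for every 3 input bytes, rounded up
const base64CharsPerChunk = 4 * Math.ceil(bytesPerChunk / 3); // 216

console.log(bytesPerChunk, chunksPerSecond, rawBytesPerSecond, base64CharsPerChunk);
// 160 50 8000 216
```

The ~8 KB/second figure refers to raw audio; base64 and the JSON envelope add overhead on the wire.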

Messages from DialStack

Begin — Sent when connection is established:

{
  "event": "begin",
  "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
  "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
  "audio_format": {
    "encoding": "audio/x-mulaw",
    "sample_rate": 8000,
    "channels": 1
  }
}

Audio — Caller's audio (sent continuously):

{
  "event": "audio",
  "timestamp": 1234,
  "payload": "base64-encoded-mulaw-audio"
}
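Many speech pipelines expect 16-bit linear PCM rather than μ-law. The sketch below decodes a payload using the standard G.711 μ-law expansion; `decodeAudioPayload` is a hypothetical helper, and whether your pipeline needs this conversion is an assumption.

```javascript
// Decodes one G.711 μ-law byte to a signed 16-bit linear PCM sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff; // μ-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return u & 0x80 ? -magnitude : magnitude;
}

// Hypothetical helper: turns an "audio" message payload into PCM samples.
function decodeAudioPayload(base64Payload) {
  const bytes = Buffer.from(base64Payload, 'base64');
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) pcm[i] = muLawToPcm16(bytes[i]);
  return pcm;
}

const payload = Buffer.from([0xff, 0x80, 0x00]).toString('base64');
console.log(Array.from(decodeAudioPayload(payload))); // [ 0, 32124, -32124 ]
```

Each μ-law byte becomes one PCM sample, so a 160-byte chunk yields 160 samples (20 ms at 8000 Hz).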

Messages to DialStack

Audio — Audio to play to the caller:

{
  "event": "audio",
  "payload": "base64-encoded-mulaw-audio"
}
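When you have a longer clip to play, one approach is to split it into frames before sending. The sketch below mirrors the 160-byte inbound chunk size; whether DialStack requires a particular outbound frame size is not stated here, so treat the size as an assumption. `buildAudioMessages` is a hypothetical helper.

```javascript
const CHUNK_BYTES = 160; // 20 ms of 8 kHz μ-law audio

// Hypothetical helper: splits raw μ-law audio into serialized "audio" messages.
function buildAudioMessages(mulawBuffer, chunkBytes = CHUNK_BYTES) {
  const out = [];
  for (let offset = 0; offset < mulawBuffer.length; offset += chunkBytes) {
    const chunk = mulawBuffer.subarray(offset, offset + chunkBytes);
    out.push(JSON.stringify({ event: 'audio', payload: chunk.toString('base64') }));
  }
  return out;
}

// 400 bytes of 0xff is 50 ms of μ-law silence
const frames = buildAudioMessages(Buffer.alloc(400, 0xff));
console.log(frames.length); // 3
```

Each string in `frames` can be passed to `ws.send` as-is.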

Ending the Session

Either side can close the WebSocket to end the audio session. When closed, DialStack continues processing with the next action (if any).

Using MediaStream (SDK)

The SDK provides a MediaStream class that handles WebSocket message parsing and provides a clean event-based API:

import { MediaStream } from '@dialstack/sdk/server';
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws) => {
  const stream = new MediaStream(ws);

  stream.on('begin', (event) => {
    console.log('Call started:', event.call_id);
    console.log('Audio format:', event.audio_format);

    // Send greeting audio
    stream.sendAudio(greetingAudioBase64);
  });

  stream.on('audio', (event) => {
    // event.payload contains base64-encoded μ-law audio
    // event.timestamp contains the audio timestamp

    // Process with your AI pipeline and respond
    const responseAudio = processAudio(event.payload);
    stream.sendAudio(responseAudio);

    // Or send raw Buffer (auto base64-encoded)
    stream.sendAudioBuffer(audioBuffer);
  });

  stream.on('close', (event) => {
    console.log('Call ended:', event.code, event.reason);
  });

  stream.on('error', (event) => {
    console.error('Stream error:', event.error);
  });
});

Complete Example: AI Voice Assistant

import express from 'express';
import { WebSocketServer } from 'ws';
import { DialStack, MediaStream } from '@dialstack/sdk/server';

const app = express();
app.use(
  express.json({
    verify: (req, res, buf) => {
      req.rawBody = buf;
    },
  })
);

const dialstack = new DialStack(process.env.DIALSTACK_API_KEY);
const VOICE_APP_SECRET = process.env.VOICE_APP_SECRET;

// Webhook endpoint
app.post('/voice/webhook', async (req, res) => {
  let event;
  try {
    event = dialstack.webhooks.constructEvent(
      req.rawBody,
      req.headers['x-dialstack-signature'],
      VOICE_APP_SECRET
    );
  } catch (err) {
    return res.sendStatus(401);
  }

  const { call_id, account_id, from_number, from_name } = event;
  console.log(`Incoming call from ${from_name || from_number}`);

  res.sendStatus(200);

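  // Attach audio stream with fallback transfer
  try {
    await dialstack.calls.update(
      call_id,
      {
        actions: [
          { type: 'attach', url: 'wss://your-server.example.com/voice/stream' },
          { type: 'transfer', target: '100' },
        ],
      },
      { dialstackAccount: account_id }
    );
  } catch (err) {
    // The 200 response has already been sent, so log instead of throwing
    console.error(`Failed to update call ${call_id}:`, err);
  }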
});

// WebSocket server for audio streaming
const wss = new WebSocketServer({ noServer: true });

wss.on('connection', (ws) => {
  const stream = new MediaStream(ws);

  stream.on('begin', (event) => {
    console.log(`Audio stream started for call ${stream.callId}`);
    stream.sendAudio(generateGreetingAudio());
  });

  stream.on('audio', (event) => {
    const audioBuffer = Buffer.from(event.payload, 'base64');
    processAudioWithAI(audioBuffer, (responseAudio) => {
      stream.sendAudio(responseAudio);
    });
  });

  stream.on('close', () => {
    console.log(`Audio stream ended for call ${stream.callId}`);
  });
});

const server = app.listen(3000);
server.on('upgrade', (request, socket, head) => {
  if (request.url === '/voice/stream') {
    wss.handleUpgrade(request, socket, head, (ws) => {
      wss.emit('connection', ws, request);
    });
  } else {
    socket.destroy();
  }
});

Manual (without the SDK):

const express = require('express');
const WebSocket = require('ws');
const crypto = require('crypto');

const app = express();
app.use(
  express.json({
    verify: (req, res, buf) => {
      req.rawBody = buf;
    },
  })
);

const API_KEY = process.env.DIALSTACK_API_KEY;
const VOICE_APP_SECRET = process.env.VOICE_APP_SECRET;
const ACCOUNT_ID = process.env.DIALSTACK_ACCOUNT_ID;

// Webhook endpoint
app.post('/voice/webhook', async (req, res) => {
  const signature = req.headers['x-dialstack-signature'];
  if (!verifySignature(req.rawBody.toString(), signature, VOICE_APP_SECRET)) {
    return res.sendStatus(401);
  }

  const { call_id, from_number, from_name } = req.body;
  console.log(`Incoming call from ${from_name || from_number}`);

  res.sendStatus(200);

  try {
    await fetch(`https://api.dialstack.ai/v1/calls/${call_id}`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        'DialStack-Account': ACCOUNT_ID,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        actions: [
          { type: 'attach', url: 'wss://your-server.example.com/voice/stream' },
          { type: 'transfer', target: '100' },
        ],
      }),
    });
  } catch (err) {
    // The 200 response has already been sent, so log instead of throwing
    console.error(`Failed to update call ${call_id}:`, err);
  }
});

// WebSocket server for audio streaming
const wss = new WebSocket.Server({ noServer: true });

wss.on('connection', (ws) => {
  let callId;

  ws.on('message', (data) => {
    const message = JSON.parse(data);

    switch (message.event) {
      case 'begin':
        callId = message.call_id;
        console.log(`Audio stream started for call ${callId}`);
        ws.send(JSON.stringify({ event: 'audio', payload: generateGreetingAudio() }));
        break;

      case 'audio':
        const audioBuffer = Buffer.from(message.payload, 'base64');
        processAudioWithAI(audioBuffer, (responseAudio) => {
          ws.send(JSON.stringify({ event: 'audio', payload: responseAudio }));
        });
        break;
    }
  });

  ws.on('close', () => {
    console.log(`Audio stream ended for call ${callId}`);
  });
});

const server = app.listen(3000);
server.on('upgrade', (request, socket, head) => {
  if (request.url === '/voice/stream') {
    wss.handleUpgrade(request, socket, head, (ws) => {
      wss.emit('connection', ws, request);
    });
  } else {
    socket.destroy();
  }
});

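function verifySignature(payload, signature, secret) {
  // Header format: t=<unix seconds>,v1=<hex HMAC-SHA256>
  if (typeof signature !== 'string') return false;
  const [tPart = '', v1Part = ''] = signature.split(',');
  const timestamp = tPart.split('=')[1];
  const expectedSig = v1Part.split('=')[1];
  if (!timestamp || !expectedSig) return false;

  // Reject timestamps older than 5 minutes (replay protection)
  const age = Date.now() / 1000 - parseInt(timestamp, 10);
  if (Number.isNaN(age) || age > 300) return false;

  const signedPayload = `${timestamp}.${payload}`;
  const computedSig = crypto.createHmac('sha256', secret).update(signedPayload).digest('hex');

  // timingSafeEqual throws if the lengths differ, so compare lengths first
  const expected = Buffer.from(expectedSig);
  const computed = Buffer.from(computedSig);
  return expected.length === computed.length && crypto.timingSafeEqual(expected, computed);
}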

Listeners

When your voice app receives a call.notify webhook, you can create a listener to stream real-time audio from the call. Audio flows one way only — from DialStack to your server; the listener is passive and does not inject audio or alter the call. Neither party hears a tone or indication that a listener is attached, so you are responsible for obtaining appropriate consent from the parties on the call in accordance with applicable law (recording-consent requirements vary by jurisdiction).

┌────────┐                ┌───────────┐              ┌─────────────┐
│ Caller │                │ DialStack │              │ Your Server │
└───┬────┘                └─────┬─────┘              └──────┬──────┘
    │                           │                           │
    │ Normal two-party call     │                           │
    │◀─────────────────────────▶│                           │
    │                           │                           │
    │                           │  1. Webhook (call.notify) │
    │                           │──────────────────────────▶│
    │                           │                           │
    │                           │  2. POST /v1/calls/{id}/  │
    │                           │     listeners             │
    │                           │◀──────────────────────────│
    │                           │                           │
    │  Call continues normally  │  3. Audio (one-way WSS)   │
    │◀─────────────────────────▶│──────────────────────────▶│
    │                           │                           │

1. A Voice App (Notify) node notifies your server that a call has started
2. Your server creates a listener on that call
3. DialStack opens a WebSocket to your server and streams audio

Creating a Listener

SDK:

const listener = await dialstack.calls.createListener(
  callId,
  {
    url: 'wss://your-server.example.com/audio',
    channel: 'both',
    metadata: { agent_id: 'user_123', queue: 'support' },
  },
  { dialstackAccount: accountId }
);

cURL:

curl -X POST https://api.dialstack.ai/v1/calls/call_01h2xcejqtf2nbrexx3vqjhp45/listeners \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "DialStack-Account: acct_01h2xcejqtf2nbrexx3vqjhp41" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "wss://your-server.example.com/audio",
    "channel": "both",
    "metadata": {"agent_id": "user_123", "queue": "support"}
  }'

Channel Selection

Channel	Audio received
`caller`	Audio from the party that initiated the call
`callee`	Audio from the party that received the call
`both`	Both channels, delivered as separate tagged messages

Stopping a Listener

Listeners stop automatically when the call ends. To stop early:

SDK:

await dialstack.calls.deleteListener(callId, listenerId, {
  dialstackAccount: accountId,
});

cURL:

curl -X DELETE https://api.dialstack.ai/v1/calls/call_.../listeners/lstn_... \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY" \
  -H "DialStack-Account: acct_01h2xcejqtf2nbrexx3vqjhp41"

Listener WebSocket Protocol

The listener WebSocket protocol extends the voice app protocol with channel tagging and an end message. See the WebSocket API for the full specification.

Begin — Sent when connection is established:

{
  "event": "begin",
  "listener_id": "lstn_01h2xcejqtf2nbrexx3vqjhp50",
  "call_id": "call_01h2xcejqtf2nbrexx3vqjhp45",
  "account_id": "acct_01h2xcejqtf2nbrexx3vqjhp41",
  "channel": "both",
  "metadata": { "agent_id": "user_123", "queue": "support" },
  "audio_format": {
    "encoding": "audio/x-mulaw",
    "sample_rate": 8000,
    "channels": 1
  }
}

The listener_id field distinguishes listener sessions from voice app sessions, allowing the same server to handle both.
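A server that accepts both kinds of connection can make this check on the first message. The following is a minimal sketch; `sessionKind` is a hypothetical helper and the returned labels are illustrative.

```javascript
// Hypothetical helper: classifies a "begin" message by the presence of listener_id.
function sessionKind(beginMessage) {
  if (!beginMessage || beginMessage.event !== 'begin') {
    throw new TypeError('expected a begin message');
  }
  return typeof beginMessage.listener_id === 'string' ? 'listener' : 'voice_app';
}

const listenerBegin = {
  event: 'begin',
  listener_id: 'lstn_01h2xcejqtf2nbrexx3vqjhp50',
  call_id: 'call_01h2xcejqtf2nbrexx3vqjhp45',
};
console.log(sessionKind(listenerBegin)); // listener
```

A listener session is receive-only, so the server should not send audio back on it.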

Audio — Call audio, tagged by channel:

{
  "event": "audio",
  "channel": "caller",
  "timestamp": 1234,
  "payload": "base64-encoded-mulaw-audio"
}
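With channel set to both, messages for the two parties arrive interleaved, so a consumer typically separates them before transcription. The sketch below groups decoded payloads per channel; `demuxByChannel` is a hypothetical helper and assumes messages have already been parsed from JSON.

```javascript
// Hypothetical helper: groups tagged "audio" messages into one buffer per channel.
function demuxByChannel(messages) {
  const chunks = new Map([
    ['caller', []],
    ['callee', []],
  ]);
  for (const message of messages) {
    const list = chunks.get(message.channel);
    if (message.event !== 'audio' || !list) continue; // skip begin/end and unknown tags
    list.push(Buffer.from(message.payload, 'base64'));
  }
  return {
    caller: Buffer.concat(chunks.get('caller')),
    callee: Buffer.concat(chunks.get('callee')),
  };
}

const b64 = (bytes) => Buffer.from(bytes).toString('base64');
const audio = demuxByChannel([
  { event: 'audio', channel: 'caller', timestamp: 0, payload: b64([1, 2]) },
  { event: 'audio', channel: 'callee', timestamp: 0, payload: b64([9]) },
  { event: 'audio', channel: 'caller', timestamp: 20, payload: b64([3]) },
]);
console.log(audio.caller.length, audio.callee.length); // 3 1
```

For live transcription you would forward each chunk as it arrives instead of buffering; the grouping logic stays the same.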

End — Sent when the listener stops:

{
  "event": "end",
  "listener_id": "lstn_01h2xcejqtf2nbrexx3vqjhp50",
  "reason": "call_ended"
}

Reasons: call_ended, deleted (stopped via API), error.

Complete Example: Real-Time Transcription

import express from 'express';
import { WebSocketServer } from 'ws';
import { DialStack } from '@dialstack/sdk/server';

const app = express();
app.use(
  express.json({
    verify: (req, res, buf) => {
      req.rawBody = buf;
    },
  })
);

const dialstack = new DialStack(process.env.DIALSTACK_API_KEY);
const VOICE_APP_SECRET = process.env.VOICE_APP_SECRET;

// Webhook endpoint — receives call.notify from Voice App (Notify) dial plan node
app.post('/voice/webhook', async (req, res) => {
  let event;
  try {
    event = dialstack.webhooks.constructEvent(
      req.rawBody,
      req.headers['x-dialstack-signature'],
      VOICE_APP_SECRET
    );
  } catch (err) {
    return res.sendStatus(401);
  }

  res.sendStatus(200);

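  if (event.event === 'call.notify') {
    // Create a listener to stream audio for transcription
    try {
      await dialstack.calls.createListener(
        event.call_id,
        {
          url: 'wss://your-server.example.com/audio',
          channel: 'both',
          metadata: { from: event.from_number, to: event.to_number },
        },
        { dialstackAccount: event.account_id }
      );
    } catch (err) {
      // The 200 response has already been sent, so log instead of throwing
      console.error(`Failed to create listener for call ${event.call_id}:`, err);
    }
  }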
});

// WebSocket server for receiving listener audio
const wss = new WebSocketServer({ noServer: true });

wss.on('connection', (ws) => {
  let listenerId;

  ws.on('message', (data) => {
    const message = JSON.parse(data);

    switch (message.event) {
      case 'begin':
        listenerId = message.listener_id;
        console.log(`Listening to call ${message.call_id} (${message.channel})`);
        break;

      case 'audio':
        // Send to your speech-to-text service
        transcribe(message.payload, message.channel);
        break;

      case 'end':
        console.log(`Listener ${listenerId} stopped: ${message.reason}`);
        break;
    }
  });
});

const server = app.listen(3000);
server.on('upgrade', (request, socket, head) => {
  if (request.url === '/audio') {
    wss.handleUpgrade(request, socket, head, (ws) => {
      wss.emit('connection', ws, request);
    });
  } else {
    socket.destroy();
  }
});

API Reference

Voice Apps — Create and manage voice apps
Update Call — Control active calls with actions
Listeners — Stream real-time audio from active calls
WebSocket API — Audio streaming protocol (voice apps and listeners)
