BYO VoiceAI

DialStack is agent-agnostic. Bring your own AI stack and plug it into the call. Two integration patterns cover every BYO scenario.

Two patterns

| Pattern | What your AI does | Primitive |
| --- | --- | --- |
| BYO Receptionist | Answers the call, talks with the caller, decides what happens next | Voice App (Control) mode + attach action |
| BYO Observer | Listens to the call for transcription / coaching / analytics, doesn't interrupt it | Voice App (Notify) mode + Listeners API |

Full Voice App semantics: Voice Apps.

Pattern 1: BYO Receptionist (bidirectional)

The AI answers the call and talks with the caller. You run the prompt, the tools, and the transcription; DialStack handles telephony.

Flow

Create the Voice App

curl -X POST https://api.dialstack.ai/v1/voice-apps \
  -H 'Authorization: Bearer sk_live_YOUR_KEY' \
  -H 'DialStack-Account: acct_01h2xcejqtf2nbrexx3vqjhp41' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "BYO Receptionist",
    "url": "https://your-server.example.com/voice/webhook"
  }'

Handle the webhook and attach audio

// POST /voice/webhook
app.post('/voice/webhook', express.raw({ type: 'application/json' }), async (req, res) => {
  verifySignature(req); // see voice-apps.md
  const event = JSON.parse(req.body);

  if (event.event === 'call.received') {
    // Open the WebSocket for this call
    await ds.calls.update(
      event.call.id,
      {
        actions: [{ type: 'attach', url: 'wss://your-server.example.com/voice/stream' }],
      },
      { dialstackAccount: event.account_id }
    );
  }

  res.status(200).end();
});
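The actual signing scheme lives in voice-apps.md; as a rough sketch, assuming DialStack signs the raw request body with HMAC-SHA256 under a per-app secret and sends the hex digest in a DialStack-Signature header (both the header name and encoding are assumptions here, not confirmed API details):

```javascript
import crypto from 'node:crypto';

// Hypothetical sketch — header name, secret source, and digest encoding are
// assumptions; see voice-apps.md for the real scheme.
function signatureIsValid(rawBody, receivedSig, secret) {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(receivedSig);
  const b = Buffer.from(expected);
  // Constant-time compare; timingSafeEqual throws on length mismatch, so guard first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

function verifySignature(req) {
  const sig = req.get('DialStack-Signature') ?? '';
  if (!signatureIsValid(req.body, sig, process.env.DIALSTACK_WEBHOOK_SECRET)) {
    throw new Error('invalid webhook signature');
  }
}
```

Note that this only works because the route uses `express.raw()` — a JSON body parser would hand you a re-serialized object whose bytes no longer match what was signed.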

The audio WebSocket

Your service accepts a WebSocket at the URL you passed to attach. DialStack streams caller audio as μ-law 8 kHz frames (~20 ms each, base64-encoded JSON messages). You send agent audio back in the same format on the same socket. Full protocol — framing, control messages, error handling: WebSocket API.
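For orientation: one 20 ms frame at 8 kHz is 160 samples, and μ-law is 8 bits per sample, so each frame carries 160 bytes of audio. A sketch of building one outbound silence frame, using the type/payload field shape that appears in the bridge example below (the WebSocket API doc is authoritative for the wire format):

```javascript
// One 20 ms frame of μ-law audio at 8 kHz = 160 samples = 160 bytes.
const SAMPLES_PER_FRAME = (8000 * 20) / 1000; // 160

// 0xff is (approximately) zero amplitude in μ-law, i.e. silence.
const silence = Buffer.alloc(SAMPLES_PER_FRAME, 0xff);

const message = JSON.stringify({
  type: 'audio',
  payload: silence.toString('base64'),
});
// Sending one such message every 20 ms keeps the caller's side fed with silence.
```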

Worked example: ElevenLabs Conversational AI

A bridge server sits between the two WebSockets and handles audio transcoding. Sketch:

import WebSocket, { WebSocketServer } from 'ws';

// DialStack opens this socket when your webhook responds with `attach`.
const wss = new WebSocketServer({ port: 8080, path: '/voice/stream' });

wss.on('connection', async (dsSocket) => {
  // 1. Open the ElevenLabs Conversational AI WebSocket.
  const elSocket = new WebSocket(
    `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${process.env.ELEVENLABS_AGENT_ID}`,
    { headers: { 'xi-api-key': process.env.XI_API_KEY } }
  );

  // 2. Caller audio → ElevenLabs: μ-law 8 kHz → PCM 16 kHz.
  dsSocket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio') return;
    const mulaw = Buffer.from(msg.payload, 'base64');
    const pcm16 = upsample8kTo16k(mulawToPcm(mulaw));
    elSocket.send(JSON.stringify({ user_audio_chunk: pcm16.toString('base64') }));
  });

  // 3. ElevenLabs audio → caller: PCM 16 kHz → μ-law 8 kHz.
  elSocket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio') return;
    const pcm16 = Buffer.from(msg.audio_event.audio_base_64, 'base64');
    const mulaw = pcmToMulaw(downsample16kTo8k(pcm16));
    dsSocket.send(JSON.stringify({ type: 'audio', payload: mulaw.toString('base64') }));
  });

  dsSocket.on('close', () => elSocket.close());
  elSocket.on('close', () => dsSocket.close());
});

Transcoding helpers (mulawToPcm, pcmToMulaw, upsample8kTo16k, downsample16kTo8k) are short — under 50 lines of DSP. Any Node audio library (e.g., alawmulaw, node-libsamplerate) will do.
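A minimal pure-JS sketch of those four helpers, for reference. G.711 μ-law encode/decode is standard; the resampling here (linear-interpolation upsampling, pair-averaging downsampling) is crude compared to a real resampler but workable for telephony audio:

```javascript
// G.711 μ-law ↔ 16-bit linear PCM, plus naive 8k↔16k resampling.
// Rough sketch — a library like `alawmulaw` is the production route.
const BIAS = 0x84;   // G.711 encoding bias
const CLIP = 32635;  // max encodable magnitude

function mulawByteToPcm(u) {
  u = ~u & 0xff;
  const t = (((u & 0x0f) << 3) + BIAS) << ((u & 0x70) >> 4);
  return u & 0x80 ? BIAS - t : t - BIAS;
}

function pcmSampleToMulaw(s) {
  let sign = 0;
  if (s < 0) { sign = 0x80; s = -s; }
  if (s > CLIP) s = CLIP;
  s += BIAS;
  // Find the segment: index of the highest set bit among bits 7..14.
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff; // byte is stored complemented
}

function mulawToPcm(mulaw) { // Buffer of μ-law bytes → Buffer of LE int16
  const pcm = Buffer.alloc(mulaw.length * 2);
  for (let i = 0; i < mulaw.length; i++) pcm.writeInt16LE(mulawByteToPcm(mulaw[i]), i * 2);
  return pcm;
}

function pcmToMulaw(pcm) { // Buffer of LE int16 → Buffer of μ-law bytes
  const out = Buffer.alloc(pcm.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = pcmSampleToMulaw(pcm.readInt16LE(i * 2));
  return out;
}

function upsample8kTo16k(pcm) { // insert a linearly interpolated sample between each pair
  const n = pcm.length / 2;
  const out = Buffer.alloc(n * 4);
  for (let i = 0; i < n; i++) {
    const a = pcm.readInt16LE(i * 2);
    const b = i + 1 < n ? pcm.readInt16LE((i + 1) * 2) : a;
    out.writeInt16LE(a, i * 4);
    out.writeInt16LE((a + b) >> 1, i * 4 + 2);
  }
  return out;
}

function downsample16kTo8k(pcm) { // average adjacent sample pairs
  const n = pcm.length / 4;
  const out = Buffer.alloc(n * 2);
  for (let i = 0; i < n; i++) {
    const avg = (pcm.readInt16LE(i * 4) + pcm.readInt16LE(i * 4 + 2)) >> 1;
    out.writeInt16LE(avg, i * 2);
  }
  return out;
}
```

The pair-averaging downsampler doubles as a basic anti-aliasing filter; for production quality, swap in a proper polyphase resampler.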

ElevenLabs tool-use webhooks (for database lookups, appointment booking, etc.) are plain HTTPS endpoints configured on the agent — unrelated to the DialStack WebSocket. Configure them once when you create the agent via ElevenLabs' REST API.

The same shape works for OpenAI Realtime, Vapi, Retell, or a self-hosted model — swap the outbound WebSocket and the frame format.

Routing

Drop a Voice App (Control) node into any dial plan:

{
  id: 'ai_reception',
  type: 'internal_dial',
  config: { target_id: 'va_01h2xcejqtf2nbrexx3vqjhp49' }, // your voice app ID
}

Pattern 2: BYO Observer (one-way)

Coming soon

The BYO Observer pattern depends on Voice App (Notify) mode and the Listeners API, both of which are still in development. Pattern 1 (BYO Receptionist) is unaffected.

The AI listens to calls for real-time transcription, coaching, or analytics. It never talks to the caller and doesn't affect call routing.

Flow

Create the listener

if (event.event === 'call.notify') {
  await ds.calls.createListener(
    event.call.id,
    { url: 'wss://your-server.example.com/voice/listen' },
    { dialstackAccount: event.account_id }
  );
}

Both parties on the call are unaware that a listener is attached. Listeners stop automatically when the call ends.

Routing

Use a Voice App (Notify) node alongside the real routing:

{
  id: 'transcribe',
  type: 'internal_dial',
  config: { target_id: 'va_notify_id', mode: 'notify' },
}

Choosing between them

  • Your AI should talk → Pattern 1 (Control mode + attach).
  • Your AI should listen → Pattern 2 (Notify mode + Listeners API).
  • Both → a dial plan can route through a Notify node first (transcription) and then to a Control node (AI agent). Or the Control-mode AI can itself open a sidecar listener.
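The "both" case might look like the fragment below. This is a hypothetical sketch: the `next` field is used here only to illustrate chaining, and the real node-linking schema is defined in the Dial Plans doc.

```javascript
// Hypothetical sketch — node chaining via `next` is an assumption;
// see Dial Plans for the actual schema.
[
  {
    id: 'transcribe',
    type: 'internal_dial',
    config: { target_id: 'va_notify_id', mode: 'notify' }, // Observer: attaches a listener
    next: 'ai_reception',
  },
  {
    id: 'ai_reception',
    type: 'internal_dial',
    config: { target_id: 'va_01h2xcejqtf2nbrexx3vqjhp49' }, // Receptionist: answers the call
  },
]
```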

Compared to DialStack-managed AI Agents

DialStack ships a first-party AI receptionist via AI Scheduling / the AI Agents API. Reach for BYO when:

  • You already run an AI stack your team knows.
  • You need a specific model, voice, or data-control guarantee.
  • You want full control over the prompt, tools, and transcripts.

See also

  • Voice Apps — full Voice App documentation including actions (attach, transfer, combined/sequenced).
  • Dial Plans — how Voice App nodes slot into call routing.
  • WebSocket API — media-streaming protocol for both attach and Listeners.
  • AI Scheduling — the native alternative, when you don't need to bring your own.