BYO VoiceAI
DialStack is agent-agnostic. Bring your own AI stack and plug it into the call. Two integration patterns cover every BYO scenario.
Two patterns
| Pattern | What your AI does | Primitive |
|---|---|---|
| BYO Receptionist | Answers the call, talks with the caller, decides what happens next | Voice App (Control) mode + attach action |
| BYO Observer | Listens to the call for transcription / coaching / analytics, doesn't interrupt it | Voice App (Notify) mode + Listeners API |
Full Voice App semantics: Voice Apps.
Pattern 1: BYO Receptionist (bidirectional)
The AI answers the call and talks with the caller. You run the prompt, the tools, and the transcription; DialStack handles telephony.
Flow
Create the Voice App
curl -X POST https://api.dialstack.ai/v1/voice-apps \
  -H 'Authorization: Bearer sk_live_YOUR_KEY' \
  -H 'DialStack-Account: acct_01h2xcejqtf2nbrexx3vqjhp41' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "BYO Receptionist",
    "url": "https://your-server.example.com/voice/webhook"
  }'
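A successful response returns the new Voice App. The field you'll need later is its ID (a va_… identifier), since that's what dial plan nodes reference. Illustrative shape, not the full response:
{
  "id": "va_01h2xcejqtf2nbrexx3vqjhp49",
  "name": "BYO Receptionist",
  "url": "https://your-server.example.com/voice/webhook"
}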
Handle the webhook and attach audio
// POST /voice/webhook
// Assumes `app` is an Express app and `ds` is an initialized DialStack SDK client.
app.post('/voice/webhook', express.raw({ type: 'application/json' }), async (req, res) => {
  verifySignature(req); // see voice-apps.md
  const event = JSON.parse(req.body);

  if (event.event === 'call.received') {
    // Open the WebSocket for this call
    await ds.calls.update(
      event.call.id,
      {
        actions: [{ type: 'attach', url: 'wss://your-server.example.com/voice/stream' }],
      },
      { dialstackAccount: event.account_id }
    );
  }

  res.status(200).end();
});
The audio WebSocket
Your service accepts a WebSocket at the URL you passed to attach. DialStack streams caller audio as μ-law 8 kHz frames (~20 ms each, base64-encoded JSON messages). You send agent audio back in the same format on the same socket. Full protocol — framing, control messages, error handling: WebSocket API.
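Before wiring in a full agent, a loopback server is an easy way to verify the plumbing. The frame shape below ({ type: 'audio', payload: '<base64 μ-law>' }) matches the bridge example that follows; the port and path are arbitrary, and the full control-message set lives in the WebSocket API reference.
import { WebSocketServer } from 'ws';

// Loopback test: echo caller audio straight back so the caller hears themselves.
const wss = new WebSocketServer({ port: 8080, path: '/voice/stream' });

wss.on('connection', (socket) => {
  socket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio') return;
    // Same format out as in: base64 μ-law 8 kHz, ~20 ms per frame.
    socket.send(JSON.stringify({ type: 'audio', payload: msg.payload }));
  });
});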
Worked example: ElevenLabs Conversational AI
A bridge server sits between the two WebSockets and handles audio transcoding. Sketch:
import WebSocket from 'ws';

// DialStack connects to this server when your webhook responds with `attach`.
const wss = new WebSocket.Server({ port: 8080, path: '/voice/stream' });

wss.on('connection', (dsSocket) => {
  // 1. Open the ElevenLabs Conversational AI WebSocket.
  const elSocket = new WebSocket(
    `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${process.env.ELEVENLABS_AGENT_ID}`,
    { headers: { 'xi-api-key': process.env.XI_API_KEY } }
  );

  // 2. Caller audio → ElevenLabs: μ-law 8 kHz → PCM 16 kHz.
  dsSocket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio') return;
    // Drop frames that arrive before the ElevenLabs socket finishes connecting.
    if (elSocket.readyState !== WebSocket.OPEN) return;
    const mulaw = Buffer.from(msg.payload, 'base64');
    const pcm16 = upsample8kTo16k(mulawToPcm(mulaw));
    elSocket.send(JSON.stringify({ user_audio_chunk: pcm16.toString('base64') }));
  });

  // 3. ElevenLabs audio → caller: PCM 16 kHz → μ-law 8 kHz.
  elSocket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio') return;
    const pcm16 = Buffer.from(msg.audio_event.audio_base_64, 'base64');
    const mulaw = pcmToMulaw(downsample16kTo8k(pcm16));
    dsSocket.send(JSON.stringify({ type: 'audio', payload: mulaw.toString('base64') }));
  });

  // Tear down both legs together.
  dsSocket.on('close', () => elSocket.close());
  elSocket.on('close', () => dsSocket.close());
});
Transcoding helpers (mulawToPcm, pcmToMulaw, upsample8kTo16k, downsample16kTo8k) are short — under 50 lines of DSP. Any Node audio library (e.g., alawmulaw, node-libsamplerate) will do.
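For concreteness, here is one way to write them, using the alawmulaw package for G.711 and naive resamplers (linear interpolation up, pair-averaging down). The function names match the bridge above; everything else is a sketch, and a real resampler will sound better.
import { mulaw } from 'alawmulaw'; // G.711 μ-law codec

// Copy 16-bit little-endian PCM out of a Buffer into an aligned Int16Array.
// (Node Buffer pooling can leave odd byteOffsets, so don't view buf.buffer directly.)
function toInt16(buf) {
  const out = new Int16Array(buf.length >> 1);
  for (let i = 0; i < out.length; i++) out[i] = buf.readInt16LE(i * 2);
  return out;
}

// μ-law bytes → 16-bit linear PCM. Assumes a little-endian host.
function mulawToPcm(buf) {
  const samples = mulaw.decode(buf); // Buffer is a Uint8Array; returns Int16Array
  return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
}

// 16-bit linear PCM → μ-law bytes.
function pcmToMulaw(buf) {
  return Buffer.from(mulaw.encode(toInt16(buf)));
}

// 8 kHz → 16 kHz: insert the average of each adjacent pair (linear interpolation).
function upsample8kTo16k(buf) {
  const src = toInt16(buf);
  const out = new Int16Array(src.length * 2);
  for (let i = 0; i < src.length; i++) {
    const next = i + 1 < src.length ? src[i + 1] : src[i];
    out[2 * i] = src[i];
    out[2 * i + 1] = (src[i] + next) >> 1;
  }
  return Buffer.from(out.buffer);
}

// 16 kHz → 8 kHz: average adjacent pairs (a crude low-pass), then decimate.
function downsample16kTo8k(buf) {
  const src = toInt16(buf);
  const out = new Int16Array(src.length >> 1);
  for (let i = 0; i < out.length; i++) out[i] = (src[2 * i] + src[2 * i + 1]) >> 1;
  return Buffer.from(out.buffer);
}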
ElevenLabs tool-use webhooks (for database lookups, appointment booking, etc.) are plain HTTPS endpoints configured on the agent — unrelated to the DialStack WebSocket. Configure them once when you create the agent via ElevenLabs' REST API.
The same shape works for OpenAI Realtime, Vapi, Retell, or a self-hosted model — swap the outbound WebSocket and the frame format.
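As a sketch of that swap, here is a hypothetical OpenAI Realtime variant of the connection handler (event names and the gpt-4o-realtime-preview model per OpenAI's Realtime API docs). Realtime accepts G.711 μ-law natively via session.update, so the transcoding step disappears and base64 payloads pass through untouched.
import WebSocket from 'ws';

const wss = new WebSocket.Server({ port: 8080, path: '/voice/stream' });

wss.on('connection', (dsSocket) => {
  const oaSocket = new WebSocket(
    'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview',
    { headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, 'OpenAI-Beta': 'realtime=v1' } }
  );

  // Ask for μ-law in and out, so DialStack frames pass through untouched.
  oaSocket.on('open', () => {
    oaSocket.send(JSON.stringify({
      type: 'session.update',
      session: { input_audio_format: 'g711_ulaw', output_audio_format: 'g711_ulaw' },
    }));
  });

  // Caller → model: forward base64 μ-law as-is.
  dsSocket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio' || oaSocket.readyState !== WebSocket.OPEN) return;
    oaSocket.send(JSON.stringify({ type: 'input_audio_buffer.append', audio: msg.payload }));
  });

  // Model → caller: audio deltas are already base64 μ-law.
  oaSocket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'response.audio.delta') return;
    dsSocket.send(JSON.stringify({ type: 'audio', payload: msg.delta }));
  });

  dsSocket.on('close', () => oaSocket.close());
  oaSocket.on('close', () => dsSocket.close());
});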
Routing
Drop a Voice App (Control) node into any dial plan:
{
  id: 'ai_reception',
  type: 'internal_dial',
  config: { target_id: 'va_01h2xcejqtf2nbrexx3vqjhp49' }, // your voice app ID
}
Pattern 2: BYO Observer (one-way)
The BYO Observer pattern depends on Voice App (Notify) mode and the Listeners API, both still in development. Pattern 1 (BYO Receptionist) is unaffected.
The AI listens to calls for real-time transcription, coaching, or analytics. It never talks to the caller and doesn't affect call routing.
Flow
Create the listener
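// In your Voice App's webhook handler: Notify mode delivers call.notify instead of call.received.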
if (event.event === 'call.notify') {
  await ds.calls.createListener(
    event.call.id,
    { url: 'wss://your-server.example.com/voice/listen' },
    { dialstackAccount: event.account_id }
  );
}
Both parties on the call are unaware that a listener is attached. Listeners stop automatically when the call ends.
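A receive-only consumer can stay minimal. The sketch below assumes the listener stream carries the same { type: 'audio', payload } frames as attach (the WebSocket API reference covers both); feedYourStt and flushYourStt are placeholders for whatever STT pipeline you run.
import { WebSocketServer } from 'ws';

// DialStack connects here once per listener.
const wss = new WebSocketServer({ port: 8081, path: '/voice/listen' });

wss.on('connection', (socket) => {
  socket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio') return;
    // μ-law 8 kHz frames; feedYourStt is a placeholder for your STT pipeline.
    feedYourStt(Buffer.from(msg.payload, 'base64'));
  });
  // DialStack closes the socket when the call ends; no explicit teardown call needed.
  socket.on('close', () => flushYourStt());
});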
Routing
Use a Voice App (Notify) node alongside the real routing:
{
  id: 'transcribe',
  type: 'internal_dial',
  config: { target_id: 'va_notify_id', mode: 'notify' },
}
Choosing between them
- Your AI should talk → Pattern 1 (Control mode + attach).
- Your AI should listen → Pattern 2 (Notify mode + Listeners API).
- Both → a dial plan can route through a Notify node first (transcription) and then to a Control node (AI agent), as sketched below. Or the Control-mode AI can itself open a sidecar listener.
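A sketch of that combined plan, reusing the node shapes from the two routing sections above. How nodes chain into a sequence is defined by the dial plan schema (see Dial Plans), so the bare array here is illustrative only:
[
  // Sidecar transcription: the Notify node never affects routing.
  { id: 'transcribe', type: 'internal_dial', config: { target_id: 'va_notify_id', mode: 'notify' } },
  // The call then reaches the Control-mode AI receptionist.
  { id: 'ai_reception', type: 'internal_dial', config: { target_id: 'va_01h2xcejqtf2nbrexx3vqjhp49' } },
]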
Compared to DialStack-managed AI Agents
DialStack ships a first-party AI receptionist via AI Scheduling / the AI Agents API. Reach for BYO when:
- You already run an AI stack your team knows.
- You need a specific model, voice, or data-control guarantee.
- You want full control over the prompt, tools, and transcripts.
See also
- Voice Apps — full Voice App documentation including actions (attach, transfer, combined/sequenced).
- Dial Plans — how Voice App nodes slot into call routing.
- WebSocket API — media-streaming protocol for both attach and Listeners.
- AI Scheduling — the native alternative, when you don't need to bring your own.