BYO VoiceAI

DialStack is agent-agnostic. Bring your own AI stack and plug it into the call. Two integration patterns cover every BYO scenario.

Two patterns

| Pattern | What your AI does | Primitive |
| --- | --- | --- |
| BYO Receptionist | Answers the call, talks with the caller, decides what happens next | Voice App (Control) mode + attach action |
| BYO Observer | Listens to the call for transcription / coaching / analytics, doesn't interrupt it | Voice App (Notify) mode + Listeners API |

Full Voice App semantics: Voice Apps.

Pattern 1: BYO Receptionist (bidirectional)

The AI answers the call and talks with the caller. You run the prompt, the tools, and the transcription; DialStack handles telephony.

Flow

Create the Voice App

curl -X POST https://api.dialstack.ai/v1/voice-apps \
  -H 'Authorization: Bearer sk_live_YOUR_KEY' \
  -H 'DialStack-Account: acct_01h2xcejqtf2nbrexx3vqjhp41' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "BYO Receptionist",
    "url": "https://your-server.example.com/voice/webhook"
  }'

Handle the webhook and attach audio

// POST /voice/webhook
app.post('/voice/webhook', express.raw({ type: 'application/json' }), async (req, res) => {
  verifySignature(req); // see voice-apps.md
  const event = JSON.parse(req.body);

  if (event.event === 'call.received') {
    // Open the WebSocket for this call
    await ds.calls.update(
      event.call.id,
      {
        actions: [{ type: 'attach', url: 'wss://your-server.example.com/voice/stream' }],
      },
      { dialstackAccount: event.account_id }
    );
  }

  res.status(200).end();
});
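The actual signing scheme lives in voice-apps.md; as a rough sketch, assuming DialStack signs the raw request body with HMAC-SHA256 under a per-app secret and sends the hex digest in a DialStack-Signature header (both the header name and encoding are assumptions here, not confirmed API details):

```javascript
import crypto from 'node:crypto';

// Hypothetical sketch — header name, secret source, and digest encoding are
// assumptions; see voice-apps.md for the real scheme.
function signatureIsValid(rawBody, receivedSig, secret) {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(receivedSig);
  const b = Buffer.from(expected);
  // Constant-time compare; timingSafeEqual throws on length mismatch, so guard first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

function verifySignature(req) {
  const sig = req.get('DialStack-Signature') ?? '';
  if (!signatureIsValid(req.body, sig, process.env.DIALSTACK_WEBHOOK_SECRET)) {
    throw new Error('invalid webhook signature');
  }
}
```

Note that this only works because the route uses `express.raw()` — a JSON body parser would hand you a re-serialized object whose bytes no longer match what was signed.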

The audio WebSocket

Your service accepts a WebSocket at the URL you passed to attach. DialStack streams caller audio as μ-law 8 kHz frames (~20 ms each, base64-encoded JSON messages). You send agent audio back in the same format on the same socket. Full protocol — framing, control messages, error handling: WebSocket API.
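For orientation: one 20 ms frame at 8 kHz is 160 samples, and μ-law is 8 bits per sample, so each frame carries 160 bytes of audio. A sketch of building one outbound silence frame, using the type/payload field shape that appears in the bridge example below (the WebSocket API doc is authoritative for the wire format):

```javascript
// One 20 ms frame of μ-law audio at 8 kHz = 160 samples = 160 bytes.
const SAMPLES_PER_FRAME = (8000 * 20) / 1000; // 160

// 0xff is (approximately) zero amplitude in μ-law, i.e. silence.
const silence = Buffer.alloc(SAMPLES_PER_FRAME, 0xff);

const message = JSON.stringify({
  type: 'audio',
  payload: silence.toString('base64'),
});
// Sending one such message every 20 ms keeps the caller's side fed with silence.
```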

Worked example: ElevenLabs Conversational AI

A bridge server sits between the two WebSockets and handles audio transcoding. Sketch:

import WebSocket, { WebSocketServer } from 'ws';

// DialStack opens this socket when your webhook responds with `attach`.
const wss = new WebSocketServer({ port: 8080, path: '/voice/stream' });

wss.on('connection', async (dsSocket) => {
  // 1. Open the ElevenLabs Conversational AI WebSocket.
  const elSocket = new WebSocket(
    `wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${process.env.ELEVENLABS_AGENT_ID}`,
    { headers: { 'xi-api-key': process.env.XI_API_KEY } }
  );

  // 2. Caller audio → ElevenLabs: μ-law 8 kHz → PCM 16 kHz.
  dsSocket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio') return;
    const mulaw = Buffer.from(msg.payload, 'base64');
    const pcm16 = upsample8kTo16k(mulawToPcm(mulaw));
    elSocket.send(JSON.stringify({ user_audio_chunk: pcm16.toString('base64') }));
  });

  // 3. ElevenLabs audio → caller: PCM 16 kHz → μ-law 8 kHz.
  elSocket.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type !== 'audio') return;
    const pcm16 = Buffer.from(msg.audio_event.audio_base_64, 'base64');
    const mulaw = pcmToMulaw(downsample16kTo8k(pcm16));
    dsSocket.send(JSON.stringify({ type: 'audio', payload: mulaw.toString('base64') }));
  });

  dsSocket.on('close', () => elSocket.close());
  elSocket.on('close', () => dsSocket.close());
});

Transcoding helpers (mulawToPcm, pcmToMulaw, upsample8kTo16k, downsample16kTo8k) are short — under 50 lines of DSP. Any Node audio library (e.g., alawmulaw, node-libsamplerate) will do.
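A minimal pure-JS sketch of those four helpers, for reference. G.711 μ-law encode/decode is standard; the resampling here (linear-interpolation upsampling, pair-averaging downsampling) is crude compared to a real resampler but workable for telephony audio:

```javascript
// G.711 μ-law ↔ 16-bit linear PCM, plus naive 8k↔16k resampling.
// Rough sketch — a library like `alawmulaw` is the production route.
const BIAS = 0x84;   // G.711 encoding bias
const CLIP = 32635;  // max encodable magnitude

function mulawByteToPcm(u) {
  u = ~u & 0xff;
  const t = (((u & 0x0f) << 3) + BIAS) << ((u & 0x70) >> 4);
  return u & 0x80 ? BIAS - t : t - BIAS;
}

function pcmSampleToMulaw(s) {
  let sign = 0;
  if (s < 0) { sign = 0x80; s = -s; }
  if (s > CLIP) s = CLIP;
  s += BIAS;
  // Find the segment: index of the highest set bit among bits 7..14.
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff; // byte is stored complemented
}

function mulawToPcm(mulaw) { // Buffer of μ-law bytes → Buffer of LE int16
  const pcm = Buffer.alloc(mulaw.length * 2);
  for (let i = 0; i < mulaw.length; i++) pcm.writeInt16LE(mulawByteToPcm(mulaw[i]), i * 2);
  return pcm;
}

function pcmToMulaw(pcm) { // Buffer of LE int16 → Buffer of μ-law bytes
  const out = Buffer.alloc(pcm.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = pcmSampleToMulaw(pcm.readInt16LE(i * 2));
  return out;
}

function upsample8kTo16k(pcm) { // insert a linearly interpolated sample between each pair
  const n = pcm.length / 2;
  const out = Buffer.alloc(n * 4);
  for (let i = 0; i < n; i++) {
    const a = pcm.readInt16LE(i * 2);
    const b = i + 1 < n ? pcm.readInt16LE((i + 1) * 2) : a;
    out.writeInt16LE(a, i * 4);
    out.writeInt16LE((a + b) >> 1, i * 4 + 2);
  }
  return out;
}

function downsample16kTo8k(pcm) { // average adjacent sample pairs
  const n = pcm.length / 4;
  const out = Buffer.alloc(n * 2);
  for (let i = 0; i < n; i++) {
    const avg = (pcm.readInt16LE(i * 4) + pcm.readInt16LE(i * 4 + 2)) >> 1;
    out.writeInt16LE(avg, i * 2);
  }
  return out;
}
```

The pair-averaging downsampler doubles as a basic anti-aliasing filter; for production quality, swap in a proper polyphase resampler.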

ElevenLabs tool-use webhooks (for database lookups, appointment booking, etc.) are plain HTTPS endpoints configured on the agent — unrelated to the DialStack WebSocket. Configure them once when you create the agent via ElevenLabs' REST API.

The same shape works for OpenAI Realtime, Vapi, Retell, or a self-hosted model — swap the outbound WebSocket and the frame format.

Routing

Drop a Voice App (Control) node into any dial plan:

{
  id: 'ai_reception',
  type: 'internal_dial',
  config: { target_id: 'va_01h2xcejqtf2nbrexx3vqjhp49' }, // your voice app ID
}

Pattern 2: BYO Observer (one-way)

Coming soon

The BYO Observer pattern depends on Voice App (Notify) mode and the Listeners API, both of which are still in development. Pattern 1 (BYO Receptionist) is unaffected.

The AI listens to calls for real-time transcription, coaching, or analytics. It never talks to the caller and doesn't affect call routing.

Flow

Create the listener

if (event.event === 'call.notify') {
  await ds.calls.createListener(
    event.call.id,
    { url: 'wss://your-server.example.com/voice/listen' },
    { dialstackAccount: event.account_id }
  );
}

Both parties on the call are unaware that a listener is attached. Listeners stop automatically when the call ends.

Routing

Use a Voice App (Notify) node alongside the real routing:

{
  id: 'transcribe',
  type: 'internal_dial',
  config: { target_id: 'va_notify_id', mode: 'notify' },
}

Choosing between them

  • Your AI should talk → Pattern 1 (Control mode + attach).
  • Your AI should listen → Pattern 2 (Notify mode + Listeners API).
  • Both → a dial plan can route through a Notify node first (transcription) and then to a Control node (AI agent). Or the Control-mode AI can itself open a sidecar listener.
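The "both" case might look like the fragment below. This is a hypothetical sketch: the `next` field is used here only to illustrate chaining, and the real node-linking schema is defined in the Dial Plans doc.

```javascript
// Hypothetical sketch — node chaining via `next` is an assumption;
// see Dial Plans for the actual schema.
[
  {
    id: 'transcribe',
    type: 'internal_dial',
    config: { target_id: 'va_notify_id', mode: 'notify' }, // Observer: attaches a listener
    next: 'ai_reception',
  },
  {
    id: 'ai_reception',
    type: 'internal_dial',
    config: { target_id: 'va_01h2xcejqtf2nbrexx3vqjhp49' }, // Receptionist: answers the call
  },
]
```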

Compared to DialStack-managed AI Agents

DialStack ships a first-party AI receptionist via AI Scheduling / the AI Agents API. Reach for BYO when:

  • You already run an AI stack your team knows.
  • You need a specific model, voice, or data-control guarantee.
  • You want full control over the prompt, tools, and transcripts.

See also

  • Voice Apps — full Voice App documentation including actions (attach, transfer, combined/sequenced).
  • Dial Plans — how Voice App nodes slot into call routing.
  • WebSocket API — media-streaming protocol for both attach and Listeners.
  • AI Scheduling — the native alternative, when you don't need to bring your own.