# Patter SDK Developer Reference
> Patter is an open-source Python and TypeScript SDK that connects AI agents to real phone calls via Twilio or Telnyx, handling telephony signaling, STT, TTS, and real-time audio streaming through a unified 4-line API. Both language SDKs expose an identical surface — the same carriers, engines, providers, and hooks — so cross-runtime teams can ship voice agents without rewriting business logic.
## Context Links
- [Agent index](https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/llms.txt)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc)
- [GitHub repository](https://github.com/PatterAI/Patter)
## Repository Metadata
- Repository: PatterAI/Patter
- Generated: 2026-05-27T19:17:53.037Z
- Updated: 2026-05-27T22:41:12.507Z
- Runtime: Pi · Claude Code · claude-sonnet-4-6:high
- Format: Technical
- Pages: 6
## Page Index
- 01. [Technical Orientation](https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/01-technical-orientation.md) - What Patter is, its end-to-end call data flow, the Patter / agent / serve three-object model, SDK entry points in both Python and TypeScript, and how the rest of this reference is organized.
- 02. [Carrier & Telephony Layer](https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/02-carrier-telephony-layer.md) - How Twilio and Telnyx carrier adapters are structured, the WebSocket and webhook call-control handshakes, inbound vs. outbound call flows, DTMF, AMD, call transfer, voicemail drop, and recording parity between carriers.
- 03. [Voice Engines — Realtime, ConvAI & Pipeline Mode](https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/03-voice-engines-realtime-convai-pipeline-mode.md) - The three voice architectures: speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI) vs. the STT→LLM→TTS pipeline, how engine adapters implement the shared engine interface, barge-in and VAD semantics, and latency trade-offs for each mode.
- 04. [STT & TTS Provider Catalog](https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/04-stt-tts-provider-catalog.md) - All supported speech-to-text (Deepgram, Whisper, OpenAI Transcribe, AssemblyAI, Cartesia, Soniox, Speechmatics) and text-to-speech (ElevenLabs, Cartesia, OpenAI, LMNT, Rime, Inworld) adapters — their configuration, streaming contracts, known limitations, and how to swap providers in pipeline mode.
- 05. [LLM Providers, Tool Calling & Guardrails](https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/05-llm-providers-tool-calling-guardrails.md) - The LLM backend adapters (OpenAI, Anthropic, Groq, Cerebras, Google Gemini), the tool decorator and Tool dataclass for in-process and webhook-dispatch tools, output guardrail rules, dynamic variable substitution, and the PatterTool integration for LangChain / OpenAI Assistants orchestrators.
- 06. [Dashboard, Observability, Tunneling & Deployment](https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/06-dashboard-observability-tunneling-deployment.md) - The built-in real-time monitoring dashboard, vendor-neutral OpenTelemetry tracing, call metrics and cost tracking, Cloudflare quick-tunnel vs. static webhook URL trade-offs, test mode (no phone required), Docker Compose setup, and the agent skills bundle for AI coding assistants — the complete operational surface for running Patter in development and production.
## Source File Index
- `dashboard-app/src/App.tsx`
- `docker-compose.yml`
- `Dockerfile`
- `libraries/python/getpatter/_public_api.py`
- `libraries/python/getpatter/carriers/telnyx.py`
- `libraries/python/getpatter/carriers/twilio.py`
- `libraries/python/getpatter/client.py`
- `libraries/python/getpatter/engines/elevenlabs.py`
- `libraries/python/getpatter/engines/openai_realtime_2.py`
- `libraries/python/getpatter/engines/openai.py`
- `libraries/python/getpatter/integrations/patter_tool.py`
- `libraries/python/getpatter/llm/anthropic.py`
- `libraries/python/getpatter/llm/groq.py`
- `libraries/python/getpatter/llm/openai.py`
- `libraries/python/getpatter/models.py`
- `libraries/python/getpatter/observability/`
- `libraries/python/getpatter/providers/base.py`
- `libraries/python/getpatter/server.py`
- `libraries/python/getpatter/stream_handler.py`
- `libraries/python/getpatter/stt/deepgram.py`
- `libraries/python/getpatter/stt/openai_transcribe.py`
- `libraries/python/getpatter/stt/whisper.py`
- `libraries/python/getpatter/telephony/common.py`
- `libraries/python/getpatter/telephony/telnyx.py`
- `libraries/python/getpatter/telephony/twilio.py`
- `libraries/python/getpatter/test_mode.py`
- `libraries/python/getpatter/tools/`
- `libraries/python/getpatter/tts/cartesia.py`
- `libraries/python/getpatter/tts/elevenlabs.py`
- `libraries/python/getpatter/tts/openai.py`
- `libraries/python/getpatter/tunnel.py`
- `libraries/python/getpatter/tunnels/`
- `libraries/typescript/src/client.ts`
- `libraries/typescript/src/engines/`
- `libraries/typescript/src/llm-loop.ts`
- `libraries/typescript/src/llm/`
- `libraries/typescript/src/pipeline-hooks.ts`
- `libraries/typescript/src/public-api.ts`
- `libraries/typescript/src/server.ts`
- `libraries/typescript/src/stream-handler.ts`
- `libraries/typescript/src/types.ts`
- `README.md`
---
## 01. Technical Orientation
> What Patter is, its end-to-end call data flow, the Patter / agent / serve three-object model, SDK entry points in both Python and TypeScript, and how the rest of this reference is organized.
- Page Markdown: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/01-technical-orientation.md
- Generated: 2026-05-27T19:13:00.949Z
### Source Files
- `README.md`
- `libraries/python/getpatter/client.py`
- `libraries/python/getpatter/_public_api.py`
- `libraries/python/getpatter/models.py`
- `libraries/typescript/src/client.ts`
- `libraries/typescript/src/public-api.ts`
- `libraries/typescript/src/types.ts`
# Technical Orientation
This page explains what Patter is, how audio flows through the system from a telephone call to an AI response and back, how the three top-level SDK objects relate to each other, and where to find each public entry point in both the Python and TypeScript SDKs. It is the starting point for the rest of this reference.
Patter is an open-source SDK (published as `getpatter` on both PyPI and npm) that connects AI agents to real phone numbers. You supply a telephony carrier (Twilio or Telnyx) and an AI engine or pipeline; Patter handles WebSocket media streaming, speech-to-text, text-to-speech, barge-in detection, tool dispatch, call recording, AMD (answering-machine detection), and an observability dashboard. The advertised 4-line quickstart maps directly onto the API: construct the client, build an agent, and call `serve()`. Both the Python and TypeScript SDKs expose the same semantics, equivalent field names (snake_case in Python, camelCase in TypeScript), and the same lifecycle hooks, so a cross-runtime team can run the same agent in either language without rewriting business logic.
---
## The Patter / Agent / Serve Three-Object Model
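Every Patter program follows the same three steps, in order: construct the `Patter` client, build an `Agent` from it, and start the embedded server with `serve()`.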
```text
┌─────────────────────────────────────────────┐
│ Patter(carrier=..., phone_number=...) │ ← 1. client: carrier + tunnel
│ .agent(engine=..., system_prompt=...) │ ← 2. agent: AI config (frozen)
│ .serve(agent, tunnel=True, ...) │ ← 3. serve: embedded HTTP/WS server
└─────────────────────────────────────────────┘
```
### 1. `Patter` — the client
`Patter` is the SDK's root object. It owns the carrier credentials, the phone number, tunnel wiring, prewarm caches, and the `asyncio.Future` pair (`tunnel_ready`, `ready`) that signal when the embedded server is safe for outbound calls.
```python
# libraries/python/getpatter/client.py
phone = Patter(
    carrier=Twilio(account_sid="AC...", auth_token="..."),
    phone_number="+15550001234",
    tunnel=True,  # CloudflareTunnel(), Static(hostname=...), or bool
)
```
`Patter.__init__` normalises the carrier via `_unpack_carrier` (resolves `Twilio` or `Telnyx` instances into a `LocalConfig`), then resolves the tunnel directive into a webhook hostname or defers it to `serve()`. Cloud/API-key mode is explicitly rejected with `NotImplementedError` — this release is local mode only.
Sources: [libraries/python/getpatter/client.py:152-264]()
### 2. `Agent` — the frozen AI configuration
`agent()` is a factory method on `Patter` that returns a frozen `Agent` dataclass. `Agent` carries every parameter the server dispatches at call time: `system_prompt`, `voice`, `model`, `language`, `tools`, `guardrails`, `hooks`, `vad`, `stt`, `tts`, `llm`, `variables`, and latency-tuning knobs like `barge_in_threshold_ms` and `aggressive_first_flush`.
```python
# Realtime mode (speech-to-speech via OpenAI)
agent = phone.agent(
    engine=OpenAIRealtime(),
    system_prompt="You are a friendly receptionist.",
    first_message="Hello! How can I help?",
    tools=[transfer_tool],
    guardrails=[profanity_rail],
)
```
The `provider` field is a closed literal union: `"openai_realtime" | "elevenlabs_convai" | "pipeline"`. Passing an `engine=` object sets `provider` implicitly, and passing `stt=` + `tts=` implies `pipeline` mode. Supplying both `engine=` and `stt=`/`tts=` is rejected as a conflict.
Sources: [libraries/python/getpatter/models.py:108-210]()
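The selection rules above can be sketched as a small resolver. This is an illustrative reconstruction, not the SDK's code: `resolve_provider`, its parameters, and the choice of `ValueError` for the conflict are all assumptions made for the example.

```python
from typing import Literal, Optional

ProviderMode = Literal["openai_realtime", "elevenlabs_convai", "pipeline"]


def resolve_provider(
    engine_kind: Optional[str] = None,
    has_stt: bool = False,
    has_tts: bool = False,
) -> ProviderMode:
    """Hypothetical helper that mirrors the documented selection rules."""
    wants_pipeline = has_stt or has_tts
    if engine_kind is not None and wants_pipeline:
        # The exception type is an assumption; the source only says "conflict".
        raise ValueError("pass either engine= or stt=/tts=, not both")
    if wants_pipeline:
        return "pipeline"
    if engine_kind == "elevenlabs_convai":
        return "elevenlabs_convai"
    # "openai_realtime" is the documented default for Agent.provider.
    return "openai_realtime"
```

Resolving the mode once, when the agent is built, keeps call-time dispatch to a single comparison on `provider`.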
### 3. `serve()` / `call()` — the embedded server
`serve()` starts a FastAPI/uvicorn server that registers WebSocket and webhook routes for inbound calls, applies tunnel auto-configuration, and blocks until the process exits. `call()` places an outbound call via the carrier REST API and coordinates prewarm tasks in parallel with the ring window.
```typescript
// TypeScript — identical surface
const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });
const agent = phone.agent({ engine: new OpenAIRealtime(), systemPrompt: "..." });
await phone.serve({ agent, tunnel: true, dashboard: true });
```
`serve()` accepts `onCallStart`, `onCallEnd`, `onTranscript`, `onMetrics`, `recording`, `voicemailMessage`, `dashboard`, and `manageWebhook` options. `call()` accepts `to`, `agent`, `machineDetection`, `onMachineDetection`, `voicemailMessage`, and `ringTimeout`.
Sources: [libraries/typescript/src/types.ts:283-374]()
---
## End-to-End Call Data Flow
The sequence below applies to both inbound (carrier calls the webhook) and outbound (SDK calls the carrier REST API first):
```mermaid
sequenceDiagram
participant Carrier as Carrier (Twilio / Telnyx)
participant Server as EmbeddedServer / StreamHandler
participant Provider as AI Provider
participant CB as User Callbacks
Carrier->>Server: HTTP POST /webhooks/twilio/voice or /webhooks/telnyx/voice (call webhook)
Server->>Carrier: TwiML / Call Control response carrying the stream URL
Carrier-->>Server: WS frames: {event: "start", callSid, ...}
Server->>Provider: Adopt parked WS or open new connection (STT/TTS/Realtime)
Server->>CB: onCallStart(data)
loop Each turn
Carrier-->>Server: WS media frame (base64 mulaw 8kHz audio)
Server->>Provider: PCM audio → STT transcript (pipeline) OR raw audio (Realtime)
Provider-->>Server: LLM response text / speech tokens
Server->>Provider: TTS synthesis → PCM → mulaw
Server-->>Carrier: WS media frame (base64 mulaw 8kHz audio back)
Server->>CB: onTranscript(data), onMetrics(data)
end
Carrier-->>Server: WS {event: "stop"}
Server->>CB: onCallEnd({transcript, metrics})
```
Key implementation details:
- **Mulaw ↔ PCM transcoding** happens inside `StreamHandler`. Twilio delivers G.711 µ-law 8 kHz; Telnyx delivers PCM 16 kHz. The handler resamples before passing to STT adapters.
- **Prewarm** runs during the ringing window: `_park_provider_connections` opens STT/TTS/Realtime WebSockets before the callee answers, and `_spawn_prewarm_first_message` pre-renders the greeting to TTS bytes. Both are capped (`_PREWARM_CACHE_MAX = 200`, TTL 30 s) to bound memory and TTS cost on unanswered calls.
- **AMD (answering-machine detection)** is on by default. Twilio uses `MachineDetection=DetectMessageEnd` + `async_amd` (zero answer-latency for humans); Telnyx uses `answering_machine_detection=greeting_end`. The result fires `on_machine_detection(MachineDetectionResult)`.
- **Barge-in** is controlled by `barge_in_threshold_ms` (default 300 ms). Optional `barge_in_strategies` can defer cancellation until a per-strategy confirmation arrives within `barge_in_confirm_ms`.
Sources: [libraries/python/getpatter/client.py:500-680]()
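To make the µ-law to PCM step concrete, here is a minimal pure-Python sketch of standard G.711 µ-law decoding. It is not Patter's transcoder (the function names are hypothetical, and resampling is left out); it only shows what turning an 8-bit µ-law frame into 16-bit linear PCM involves.

```python
import struct


def ulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80                    # top bit carries the sign
    exponent = (u >> 4) & 0x07         # 3-bit segment number
    mantissa = u & 0x0F                # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute all 256 values so per-frame decoding is a table lookup.
_DECODE_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def ulaw_to_pcm16(frame: bytes) -> bytes:
    """Decode a mu-law frame into little-endian 16-bit PCM bytes."""
    samples = [_DECODE_TABLE[b] for b in frame]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A 20 ms frame at 8 kHz is 160 µ-law bytes and decodes to 320 bytes of PCM, so the decoded stream is exactly twice the size of the wire payload.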
---
## Provider Modes
Patter dispatches at call time to one of three provider modes, selected by the `provider` field on `Agent` / `AgentOptions`.
| Mode | `provider` value | Audio path | Typical latency |
|---|---|---|---|
| OpenAI Realtime | `"openai_realtime"` | Speech → OpenAI Realtime API (bidirectional WS, native audio) | Lowest |
| ElevenLabs ConvAI | `"elevenlabs_convai"` | Speech → ElevenLabs Conversational AI (managed) | Low |
| Pipeline | `"pipeline"` | STT → LLM → TTS (sequential, BYOC) | Configurable |
Pipeline mode is the only mode that uses `STTConfig`, `TTSConfig`, `PipelineHooks`, `vad`, `audio_filter`, `background_audio`, `llm`, `text_transforms`, and `prewarm_first_message`. Realtime and ConvAI modes route audio directly to the provider's own WebSocket; in those modes `PipelineHooks` and `prewarm_first_message` have no effect and are either silently ignored or flagged with a warning.
```python
# Pipeline mode example
from getpatter import Patter, Twilio, DeepgramSTT, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+1...")
agent = phone.agent(
    system_prompt="...",
    stt=DeepgramSTT(api_key="..."),
    tts=ElevenLabsTTS(api_key="...", voice="rachel"),
)
```
Sources: [libraries/python/getpatter/models.py:22-29](), [libraries/python/getpatter/models.py:108-133]()
---
## SDK Entry Points
Both SDKs are published as `getpatter` and expose an identical high-level surface.
### Python SDK
**Installation**: `pip install getpatter`
The top-level `__init__.py` re-exports symbols from the following source modules:
| Symbol | Source module | Purpose |
|---|---|---|
| `Patter` | `getpatter.client` | Root SDK client |
| `Twilio`, `Telnyx` | `getpatter.carriers.*` | Carrier credentials |
| `OpenAIRealtime`, `ElevenLabsConvAI` | `getpatter.engines.*` | Engine markers |
| `Agent`, `CallMetrics`, `PipelineHooks`, `STTConfig`, `TTSConfig`, `CallControl`, `MachineDetectionResult` | `getpatter.models` | Runtime types (all frozen dataclasses) |
| `Tool`, `Guardrail`, `tool()`, `guardrail()` | `getpatter._public_api` | Declarative tool + guardrail factories |
| `DeepgramSTT`, `ElevenLabsTTS`, `CartesiaTTS`, … | `getpatter.stt.*`, `getpatter.tts.*` | Pipeline-mode adapters |
**`Tool`** is a frozen dataclass requiring exactly one of `handler` (callable) or `webhook_url` (string). The `tool()` factory accepts decorator form (`@tool`) or keyword-constructor form:
```python
# libraries/python/getpatter/_public_api.py
from getpatter import tool

# `crm` and `my_handler` stand in for application-defined objects.
@tool
async def lookup_account(phone: str) -> str:
    """Look up the account for a caller."""
    return await crm.find(phone)

# or explicit form:
t = tool(name="transfer_call", description="Transfer the call.", handler=my_handler)
```
**`Guardrail`** checks LLM output before TTS. A match replaces the response with `replacement` (default: `"I'm sorry, I can't respond to that."`):
```python
from getpatter import guardrail
rail = guardrail("profanity", blocked_terms=["badword"], replacement="Let me rephrase that.")
```
Sources: [libraries/python/getpatter/_public_api.py:31-90]()
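As a rough illustration of the output check described above, the sketch below applies a list of guardrails to LLM text before it reaches TTS. `SketchGuardrail` and `apply_guardrails` are hypothetical names, and the matching rule (case-insensitive substring match, with `check` returning `True` to block) is an assumption rather than the SDK's documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

DEFAULT_REPLACEMENT = "I'm sorry, I can't respond to that."


@dataclass(frozen=True)
class SketchGuardrail:
    """Stand-in for a guardrail: blocked terms, optional check, replacement."""
    name: str
    blocked_terms: Sequence[str] = ()
    check: Optional[Callable[[str], bool]] = None  # True means "block" (assumed)
    replacement: str = DEFAULT_REPLACEMENT


def apply_guardrails(text: str, rails: Sequence[SketchGuardrail]) -> str:
    """Return the first matching rail's replacement, or the text unchanged."""
    lowered = text.lower()
    for rail in rails:
        hit = any(term.lower() in lowered for term in rail.blocked_terms)
        if not hit and rail.check is not None:
            hit = rail.check(text)
        if hit:
            return rail.replacement
    return text
```

Running the check on the complete LLM response, before synthesis, means a blocked response never produces audio that would need to be cancelled.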
**Key `Agent` fields** (all optional except `system_prompt`):
| Field | Type | Default | Notes |
|---|---|---|---|
| `system_prompt` | `str` | required | Supports `{variable}` placeholders |
| `provider` | `ProviderMode` | `"openai_realtime"` | Set implicitly by `engine=` |
| `voice` | `str` | `"alloy"` | Provider-specific voice name/ID |
| `model` | `str` | `"gpt-4o-mini-realtime-preview"` | LLM / Realtime model ID |
| `tools` | `list[dict] \| None` | `None` | Accepts `Tool` instances |
| `guardrails` | `list[Guardrail] \| None` | `None` | Output filters |
| `hooks` | `PipelineHooks \| None` | `None` | Pipeline-mode stage hooks |
| `prewarm` | `bool` | `True` | Warm provider connections during ring |
| `prewarm_first_message` | `bool` | `False` (raw); `True` via factory in pipeline | Pre-render greeting TTS |
| `barge_in_threshold_ms` | `int` | `300` | ms of speech before interrupt |
| `disable_phone_preamble` | `bool` | `False` | Skip phone-friendly system prefix |
Sources: [libraries/python/getpatter/models.py:108-265]()
### TypeScript SDK
**Installation**: `npm install getpatter`
The TypeScript SDK mirrors the Python surface. Types live in `types.ts`; the runtime class and factory functions live in `client.ts` and `public-api.ts`.
| Symbol | Source file | Purpose |
|---|---|---|
| `Patter` (class) | `src/client.ts` | Root SDK client |
| `Twilio`, `Telnyx` | `src/telephony/twilio.ts`, `src/telephony/telnyx.ts` | Carrier credentials |
| `OpenAIRealtime`, `ElevenLabsConvAI` | `src/engines/openai.ts`, `src/engines/elevenlabs.ts` | Engine markers |
| `Tool` (class), `Guardrail` (class), `tool()`, `guardrail()` | `src/public-api.ts` | Tool + guardrail factories |
| `AgentOptions`, `ServeOptions`, `LocalCallOptions`, `PipelineHooks`, `MachineDetectionResult` | `src/types.ts` | TypeScript interfaces |
The TypeScript `Tool` class validates in its constructor that exactly one of `handler` or `webhookUrl` is provided, matching the Python `Tool.__post_init__` invariant. `Guardrail` exposes `blockedTerms`, `check`, and `replacement`. Both classes implement the internal `ToolDefinition` / `Guardrail` interface contracts so they drop in as plain objects anywhere the SDK accepts them.
```typescript
// libraries/typescript/src/public-api.ts
import { Tool, Guardrail, tool, guardrail } from "getpatter";
const t = new Tool({
  name: "lookup_account",
  description: "Look up a CRM account.",
  handler: async (args) => JSON.stringify(await crm.find(args.phone)),
});
const rail = new Guardrail({ name: "profanity", blockedTerms: ["badword"] });
```
Sources: [libraries/typescript/src/public-api.ts:1-126](), [libraries/typescript/src/types.ts:175-280]()
**`ServeOptions` key fields** (TypeScript):
| Field | Type | Notes |
|---|---|---|
| `agent` | `AgentOptions` | Required |
| `port` | `number` | Default 8000 |
| `tunnel` | `boolean` | Auto-start Cloudflare tunnel |
| `dashboard` | `boolean` | Serve built-in UI at `/dashboard` |
| `recording` | `boolean` | Enable carrier-side recording |
| `onCallStart / onCallEnd / onTranscript / onMetrics` | `CallEventHandler` | Lifecycle callbacks |
| `onMessage` | `PipelineMessageHandler \| string` | Pipeline custom LLM or webhook URL |
| `manageWebhook` | `boolean` | Auto-configure carrier webhook on startup |
---
## Observability and Metrics
`CallMetrics` (Python dataclass / TS interface) is delivered to the `onCallEnd` callback and stored in the dashboard. It includes `LatencyBreakdown` per turn (with `stt_ms`, `llm_ms`, `tts_ms`, `agent_response_ms`, `endpoint_ms`, `bargein_ms`, `llm_ttft_ms`) and a `CostBreakdown` (STT, TTS, LLM, telephony in USD). Percentile summaries (`latency_p50`, `latency_p90`, `latency_p95`, `latency_p99`) are computed across all turns of the call.
`agent_response_ms` is the user-perceived latency metric: `endpoint_ms + llm_ttft_ms + tts_ms`. It excludes how long the caller spoke, isolating only system-controlled latency — the number to watch on p95 SLO dashboards.
Tracing uses vendor-neutral OpenTelemetry. No external collector is required for the built-in dashboard.
Sources: [libraries/python/getpatter/models.py:295-385]()
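The latency arithmetic above is easy to reproduce offline. The sketch below recomputes `agent_response_ms` from its three components and derives per-call percentiles. `TurnLatency` and `percentile` are hypothetical helpers, and the nearest-rank percentile method is an assumption; the SDK may interpolate differently.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TurnLatency:
    """The three components that make up user-perceived latency for one turn."""
    endpoint_ms: float
    llm_ttft_ms: float
    tts_ms: float

    @property
    def agent_response_ms(self) -> float:
        return self.endpoint_ms + self.llm_ttft_ms + self.tts_ms


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile over a non-empty sequence."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


turns = [
    TurnLatency(endpoint_ms=120, llm_ttft_ms=250, tts_ms=90),
    TurnLatency(endpoint_ms=100, llm_ttft_ms=300, tts_ms=110),
    TurnLatency(endpoint_ms=150, llm_ttft_ms=400, tts_ms=130),
]
responses = [t.agent_response_ms for t in turns]
latency_p50 = percentile(responses, 50)
latency_p95 = percentile(responses, 95)
```

With only a handful of turns per call, the high percentiles collapse onto the slowest turn, so p95 across many calls is the more useful SLO signal.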
---
## Reference Organization
The rest of this wiki is organized by functional area:
| Area | What it covers |
|---|---|
| **Carriers** | Twilio vs. Telnyx configuration, AMD, DTMF, transfer, recording parity |
| **Provider Modes** | OpenAI Realtime, ElevenLabs ConvAI, Pipeline (STT + LLM + TTS selection) |
| **Tools & Guardrails** | `Tool` + `tool()`, `Guardrail` + `guardrail()`, webhook vs. handler dispatch, MCP servers |
| **Pipeline Hooks** | `PipelineHooks` stage contract (`before_send_to_stt` → `after_transcribe` → `before_llm` → `after_llm` → `before_synthesize` → `after_synthesize`) |
| **Tunneling** | Cloudflare Quick Tunnel, ngrok, `Static(hostname=...)`, production webhook patterns |
| **Latency & Prewarm** | `prewarm`, `prewarm_first_message`, parked provider WebSockets, `aggressive_first_flush` |
| **Observability** | `CallMetrics`, `LatencyBreakdown`, `CostBreakdown`, dashboard, OpenTelemetry |
| **Outbound Calls** | `call()`, AMD, voicemail drop, `ring_timeout`, `MachineDetectionResult` |
| **Speech Events** | `on_user_speech_started/ended`, `on_agent_speech_started/ended`, `on_llm_token`, `on_audio_out` |
| **Testing** | `phone.test(agent)` — local playback without a carrier |
---
Patter's design keeps every AI provider, telephony carrier, STT, TTS, and LLM component pluggable. No Patter-hosted backend is required in this release — all media and model calls flow directly from your infrastructure to the carriers and provider APIs you configure.
---
## 02. Carrier & Telephony Layer
> How Twilio and Telnyx carrier adapters are structured, the WebSocket and webhook call-control handshakes, inbound vs. outbound call flows, DTMF, AMD, call transfer, voicemail drop, and recording parity between carriers.
- Page Markdown: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/02-carrier-telephony-layer.md
- Generated: 2026-05-27T19:11:37.362Z
### Source Files
- `libraries/python/getpatter/carriers/twilio.py`
- `libraries/python/getpatter/carriers/telnyx.py`
- `libraries/python/getpatter/telephony/twilio.py`
- `libraries/python/getpatter/telephony/telnyx.py`
- `libraries/python/getpatter/telephony/common.py`
- `libraries/python/getpatter/server.py`
- `libraries/typescript/src/server.ts`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [libraries/python/getpatter/carriers/twilio.py](libraries/python/getpatter/carriers/twilio.py)
- [libraries/python/getpatter/carriers/telnyx.py](libraries/python/getpatter/carriers/telnyx.py)
- [libraries/python/getpatter/providers/twilio_adapter.py](libraries/python/getpatter/providers/twilio_adapter.py)
- [libraries/python/getpatter/providers/telnyx_adapter.py](libraries/python/getpatter/providers/telnyx_adapter.py)
- [libraries/python/getpatter/telephony/twilio.py](libraries/python/getpatter/telephony/twilio.py)
- [libraries/python/getpatter/telephony/telnyx.py](libraries/python/getpatter/telephony/telnyx.py)
- [libraries/python/getpatter/telephony/common.py](libraries/python/getpatter/telephony/common.py)
- [libraries/python/getpatter/server.py](libraries/python/getpatter/server.py)
- [libraries/typescript/src/server.ts](libraries/typescript/src/server.ts)
- [libraries/python/getpatter/client.py](libraries/python/getpatter/client.py)
</details>
# Carrier & Telephony Layer
The carrier and telephony layer is the interface between Patter's AI pipeline and the public switched telephone network. It provides a uniform abstraction over Twilio and Telnyx so the rest of the system — stream handlers, metrics, observability — can operate without knowing which carrier is active. The layer covers credential management, webhook signature verification, WebSocket media-stream bridging, audio codec transcoding, DTMF, answering-machine detection (AMD), call transfer, voicemail drop, and recording.
Both Python (`libraries/python/getpatter/`) and TypeScript (`libraries/typescript/src/`) runtimes implement the full Twilio and Telnyx surface with close behavioural parity; this page is grounded primarily in the Python implementation, with TypeScript differences noted where they exist.
---
## Carrier Credentials and the `kind` Discriminator
Each carrier is represented by a frozen, self-validating dataclass in `getpatter/carriers/`.
```python
# libraries/python/getpatter/carriers/twilio.py
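from dataclasses import dataclass

@dataclass(frozen=True)
class Carrier:
    account_sid: str = ""
    auth_token: str = ""

    # __post_init__ (environment fallback and validation) omitted here.

    @property
    def kind(self) -> str:
        return "twilio"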
```
```python
# libraries/python/getpatter/carriers/telnyx.py
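from dataclasses import dataclass

@dataclass(frozen=True)
class Carrier:
    api_key: str = ""
    connection_id: str = ""
    public_key: str = ""  # Ed25519 public key for webhook verification

    # __post_init__ (environment fallback and validation) omitted here.

    @property
    def kind(self) -> str:
        return "telnyx"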
```
`__post_init__` on both classes falls back to environment variables (`TWILIO_ACCOUNT_SID` / `TWILIO_AUTH_TOKEN`; `TELNYX_API_KEY` / `TELNYX_CONNECTION_ID`) and raises `ValueError` if required fields are missing. The `kind` property is the stable discriminator used downstream by Phase 2 dispatch to instantiate the correct `TwilioAdapter` or `TelnyxAdapter`.
Sources: [carriers/twilio.py:10-43](), [carriers/telnyx.py:10-50]()
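One subtlety of the environment fallback is that a frozen dataclass cannot assign to its own fields in `__post_init__`. The sketch below shows one common way to do it with `object.__setattr__`. The class name `TwilioCarrierSketch` and the error message are illustrative assumptions, not the SDK's actual source.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TwilioCarrierSketch:
    """Illustrative frozen credentials holder with environment fallback."""
    account_sid: str = ""
    auth_token: str = ""

    def __post_init__(self) -> None:
        sid = self.account_sid or os.environ.get("TWILIO_ACCOUNT_SID", "")
        token = self.auth_token or os.environ.get("TWILIO_AUTH_TOKEN", "")
        if not sid or not token:
            raise ValueError("Twilio requires account_sid and auth_token")
        # frozen=True blocks normal attribute assignment, so bypass it once
        # during construction; the instance is immutable afterwards.
        object.__setattr__(self, "account_sid", sid)
        object.__setattr__(self, "auth_token", token)

    @property
    def kind(self) -> str:
        return "twilio"
```

Explicit constructor arguments win over the environment, which keeps tests deterministic even when real credentials are exported in the shell.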
---
## Provider Adapters
Both adapters implement the common `TelephonyProvider` interface (`providers/base.py`) with four operations: `provision_number`, `configure_number`, `initiate_call`, and `end_call`.
| Capability | `TwilioAdapter` | `TelnyxAdapter` |
|---|---|---|
| Client library | `twilio.rest.Client` (sync, run in executor) | `httpx.AsyncClient` (native async) |
| Provision | `available_phone_numbers.local.list` → `incoming_phone_numbers.create` | `GET /available_phone_numbers` → `POST /number_orders` |
| Configure | `incoming_phone_numbers[n].update(voice_url=…)` | `PATCH /phone_numbers/{id}/voice` with `connection_id` |
| Initiate call | `calls.create(twiml=<Connect><Stream>…)` | `POST /calls` (stream attached later, see below) |
| End call | `calls(sid).update(status="completed")` | `POST /calls/{id}/actions/hangup` |
| Observability | emits `patter.cost.telephony_minutes` on active span | same |
`TwilioAdapter.initiate_call` builds a TwiML `<Connect><Stream url="wss://…/outbound"/>` inline and passes it to `calls.create`, so the outbound call is already wired to the WebSocket URL at dial time.
`TelnyxAdapter.initiate_call` posts only to `POST /calls`. Telnyx does **not** accept stream parameters at dial time; media streaming is attached separately after the `call.answered` webhook fires (via `actions/answer` with inline stream params). The `stream_url` argument to `initiate_call` is intentionally unused and retained only for interface parity.
Sources: [providers/twilio_adapter.py:34-161](), [providers/telnyx_adapter.py:19-185]()
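The four-operation contract can be pictured as an abstract base class. Only the four method names come from the text above; the class names, parameter lists, and return types below are assumptions chosen for illustration, and `FakeAdapter` is an in-memory stand-in rather than a real carrier adapter.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Set


class TelephonyProviderSketch(ABC):
    """Hypothetical shape of the shared adapter interface."""

    @abstractmethod
    async def provision_number(self, country: str) -> str: ...

    @abstractmethod
    async def configure_number(self, number: str, webhook_url: str) -> None: ...

    @abstractmethod
    async def initiate_call(self, from_number: str, to_number: str, stream_url: str) -> str: ...

    @abstractmethod
    async def end_call(self, call_id: str) -> None: ...


class FakeAdapter(TelephonyProviderSketch):
    """In-memory adapter, useful for exercising code that depends on the interface."""

    def __init__(self) -> None:
        self.active_calls: Set[str] = set()
        self._next_id = 0

    async def provision_number(self, country: str) -> str:
        return "+15550000000"

    async def configure_number(self, number: str, webhook_url: str) -> None:
        return None

    async def initiate_call(self, from_number: str, to_number: str, stream_url: str) -> str:
        # Like the Telnyx adapter described above, this fake ignores stream_url.
        self._next_id += 1
        call_id = f"call-{self._next_id}"
        self.active_calls.add(call_id)
        return call_id

    async def end_call(self, call_id: str) -> None:
        self.active_calls.discard(call_id)
```

Keeping `stream_url` in the signature even where a carrier ignores it lets the caller treat both adapters identically.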
---
## Webhook and WebSocket Route Map
The `EmbeddedServer` (Python: `server.py`, TypeScript: `server.ts`) registers all carrier routes on a single FastAPI / Express application.
```text
HTTP Webhooks
  POST /webhooks/twilio/voice        ← inbound call arrival (returns TwiML)
  POST /webhooks/twilio/status       ← call lifecycle transitions
  POST /webhooks/twilio/amd          ← Async AMD result
  POST /webhooks/twilio/recording    ← recording completion
  POST /webhooks/telnyx/voice        ← all Telnyx Call Control events (single endpoint)

WebSocket Streams
  WS /ws/stream/{call_id}            ← Twilio Media Stream (inbound)
  WS /ws/stream/outbound             ← Twilio Media Stream (outbound)
  WS /ws/telnyx/stream/{call_id}     ← Telnyx bidirectional media (inbound)
  WS /ws/telnyx/stream/outbound      ← Telnyx bidirectional media (outbound)
```
Sources: [server.py:448-470](), [server.py:727-760](), [server.py:851-1001]()
---
## Webhook Signature Verification
### Twilio — HMAC-SHA1
The Twilio path uses the `twilio-python` `RequestValidator` to verify the `X-Twilio-Signature` header. The validator is instantiated with `twilio_token` and checks the header against the reconstructed `https://` URL and the POST form parameters. If the `twilio` package is missing when a `twilio_token` is present, the server **rejects** the request with HTTP 503 rather than silently skipping validation.
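For readers who want to see what the validator computes, the following standard-library sketch reproduces Twilio's published signing scheme for form-encoded webhooks: append each POST parameter name and value, sorted by name, to the full URL, then HMAC-SHA1 the result with the auth token and base64-encode it. The function names are hypothetical, and in the SDK this work is delegated to `RequestValidator`.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Recompute the X-Twilio-Signature value for a form-encoded webhook."""
    # Parameters are concatenated as name+value, ordered by parameter name.
    payload = url + "".join(f"{key}{params[key]}" for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = compute_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )
```

Because the URL is part of the signed payload, the server must validate against the public `https://` URL the carrier called, not the local address behind the tunnel. That is why the URL has to be reconstructed first.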
### Telnyx — Ed25519
Telnyx signs every webhook with an Ed25519 private key. The server verifies the `telnyx-signature-ed25519` header using the base64-encoded DER public key stored in `LocalConfig.telnyx_public_key`. The signed payload is `timestamp + "|" + raw_body`. Timestamp staleness is checked with a 300-second tolerance (handling the Telnyx seconds-vs-milliseconds epoch ambiguity with a heuristic). Multiple comma-separated signatures in the header are each tried in order to support key rotation; the webhook is accepted if any one verifies.
If `require_signature=True` (the default) and the public key is absent, the webhook returns HTTP 503.
Sources: [server.py:151-207]() (Telnyx Ed25519), [server.py:454-510]() (Twilio HMAC-SHA1)
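The Telnyx checks that surround the cryptographic step can be sketched without any crypto dependency by injecting the Ed25519 verification as a callable. Everything below is illustrative: `verify_telnyx_webhook`, `normalise_epoch_seconds`, the `verify_one` callback, and the `1e11` cut-off for distinguishing seconds from milliseconds are all assumptions, not the server's actual code.

```python
import time
from typing import Callable, Optional

TOLERANCE_S = 300  # staleness window described above


def normalise_epoch_seconds(raw: str) -> float:
    """Accept a timestamp in seconds or milliseconds and return seconds."""
    value = float(raw)
    # Assumed heuristic: a value this large cannot be a plausible seconds epoch.
    return value / 1000.0 if value > 1e11 else value


def verify_telnyx_webhook(
    raw_body: bytes,
    timestamp: str,
    signature_header: str,
    verify_one: Callable[[bytes, str], bool],
    now: Optional[float] = None,
) -> bool:
    """Check freshness, then accept if any listed signature verifies."""
    current = time.time() if now is None else now
    try:
        sent_at = normalise_epoch_seconds(timestamp)
    except ValueError:
        return False
    fresh = abs(current - sent_at) <= TOLERANCE_S  # False for NaN as well
    if not fresh:
        return False
    # The signed payload uses the timestamp exactly as received.
    signed_payload = timestamp.encode("utf-8") + b"|" + raw_body
    candidates = [s.strip() for s in signature_header.split(",") if s.strip()]
    return any(verify_one(signed_payload, sig) for sig in candidates)
```

In a real deployment `verify_one` would wrap an Ed25519 verification call from a cryptography library. Keeping it injectable makes the freshness and key-rotation logic testable on its own.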
---
## Inbound Call Flow
### Twilio
```mermaid
sequenceDiagram
participant Carrier as Twilio
participant Server as EmbeddedServer
participant Bridge as twilio_stream_bridge
participant AI as StreamHandler
Carrier->>Server: POST /webhooks/twilio/voice (CallSid, From, To)
Server->>Carrier: 200 TwiML <Connect><Stream url="wss://…/ws/stream/{sid}"><Parameter caller/callee>
Carrier-->>Bridge: WebSocket connect /ws/stream/{call_sid}
Bridge->>Bridge: event="start" → read customParameters (caller, callee, callSid)
Bridge->>AI: handler.start()
loop media
Carrier-->>Bridge: event="media" (mulaw 8kHz, base64)
Bridge->>AI: handler.on_audio_received(mulaw)
end
Carrier-->>Bridge: event="stop"
Bridge->>Bridge: flush + cleanup + metrics finalization
```
**Critical detail on `<Parameter>` tags**: Twilio's Media Stream implementation strips query-string parameters from the `<Stream url=…>` before opening the WebSocket. Caller and callee must be forwarded as `<Parameter name="caller" value="…"/>` children of `<Stream>`; the bridge reads them from `start.customParameters` on the WS `start` frame.
Sources: [telephony/twilio.py:67-97](), [telephony/twilio.py:268-316]()
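A minimal sketch of the `<Parameter>` approach, assuming hypothetical helper names (`build_stream_twiml`, `read_custom_parameters`) that are not taken from the Patter source. It builds the TwiML with the standard library and reads the values back from a `start` frame shaped like the one Twilio sends.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(ws_url: str, caller: str, callee: str) -> str:
    """Build <Response><Connect><Stream> TwiML with <Parameter> children."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", {"url": ws_url})
    # Metadata travels as <Parameter> children, not as a query string.
    ET.SubElement(stream, "Parameter", {"name": "caller", "value": caller})
    ET.SubElement(stream, "Parameter", {"name": "callee", "value": callee})
    return ET.tostring(response, encoding="unicode")


def read_custom_parameters(start_frame: dict) -> dict:
    """Read caller/callee back from the WebSocket `start` frame."""
    return start_frame.get("start", {}).get("customParameters", {})
```

Building the document with an XML library rather than string formatting also escapes caller-supplied values, which matters because `From` and `To` originate outside the server.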
### Telnyx
Telnyx uses a REST command model. All call lifecycle events arrive at the single `POST /webhooks/telnyx/voice` endpoint as JSON with an `event_type` discriminator.
```mermaid
sequenceDiagram
participant Carrier as Telnyx
participant Server as EmbeddedServer
participant Bridge as telnyx_stream_bridge
participant AI as StreamHandler
Carrier->>Server: POST /webhooks/telnyx/voice {event_type: "call.initiated"}
Server->>Carrier: POST /calls/{id}/actions/answer (inline stream_url, PCMU 8kHz)
Carrier->>Server: POST /webhooks/telnyx/voice {event_type: "call.answered"} (no-op, stream already active)
Carrier-->>Bridge: WebSocket connect /ws/telnyx/stream/{call_control_id}?caller=…&callee=…
Bridge->>Bridge: event="start" → extract call_control_id, from, to
Bridge->>AI: handler.start()
loop media (inbound_track only)
Carrier-->>Bridge: event="media" {track: "inbound", payload: base64}
Bridge->>AI: handler.on_audio_received(mulaw)
end
Carrier-->>Bridge: event="stop"
Bridge->>Bridge: flush + cleanup + metrics finalization
```
**Inline stream optimisation**: Rather than answering the call and then POSTing `streaming_start` as two separate REST calls, the server folds both into a single `actions/answer` body. This removes one `call.answered` webhook round-trip and one HTTP POST, saving approximately 100–200 ms per inbound call.
The `inbound_track` stream filter is set on Telnyx to halve upstream WebSocket bandwidth; outbound echo frames (track=`outbound`) are discarded in the bridge even when `both_tracks` is negotiated.
Sources: [server.py:851-888](), [telephony/telnyx.py:134-163](), [telephony/telnyx.py:375-384]()
---
## Outbound Call Flow
Both carriers share the same `Patter.call()` entry point in `client.py`. The dispatch switches on `config.telephony_provider`.
**Twilio outbound**: `TwilioAdapter.initiate_call()` calls `twilio.calls.create()` with an inline TwiML body that points to `/ws/stream/outbound`. All extra parameters (AMD, ring timeout, status callback) are passed as snake_case kwargs; the twilio-python SDK translates them to PascalCase on the wire. Passing PascalCase directly would raise `TypeError`.
**Telnyx outbound**: `TelnyxAdapter.initiate_call()` sends `POST /calls` with no stream URL (unsupported at dial time). The `call.answered` webhook triggers `actions/answer` with the stream parameters. Telnyx then opens the WebSocket connection to `/ws/telnyx/stream/outbound`.
Sources: [client.py:628-757](), [providers/twilio_adapter.py:74-104](), [providers/telnyx_adapter.py:94-163]()
---
## Audio Codec and Transcoding
Both carriers negotiate PCMU (G.711 μ-law) 8 kHz on the RTP / WebSocket leg. Audio frames are base64-encoded in JSON.
### `TwilioAudioSender`
Twilio Media Streams deliver mulaw 8 kHz. Outbound direction depends on the provider mode:
- **`openai_realtime` / `openai_realtime_2`**: OpenAI Realtime is configured with `audio_format="g711_ulaw"`, so it emits 8 kHz mulaw directly. The sender sets `input_is_mulaw_8k=True` and forwards bytes as-is, avoiding a 24 kHz → 16 kHz → 8 kHz resample chain that would produce audibly slurred output.
- **`pipeline` / `elevenlabs_convai`**: TTS providers emit PCM16 at 16 kHz. The sender transcodes using a `StatefulResampler` (preserves IIR filter state across chunks to avoid aliasing artifacts from restarting on each frame) and a `PcmCarry` buffer that aligns odd-length chunks before resampling.
The sender also implements playback marks (`send_mark`, `on_mark_confirmed`) for tracking TTS playback completion, and a `flush()` method to drain the resampler tail at call end.
### `TelnyxAudioSender`
Structurally identical to the Twilio sender. Telnyx does not support playback marks, so `send_mark` is a documented no-op. `send_clear` emits `{"event": "clear"}`.
Sources: [telephony/twilio.py:107-222](), [telephony/telnyx.py:182-266]()
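The odd-length alignment performed before resampling can be illustrated with a small carry buffer. The class below is a hypothetical stand-in for `PcmCarry`, written only from the description above; the real class may differ in name, interface, and behaviour.

```python
class OddByteCarry:
    """Hypothetical stand-in for the PcmCarry buffer described above."""

    def __init__(self) -> None:
        self._carry = b""

    def align(self, chunk: bytes) -> bytes:
        """Return the largest whole-sample prefix; hold back a trailing odd byte."""
        data = self._carry + chunk
        usable = len(data) - (len(data) % 2)  # PCM16 samples are 2 bytes each
        self._carry = data[usable:]
        return data[:usable]
```

Feeding a 3-byte chunk followed by a 1-byte chunk yields two complete samples across the two calls, so the resampler never sees half a sample.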
---
## DTMF
### Inbound DTMF
| Carrier | Delivery mechanism | Bridge handling |
|---|---|---|
| Twilio | In-band `event="dtmf"` on the media-stream WebSocket | `handler.on_dtmf(digit)` + optional `on_transcript` |
| Telnyx | In-band `event="dtmf"` on the media-stream WebSocket **and** out-of-band `call.dtmf.received` REST webhook | In-band: `handler.on_dtmf(digit)`; Webhook: acknowledged with HTTP 200 (no duplicate processing) |
Both paths also fire `on_transcript` with `{"role": "user", "text": "[DTMF: {digit}]"}` for observability.
### Outbound DTMF (Telnyx only)
The `_telnyx_send_dtmf` helper posts one `actions/send_dtmf` command per digit to the Telnyx Call Control REST API, with a configurable inter-digit delay (default 300 ms). Allowed characters are `0–9`, `*`, `#`, `A–D`, `a–d`, `w`, `W`; `w`/`W` are Telnyx pause characters (500 ms each). Duration is clamped to 100–500 ms per digit.
Sources: [telephony/twilio.py:392-409](), [telephony/telnyx.py:432-455](), [telephony/telnyx.py:293-344]()
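The character allow-list and the duration clamp for outbound DTMF can be sketched as below. The function names are illustrative assumptions and not the signature of `_telnyx_send_dtmf`; only the allowed characters and the 100 to 500 ms bounds come from the text above.

```python
ALLOWED_DTMF = set("0123456789*#ABCDabcdwW")
MIN_DURATION_MS, MAX_DURATION_MS = 100, 500


def validate_digits(digits: str) -> str:
    """Raise ValueError if the sequence is empty or has a disallowed character."""
    bad = sorted({ch for ch in digits if ch not in ALLOWED_DTMF})
    if not digits or bad:
        raise ValueError(f"invalid DTMF sequence: {bad or 'empty'}")
    return digits


def clamp_duration_ms(duration_ms: int) -> int:
    """Clamp a per-digit tone duration into the supported range."""
    return max(MIN_DURATION_MS, min(MAX_DURATION_MS, duration_ms))
```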
---
## Answering Machine Detection (AMD)
AMD is **enabled by default** on outbound calls (`machine_detection=True`). It can be disabled per-call to avoid per-call AMD billing on known-human destinations.
| | Twilio | Telnyx |
|---|---|---|
| Activation | `machine_detection="DetectMessageEnd"` + `async_amd="true"` + `async_amd_status_callback` | `answering_machine_detection="greeting_end"` in `POST /calls` |
| Answer latency | Zero additional latency on human pickup (Async AMD) | Zero additional latency |
| Result delivery | `POST /webhooks/twilio/amd` (`AnsweredBy` field) | `POST /webhooks/telnyx/voice` with `event_type="call.machine.detection.ended"` (`result` field) |
| Machine values | `machine_end_beep`, `machine_end_silence` | `machine`, `machine_detected` |
Both paths normalise to a carrier-agnostic `MachineDetectionResult` with classification `"human" | "machine" | "fax" | "unknown"`:
```python
# server.py
def _classify_twilio_amd(answered_by: str) -> str:
    """Map Twilio's AnsweredBy value to the carrier-agnostic classification."""
    if answered_by == "human":
        return "human"
    if answered_by.startswith("machine_"):
        return "machine"
    if answered_by == "fax":
        return "fax"
    return "unknown"
```
Sources: [server.py:103-122](), [client.py:651-657](), [providers/telnyx_adapter.py:136-140]()
---
## Voicemail Drop
When AMD classifies the callee as a machine and a `voicemail_message` is configured, the server executes a carrier-specific drop.
**Twilio**: POSTs a TwiML update to `Calls/{sid}.json` with `<Response><Say>{message}</Say><Hangup/></Response>`. The CallSid format (34 chars, `CA` prefix) is validated before it is interpolated into the REST URL, which prevents path traversal.
**Telnyx** (`handle_amd_result`): Posts to `calls/{id}/actions/speak` with the message text, then after a heuristic sleep (~150 ms per character, capped at 30 s), posts to `actions/hangup`. A `client_state` marker (`voicemail-drop`, base64-encoded) is included so a future `call.speak.ended` webhook can trigger the hangup with exact timing instead of the heuristic.
Sources: [server.py:674-703](), [telephony/telnyx.py:46-103]()
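The Telnyx timing heuristic and the `client_state` marker can be sketched as follows. The function names are hypothetical; the 150 ms per character rate, the 30 s cap, and the `voicemail-drop` marker are taken from the description above.

```python
import base64

MS_PER_CHAR = 150    # heuristic speaking rate described above
MAX_SLEEP_S = 30.0   # upper bound on the wait before hangup


def speak_wait_seconds(message: str) -> float:
    """Estimate how long to wait for the spoken message before hanging up."""
    return min(len(message) * MS_PER_CHAR / 1000.0, MAX_SLEEP_S)


def encode_client_state(marker: str = "voicemail-drop") -> str:
    """Base64-encode the marker for the client_state field."""
    return base64.b64encode(marker.encode("utf-8")).decode("ascii")
```

A 20-character message waits 3 seconds under this estimate, while anything longer than 200 characters hits the 30-second cap.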
---
## Call Transfer
Both bridges expose a `transfer_fn` injected into the stream handler that fires when the agent invokes the `transfer_call` tool.
**Twilio** (`_twilio_transfer`): POSTs TwiML `<Response><Dial>{number}</Dial></Response>` to `Calls/{sid}.json`, after validating E.164 format and the CallSid 34-char format.
**Telnyx** (`_telnyx_transfer`): POSTs `{"to": number}` to `calls/{id}/actions/transfer`. Accepts either a validated E.164 number or a SIP URI (`sip:user@host` / `sips:user@host`). An optional `client_state` string is base64-encoded per Telnyx contract and echoed on subsequent webhooks.
Sources: [telephony/twilio.py:420-452](), [telephony/telnyx.py:461-495]()
---
## Recording
Both carriers support optional call recording (enabled with `recording=True` at `Patter` construction).
| Step | Twilio | Telnyx |
|---|---|---|
| Start | `POST /Accounts/{sid}/Calls/{call_sid}/Recordings.json` at stream start | `POST /calls/{id}/actions/record_start` (`format=mp3`, `channels=single`) at stream start |
| Stop | Automatic on call end (Twilio manages) | `POST /calls/{id}/actions/record_stop` in the `finally` cleanup block |
| Completion webhook | `POST /webhooks/twilio/recording` (`RecordingSid`, `RecordingUrl`) | `call.recording.saved` event at `/webhooks/telnyx/voice` (`recording_urls.mp3`, `public_recording_urls.mp3`) |
| Failure | Non-fatal warning logged | Non-fatal warning logged |
Sources: [telephony/twilio.py:305-319](), [telephony/telnyx.py:346-387](), [server.py:560-568](), [server.py:968-985]()
---
## Security Guardrails
Several defences are embedded throughout the layer:
- **Twilio SID validation** (`_validate_twilio_sid`): All REST calls that interpolate a `CallSid` into a URL validate it as exactly 34 characters with a two-letter prefix and 32 lowercase hex digits. This prevents path traversal / SSRF against the Twilio API.
- **SSRF protection** (`validate_webhook_url`): Blocks non-HTTP(S) schemes and all private, loopback, link-local, and cloud-metadata IP ranges / hostnames before any fetch. Mirrored between Python (`server.py:44-101`) and TypeScript (`server.ts:130-202`).
- **WebSocket message size cap**: Both Twilio and Telnyx bridges reject messages over 1 MB (Twilio audio frames are ~160 bytes; Telnyx 640 bytes) to prevent memory exhaustion from malformed or malicious stream peers.
- **Per-IP WebSocket connection cap**: `MAX_WS_PER_IP = 10` enforced in both Python and TypeScript; excess connections are closed with code 1008 before acceptance.
- **Fail-closed signature enforcement**: Missing `twilio_token` or `telnyx_public_key` with `require_signature=True` (the default) returns HTTP 503 rather than accepting unsigned webhooks.
Sources: [telephony/twilio.py:46-55](), [server.py:44-101](), [server.py:37-43]()
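Two of the checks above, SID validation and webhook URL screening, can be approximated with the standard library. This is a simplified sketch and not the body of `_validate_twilio_sid` or `validate_webhook_url`: the function names are hypothetical, the uppercase prefix is an assumption, and the hostname deny-list is a small illustrative sample.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Two-letter prefix (assumed uppercase) followed by 32 lowercase hex digits.
_SID_RE = re.compile(r"[A-Z]{2}[0-9a-f]{32}")


def is_valid_twilio_sid(sid: str) -> bool:
    return _SID_RE.fullmatch(sid) is not None


def is_safe_webhook_url(url: str) -> bool:
    """Reject non-HTTP(S) schemes and literal private/loopback/link-local IPs.

    Simplification: hostnames are only compared against a tiny deny-list;
    a fuller check would also need to consider what the name resolves to.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname.lower()
    if host in {"localhost", "metadata.google.internal"}:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal
    return not (ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved)
```

Using `fullmatch` rather than `match` matters here: a SID with trailing path segments such as `/../` fails outright instead of matching on its prefix.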
---
## Provider Mode Selection
All provider modes work identically on both carriers. The table lists four mode names because OpenAI Realtime is exposed under two of them. The mode is resolved from `agent.provider` at stream start:
| Mode | STT | LLM+TTS | `audio_format` to AI provider |
|---|---|---|---|
| `openai_realtime` (default) | Built into OpenAI Realtime | OpenAI Realtime | `"g711_ulaw"` (mulaw bypass) |
| `openai_realtime_2` | Built into OpenAI Realtime (GA API) | OpenAI Realtime | `"g711_ulaw"` |
| `elevenlabs_convai` | ElevenLabs | ElevenLabs | PCM16 |
| `pipeline` | Configurable (`deepgram`, `whisper`, `cartesia`, `soniox`, `speechmatics`, `assemblyai`) | Configurable LLM + TTS | PCM16 |
For OpenAI Realtime modes, `TwilioAudioSender` and `TelnyxAudioSender` are both created with `input_is_mulaw_8k=True`, forwarding the carrier's native mulaw bytes directly to OpenAI and bypassing the stateful resampler chain.
Sources: [telephony/twilio.py:334-345](), [telephony/telnyx.py:287-292](), [telephony/twilio.py:346-411]()
---
## Metrics and Observability
Both bridge cleanup paths (`finally`) follow the same sequence:
1. Flush the `AudioSender` resampler tail (prevents clipping the last audio frame).
2. Call `handler.cleanup()` on the stream handler.
3. Emit `patter.cost.telephony_minutes` on the active OTel span via the adapter's `record_call_end_cost`.
4. Query the actual telephony cost from the carrier REST API (`Calls/{sid}.json` for Twilio; `GET /calls/{id}` for Telnyx) and write it to the metrics accumulator.
5. Query Deepgram STT cost if applicable.
6. Finalize metrics with `metrics.end_call()`.
7. Fire `on_call_end` with the structured result (call_id, transcript, metrics).
8. Close the `patter_call_scope` OTel context — done last so all cleanup spans inherit `patter.call_id` and `patter.side`.
Sources: [telephony/twilio.py:495-570](), [telephony/telnyx.py:580-660]()
---
## Summary
The carrier and telephony layer provides a symmetric dual-adapter design: `twilio.Carrier` / `TwilioAdapter` and `telnyx.Carrier` / `TelnyxAdapter` implement the same `TelephonyProvider` interface and expose functionally identical bridges (`twilio_stream_bridge` / `telnyx_stream_bridge`) over different wire protocols. Twilio uses TwiML webhooks and synchronous-then-streamed Media Streams; Telnyx uses a REST command model with Call Control events and inline stream negotiation. Codec transcoding, AMD, voicemail drop, call transfer, DTMF, recording, security guardrails, and metrics finalization are all implemented on both carriers with documented behavioural parity. Provider mode selection (OpenAI Realtime, ElevenLabs ConvAI, or pipeline STT+LLM+TTS) is orthogonal to carrier selection and applies uniformly to both bridges.
---
## 03. Voice Engines — Realtime, ConvAI & Pipeline Mode
> The three voice architectures: speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI) vs. the STT→LLM→TTS pipeline, how engine adapters implement the shared engine interface, barge-in and VAD semantics, and latency trade-offs for each mode.
- Page Markdown: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/03-voice-engines-realtime-convai-pipeline-mode.md
- Generated: 2026-05-27T19:14:31.491Z
### Source Files
- `libraries/python/getpatter/engines/openai.py`
- `libraries/python/getpatter/engines/openai_realtime_2.py`
- `libraries/python/getpatter/engines/elevenlabs.py`
- `libraries/python/getpatter/stream_handler.py`
- `libraries/typescript/src/engines/`
- `libraries/typescript/src/stream-handler.ts`
- `libraries/typescript/src/pipeline-hooks.ts`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [libraries/python/getpatter/engines/openai.py](libraries/python/getpatter/engines/openai.py)
- [libraries/python/getpatter/engines/openai_realtime_2.py](libraries/python/getpatter/engines/openai_realtime_2.py)
- [libraries/python/getpatter/engines/elevenlabs.py](libraries/python/getpatter/engines/elevenlabs.py)
- [libraries/python/getpatter/stream_handler.py](libraries/python/getpatter/stream_handler.py)
- [libraries/typescript/src/engines/openai.ts](libraries/typescript/src/engines/openai.ts)
- [libraries/typescript/src/engines/openai-2.ts](libraries/typescript/src/engines/openai-2.ts)
- [libraries/typescript/src/engines/elevenlabs.ts](libraries/typescript/src/engines/elevenlabs.ts)
- [libraries/typescript/src/stream-handler.ts](libraries/typescript/src/stream-handler.ts)
- [libraries/typescript/src/pipeline-hooks.ts](libraries/typescript/src/pipeline-hooks.ts)
</details>
# Voice Engines — Realtime, ConvAI & Pipeline Mode
Patter supports three distinct voice processing architectures, each represented by an engine marker class. The two speech-to-speech engines — **OpenAI Realtime** and **ElevenLabs ConvAI** — hand the full STT→LLM→TTS loop to a single hosted service, while **Pipeline mode** composes independent STT, LLM, and TTS providers under local control. Choosing an architecture determines latency characteristics, barge-in semantics, per-turn hook surface, and the degree to which each component can be swapped independently.
This page covers how engine marker classes are structured, how the `kind` discriminator drives per-call adapter selection, how each `StreamHandler` subclass handles audio, VAD-based barge-in, and turn completion, and what latency trade-offs each architecture entails.
---
## Engine Marker Classes
Every engine starts as a small, immutable configuration object — a **frozen dataclass** in Python or a **`readonly`-field class** in TypeScript — that carries credentials and tuning knobs. Its only behavioral method is the `kind` property, which serves as the stable discriminator used by the Patter server at call time to instantiate the correct adapter.
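A frozen dataclass with a single `kind` property can be sketched as below. `RealtimeMarker` and its `api_key` field are hypothetical and used only for illustration; they are not the actual `openai.Realtime` class. The default values mirror those stated elsewhere on this page.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class RealtimeMarker:
    """Hypothetical marker: holds configuration, exposes only `kind`."""

    api_key: str
    model: str = "gpt-realtime-mini"
    voice: str = "alloy"
    reasoning_effort: Optional[str] = None

    @property
    def kind(self) -> str:
        # Stable discriminator used to pick an adapter at call time.
        return "openai_realtime"
```

Because the dataclass is frozen, assigning to a field after construction raises `FrozenInstanceError`, which keeps a marker safe to share across concurrent calls.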
### OpenAI Realtime (`openai.Realtime` / `openai_realtime_2.Realtime2`)
Two markers exist for the two generations of the OpenAI Realtime API:
| Marker | `kind` | Default model |
|---|---|---|
| `openai.Realtime` | `"openai_realtime"` | `gpt-realtime-mini` |
| `openai_realtime_2.Realtime2` | `"openai_realtime_2"` | `gpt-realtime-2` |
Both markers expose the same tuneable fields:
- **`voice`** — voice preset (default: `alloy`).
- **`reasoning_effort`** / **`reasoningEffort`** — `"minimal" | "low" | "medium" | "high"`. OpenAI recommends `"low"` for production voice flows; higher tiers add measurable per-turn latency. Omitting the field leaves the server default.
- **`input_audio_transcription_model`** / **`inputAudioTranscriptionModel`** — override the Whisper model used for input transcription (e.g. `"gpt-realtime-whisper"` for low-latency partials, `"gpt-4o-transcribe"` for higher accuracy).
```python
# libraries/python/getpatter/engines/openai.py
engine = openai.Realtime(
    model="gpt-realtime-2",
    reasoning_effort="low",
    input_audio_transcription_model="gpt-realtime-whisper",
)
```
```typescript
// libraries/typescript/src/engines/openai-2.ts
const engine = new Realtime2({ reasoningEffort: "low" });
```
> **Implementation note (2026-05):** Although two marker classes exist, both `"openai_realtime"` and `"openai_realtime_2"` route through the same `OpenAIRealtime2Adapter` at call start. OpenAI deprecated the Beta Realtime API in 2026-05, and the legacy `session.update` shape and `OpenAI-Beta: realtime=v1` header returned `invalid_model`. Only the default model string differs between the two markers.
Sources: [libraries/python/getpatter/engines/openai.py:10-59](libraries/python/getpatter/engines/openai.py), [libraries/python/getpatter/stream_handler.py:598-606](libraries/python/getpatter/stream_handler.py)
### ElevenLabs ConvAI (`elevenlabs.ConvAI`)
```python
# libraries/python/getpatter/engines/elevenlabs.py
engine = elevenlabs.ConvAI(api_key="...", agent_id="ag_...", voice="...")
```
The `ConvAI` marker requires both an API key (`ELEVENLABS_API_KEY`) and an **`agent_id`** (`ELEVENLABS_AGENT_ID`) — the pre-configured ElevenLabs Conversational AI agent. The agent ID encodes prompts, persona, and voice configuration managed in the ElevenLabs dashboard. The `kind` discriminator is `"elevenlabs_convai"`.
Sources: [libraries/python/getpatter/engines/elevenlabs.py:1-59](libraries/python/getpatter/engines/elevenlabs.py)
### Pipeline Mode (no marker)
Pipeline mode is selected when `engine=` is not one of the speech-to-speech markers above (`openai.Realtime`, `openai_realtime_2.Realtime2`, or `elevenlabs.ConvAI`). It uses the STT, LLM, and TTS configs on the `Agent` object directly, so no marker class is required.
---
## StreamHandler Architecture
The `StreamHandler` abstract base class is the per-call controller that owns the AI adapter, audio routing, transcript history, metrics, guardrails, tool calling, and call control. The telephony handler (Twilio or Telnyx) creates the appropriate subclass after reading the engine `kind`.
```text
┌─────────────────────────────────────────────────────┐
│                 StreamHandler (ABC)                 │
│  start()    on_audio_received()    cleanup()    …   │
└──────┬──────────────────┬─────────────────┬─────────┘
       │                  │                 │
OpenAIRealtime       ElevenLabs         Pipeline
 StreamHandler         ConvAI         StreamHandler
                    StreamHandler
 (websocket to       (websocket       (local VAD +
OpenAI Realtime)     to ConvAI)        STT + LLM
                                         + TTS)
```
The three abstract methods every subclass must implement are `start()`, `on_audio_received()`, and `cleanup()`.
Sources: [libraries/python/getpatter/stream_handler.py:392-430](libraries/python/getpatter/stream_handler.py), [libraries/typescript/src/stream-handler.ts:238-260](libraries/typescript/src/stream-handler.ts)
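The three-method contract and the `kind`-based selection can be sketched with `abc`. `BaseHandler`, `EchoHandler`, and `handler_for` are hypothetical names for illustration; the handler class names returned by `handler_for` are the ones used on this page, and the fallback to the pipeline handler is an assumption based on the "no marker" rule.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseHandler(ABC):
    """Hypothetical mirror of the three-method contract described above."""

    @abstractmethod
    async def start(self) -> None: ...

    @abstractmethod
    async def on_audio_received(self, audio: bytes) -> None: ...

    @abstractmethod
    async def cleanup(self) -> None: ...


class EchoHandler(BaseHandler):
    """Toy subclass that only counts inbound bytes."""

    def __init__(self) -> None:
        self.received = 0

    async def start(self) -> None:
        self.received = 0

    async def on_audio_received(self, audio: bytes) -> None:
        self.received += len(audio)

    async def cleanup(self) -> None:
        pass


def handler_for(kind: str) -> str:
    """Map an engine kind to a handler class name; unknown kinds use pipeline."""
    table = {
        "openai_realtime": "OpenAIRealtimeStreamHandler",
        "openai_realtime_2": "OpenAIRealtimeStreamHandler",
        "elevenlabs_convai": "ElevenLabsConvAIStreamHandler",
    }
    return table.get(kind, "PipelineStreamHandler")


async def demo() -> int:
    handler = EchoHandler()
    await handler.start()
    await handler.on_audio_received(b"\x00" * 160)
    await handler.cleanup()
    return handler.received
```

Instantiating `BaseHandler` directly raises `TypeError`, which is how the abstract base enforces that every subclass supplies all three methods.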
---
## OpenAI Realtime Mode
### How it works
The `OpenAIRealtimeStreamHandler` opens a persistent WebSocket to the OpenAI Realtime API. All audio (inbound from telephony, outbound to telephony) flows through this single socket. OpenAI handles STT (Whisper), LLM response generation, and TTS in one end-to-end session — Patter never sees raw tokens or synthesized audio bytes from separate providers.
**Prewarm optimization:** At `start()`, the handler attempts to adopt a pre-opened `OpenAIRealtime2Adapter` WebSocket parked during the ringing window by `Patter._park_provider_connections`. A live parked socket skips the cold TCP+TLS+HTTP-101 handshake + `session.update` acknowledgment round-trip (~300–600 ms saved on the first audible word). If the parked socket is dead or absent the handler falls back to a fresh `connect()`.
Sources: [libraries/python/getpatter/stream_handler.py:662-720](libraries/python/getpatter/stream_handler.py)
### Event loop
The `_forward_events` coroutine runs as a background task consuming events from the adapter. Key events and their handling:
| Event | Handler action |
|---|---|
| `audio` | Forward bytes to `audio_sender.send_audio()`; record first-byte TTFB |
| `speech_started` | Send clear + cancel response (barge-in); emit user speech started |
| `speech_stopped` | Start turn latency timer; mark user transcript pending |
| `transcript_input` | Record STT complete; push `user` entry to history; request response |
| `transcript_output` | Accumulate agent text delta; check guardrails |
| `response_done` | Flush assistant turn (possibly buffered behind user transcript); record usage |
| `function_call` | Dispatch to tool executor or built-in `transfer_call`/`end_call` |
**Transcript ordering:** Because OpenAI Realtime emits the user's Whisper transcription *after* `response_done` (transcription runs in parallel with response generation), Patter buffers the assistant turn and flushes it only once the user transcript arrives. A `_REALTIME_USER_TRANSCRIPT_WAIT_S = 3.0` timeout ensures the assistant turn is eventually surfaced even if the transcript never arrives.
Sources: [libraries/python/getpatter/stream_handler.py:703-740](libraries/python/getpatter/stream_handler.py), [libraries/python/getpatter/stream_handler.py:557-562](libraries/python/getpatter/stream_handler.py)
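The buffering rule for transcript ordering can be sketched with an `asyncio.Event` and a bounded wait. `TurnOrderer` is a hypothetical helper written from the description above, not the handler's actual implementation; only the 3.0-second default mirrors `_REALTIME_USER_TRANSCRIPT_WAIT_S`.

```python
import asyncio

USER_TRANSCRIPT_WAIT_S = 3.0  # mirrors _REALTIME_USER_TRANSCRIPT_WAIT_S


class TurnOrderer:
    """Hypothetical helper: hold the assistant turn until the user turn lands."""

    def __init__(self, wait_s: float = USER_TRANSCRIPT_WAIT_S) -> None:
        self.history = []
        self._wait_s = wait_s
        self._user_arrived = asyncio.Event()

    def on_user_transcript(self, text: str) -> None:
        self.history.append({"role": "user", "text": text})
        self._user_arrived.set()

    async def on_response_done(self, assistant_text: str) -> None:
        try:
            await asyncio.wait_for(self._user_arrived.wait(), self._wait_s)
        except asyncio.TimeoutError:
            pass  # surface the assistant turn even without a user transcript
        self.history.append({"role": "assistant", "text": assistant_text})
        self._user_arrived.clear()


async def demo() -> list:
    orderer = TurnOrderer(wait_s=0.5)
    # response_done fires first; the user transcript arrives slightly later.
    pending = asyncio.create_task(orderer.on_response_done("Sure, one moment."))
    await asyncio.sleep(0.01)
    orderer.on_user_transcript("Can you check my order?")
    await pending
    return [turn["role"] for turn in orderer.history]
```

In `demo()` the history ends up in conversational order (user, then assistant) even though the assistant turn completed first.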
### Barge-in and VAD semantics
OpenAI's server-side VAD fires a `speech_started` event when it detects user speech during the agent's turn. The handler responds immediately with `send_clear()` + `cancel_response()`. However, on PSTN lines without acoustic echo cancellation (AEC), TTS bleed into the microphone can trigger phantom `speech_started` events.
To suppress early self-cancellation, the handler enforces a minimum elapsed time between the agent's first audio chunk and an allowed barge-in:
```python
# libraries/python/getpatter/stream_handler.py
MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_AEC = 1.0 # AEC warmup window
MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_NO_AEC = 0.5 # raised from 0.1 s in 0.6.2
```
The gate is anchored to `_current_response_first_audio_at` on the adapter, so the window runs from the first wire-time audio chunk rather than from the `beginSpeaking` timestamp (which precedes TTS TTFB by 200–700 ms for cloud TTS providers).
Sources: [libraries/python/getpatter/stream_handler.py:56-68](libraries/python/getpatter/stream_handler.py), [libraries/python/getpatter/stream_handler.py:1348-1370](libraries/python/getpatter/stream_handler.py)
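The minimum-speaking-time gate can be expressed as a small pure function. `barge_in_allowed` and its parameters are hypothetical; the two constants are the ones shown above, and the handling of the "no audio sent yet" case is an assumption.

```python
import time
from typing import Optional

MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_AEC = 1.0
MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_NO_AEC = 0.5


def barge_in_allowed(first_audio_at: Optional[float], aec_enabled: bool,
                     now: Optional[float] = None) -> bool:
    """Allow barge-in only after the agent has been audible long enough.

    Assumption: if no agent audio has been sent yet there is nothing to
    interrupt, so the gate stays closed.
    """
    if first_audio_at is None:
        return False
    current = time.monotonic() if now is None else now
    minimum = (MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_AEC if aec_enabled
               else MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_NO_AEC)
    return (current - first_audio_at) >= minimum
```

Anchoring `first_audio_at` to the first chunk actually sent, rather than to the moment the response was requested, keeps the window from expiring before the caller has heard anything.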
---
## ElevenLabs ConvAI Mode
### How it works
`ElevenLabsConvAIStreamHandler` opens a WebSocket to the ElevenLabs ConvAI endpoint identified by `agent_id`. Like OpenAI Realtime, this is a fully-baked speech-to-speech path where ElevenLabs internally manages STT, LLM inference, and TTS. Patter sees streamed audio chunks and transcript events, but has no token-level visibility into the LLM response.
**Audio transcoding:** Twilio delivers μ-law 8 kHz audio. By default, the handler decodes it to PCM16 and resamples to 16 kHz before forwarding to ConvAI. When ConvAI negotiates `ulaw_8000` on its input side, a native μ-law fast-path (`_native_mulaw_8k`) bypasses the decode+resample entirely.
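The decode step and the native fast-path can be sketched in pure Python. The μ-law expansion below is the standard G.711 algorithm; `prepare_for_convai` and its `native_mulaw_8k` flag are hypothetical names modelled on the description above, and the 8 kHz to 16 kHz resampling is deliberately left out.

```python
import struct


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                      # mu-law bytes are stored inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Decode a mu-law buffer to little-endian PCM16."""
    samples = [ulaw_byte_to_pcm16(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def prepare_for_convai(mulaw: bytes, native_mulaw_8k: bool) -> bytes:
    """Pass mu-law through untouched when the far side accepts ulaw_8000."""
    if native_mulaw_8k:
        return mulaw
    # Resampling 8 kHz -> 16 kHz is omitted in this sketch.
    return ulaw_to_pcm16(mulaw)
```

A 160-byte μ-law frame (20 ms at 8 kHz) decodes to 320 bytes of PCM16, which is why the fast-path also halves the data handled per frame.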
### Event loop and barge-in
The ConvAI event loop is simpler than the Realtime path because ElevenLabs manages all turn-taking internally. The `interruption` event from the ConvAI adapter is the canonical barge-in signal:
```python
elif ev_type == "interruption":
    await self.audio_sender.send_clear()
    if self.metrics is not None:
        self.metrics.record_turn_interrupted()
    waiting_first_audio = False
    current_agent_text = ""
```
Unlike the Realtime path, there is no Patter-side barge-in gate — barge-in detection and suppression happen entirely inside ElevenLabs. The SDK cannot configure the VAD sensitivity or gate durations for ConvAI.
Sources: [libraries/python/getpatter/stream_handler.py:1780-1800](libraries/python/getpatter/stream_handler.py)
---
## Pipeline Mode (STT → LLM → TTS)
### How it works
`PipelineStreamHandler` composes three independently-configured providers. The telephony WebSocket delivers audio → a local VAD and STT adapter transcribe it → an LLM loop generates a text response → a TTS adapter synthesizes audio → it is sent back.
```
Telephony audio (mulaw 8kHz)
              │
              ▼
[Decode + resample → PCM16 16kHz]
              │
              ▼
            [VAD] ──speech_start──► [STT (Deepgram / Whisper / Cartesia / …)]
                                                        │ transcript
                                                        ▼
                                      [LLM (OpenAI / Anthropic / Groq / …)]
                                                        │ token stream
                                                        ▼
                                                [SentenceChunker]
                                                        │ sentence
                                                        ▼
                                   [TTS (ElevenLabs / OpenAI / Cartesia / …)]
                                                        │ audio chunks
                                                        ▼
                                              [Encode → mulaw 8kHz]
                                                        │
                                                        ▼
                                               Telephony audio out
```
At `start()`, the handler initializes STT and TTS from `agent.stt` / `agent.tts` config objects, falling back to `deepgram_key` for STT and `elevenlabs_key` for TTS when explicit adapters are not provided. It also auto-loads SileroVAD (when `onnxruntime-node`/`onnxruntime` is available) unless `agent.vad` is set explicitly.
Sources: [libraries/python/getpatter/stream_handler.py:2050-2070](libraries/python/getpatter/stream_handler.py)
### Pipeline hooks
Pipeline mode exposes a rich hook surface via `PipelineHookExecutor`. Every hook is **fail-open**: exceptions are logged and the original value passes through unchanged so a broken hook never kills a call.
| Hook | Stage | Behavior |
|---|---|---|
| `beforeSendToStt` | Pre-STT audio | Drop (return `null`) or pass through |
| `afterTranscribe` | Post-STT transcript | Modify or veto the transcript |
| `beforeLlm` | Pre-LLM messages | Modify the messages array |
| `afterLlm.onChunk` | Per-LLM-token | Synchronous, ~0 ms budget |
| `afterLlm.onSentence` | Per-sentence | Async rewrite or drop |
| `afterLlm.onResponse` | Full response | Async rewrite (requires buffering) |
| `beforeSynthesize` | Pre-TTS text | Modify or veto the sentence |
| `afterSynthesize` | Post-TTS audio | Modify or drop the audio chunk |
The `afterLlm` hook is normalized from a legacy `(text, ctx) => string` callable or a new three-tier `AfterLLMHook` object. Only `onResponse` requires the LLM loop to buffer the full stream before proceeding.
Sources: [libraries/typescript/src/pipeline-hooks.ts:1-50](libraries/typescript/src/pipeline-hooks.ts), [libraries/typescript/src/pipeline-hooks.ts:101-222](libraries/typescript/src/pipeline-hooks.ts)
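The fail-open rule can be illustrated in Python with a small wrapper. `run_hook_fail_open` is a hypothetical function and not part of `PipelineHookExecutor`; treating a `None` return as a drop or veto is an assumption modelled on the `null` convention in the table.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("hooks-sketch")


def run_hook_fail_open(hook: Optional[Callable[[str], Optional[str]]],
                       value: str) -> Optional[str]:
    """Run a text hook; on any exception, log and return the original value.

    A hook may return a new string (rewrite) or None (assumed drop/veto).
    """
    if hook is None:
        return value
    try:
        return hook(value)
    except Exception:
        logger.exception("hook failed; passing original value through")
        return value
```

The distinction between a hook that raises and a hook that returns `None` is the point of the design: only a deliberate `None` removes content, while a crash leaves the call untouched.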
### Barge-in and VAD semantics
Pipeline mode implements barge-in entirely in Patter. When the local VAD fires `speech_start` while the agent is speaking, the handler consults optional **barge-in confirmation strategies** before canceling:
- With no strategies configured (default), the first `speech_start` triggers immediate cancel of STT streaming, LLM consumption (`_llm_cancel_event` / `llmAbort`), and TTS synthesis.
- With one or more strategies, barge-in enters a **pending** state — TTS continues streaming naturally — and each incoming STT transcript is passed to the strategies. The first strategy that approves confirms the barge-in; if none confirm within `barge_in_confirm_ms` (default 1500 ms) the pending state is dropped.
The same AEC-vs-no-AEC gate constants apply as in the Realtime path:
```python
# stream_handler.py (Python) / stream-handler.ts (TypeScript)
MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_AEC = 1.0 # covers AEC convergence window
MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_NO_AEC = 0.5 # anti-phantom-VAD on PSTN
```
The gate is anchored to `_first_audio_sent_at` (the instant the first audio chunk actually reached the carrier wire), not to `_speaking_started_at`, so slow-TTFB TTS providers do not leave the gate expired before audio goes out.
**Inbound audio ring buffer:** While the agent is speaking and the self-hearing guard is dropping inbound audio, up to ~250–600 ms of PCM16 16 kHz frames are kept in a ring buffer (`_inbound_audio_ring`). On confirmed barge-in, this buffer is flushed to STT so the user's leading speech — missed while the VAD's `minSpeechDuration` window was accumulating — is recovered and transcribed.
Sources: [libraries/python/getpatter/stream_handler.py:56-68](libraries/python/getpatter/stream_handler.py), [libraries/typescript/src/stream-handler.ts:303-310](libraries/typescript/src/stream-handler.ts), [libraries/typescript/src/stream-handler.ts:330-360](libraries/typescript/src/stream-handler.ts)
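A bounded ring of recent inbound frames can be sketched with `collections.deque`. `InboundAudioRing` is a hypothetical class, the 20 ms frame duration is an assumption, and the 600 ms default is simply the upper end of the range quoted above.

```python
from collections import deque

FRAME_MS = 20  # assumed frame duration


class InboundAudioRing:
    """Hypothetical bounded buffer of recent inbound PCM16 frames."""

    def __init__(self, max_ms: int = 600, frame_ms: int = FRAME_MS) -> None:
        self._frames = deque(maxlen=max_ms // frame_ms)

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)  # the oldest frame is evicted when full

    def drain(self) -> bytes:
        """Return buffered audio oldest-first and empty the ring."""
        audio = b"".join(self._frames)
        self._frames.clear()
        return audio
```

On a confirmed barge-in, the drained bytes would be sent to STT ahead of live audio so the start of the caller's utterance is not lost.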
---
## STT Hallucination Filtering
All three modes share a filter against known Whisper and Deepgram hallucinations on silence or TTS echo. When an STT transcript matches any entry in `_STT_HALLUCINATIONS` (Python) / `HALLUCINATIONS` (TypeScript) after lower-casing and stripping punctuation, the turn is dropped entirely rather than passed to the LLM. This prevents PSTN echo loopback from producing phantom "thank you for watching" user turns that trigger spurious LLM responses.
```python
# libraries/python/getpatter/stream_handler.py
_STT_HALLUCINATIONS: frozenset[str] = frozenset({
    "you", "thank you", "thanks", "yeah", "yes", "no", "okay", "ok",
    "uh", "um", "mmm", "hmm", ".", "bye", "right", "cool",
    "thank you for watching", "thanks for watching", "[music]", "[silence]", ...
})
})
```
Sources: [libraries/python/getpatter/stream_handler.py:74-110](libraries/python/getpatter/stream_handler.py), [libraries/typescript/src/stream-handler.ts:175-200](libraries/typescript/src/stream-handler.ts)
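A minimal sketch of the matching step, using an abbreviated set. `is_hallucination` is a hypothetical function. Because some entries such as `"[music]"` contain punctuation themselves, this sketch checks both the lower-cased form and the punctuation-stripped form; whether the real filter does the same is an assumption.

```python
import string

# Abbreviated subset of the set shown above, for illustration only.
HALLUCINATIONS = frozenset({
    "you", "thank you", "thanks", "thank you for watching", "[music]", ".",
})

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def is_hallucination(transcript: str) -> bool:
    """True when the whole transcript matches a known hallucination entry."""
    lowered = transcript.strip().lower()
    stripped = lowered.translate(_STRIP_PUNCT).strip()
    return lowered in HALLUCINATIONS or stripped in HALLUCINATIONS
```

The comparison is against the entire transcript, so a genuine sentence that merely begins with "thanks" still reaches the LLM.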
---
## Built-in Tool Injection
Both Realtime and Pipeline modes inject the `transfer_call` and `end_call` built-in tools into every session so the LLM can initiate a call transfer or hang up regardless of system-prompt instructions. In Realtime mode the tools appear in the `session.update` sent by `OpenAIRealtimeStreamHandler.start()`; in Pipeline mode they are appended by `_augment_with_builtin_handoff_tools()` / `augmentWithBuiltinHandoffTools()`, wiring handler closures that call the telephony-level `_transfer_fn` / `_hangup_fn`. This ensures parity between modes.
Sources: [libraries/python/getpatter/stream_handler.py:147-190](libraries/python/getpatter/stream_handler.py), [libraries/typescript/src/stream-handler.ts:120-158](libraries/typescript/src/stream-handler.ts)
---
## Latency Trade-offs
| | OpenAI Realtime | ElevenLabs ConvAI | Pipeline |
|---|---|---|---|
| **Architecture** | Speech-to-speech (single WS) | Speech-to-speech (single WS) | STT + LLM + TTS (3 services) |
| **Latency** | Lowest — one model, one hop | Low — ElevenLabs managed | Higher — three sequential hops |
| **Prewarm** | WS parked during ringing (~300–600 ms saved) | WS parked during ringing | STT WS + first-message audio prewarmed |
| **Barge-in control** | Patter gate + server VAD | ElevenLabs managed | Full local control via strategies |
| **LLM provider** | OpenAI only | ElevenLabs-configured | Any (OpenAI, Anthropic, Groq, …) |
| **Per-turn hooks** | Guardrails only | Guardrails only | Full hook surface (8 hook points) |
| **Reasoning effort** | `reasoning_effort` knob | N/A | Per-provider model selection |
| **Transparency** | Transcript + tool events | Transcript events | Full token stream, audio chunks |
The `reasoning_effort` field on OpenAI Realtime markers directly controls per-turn latency on `gpt-realtime-2`. OpenAI recommends `"low"` for production flows because `"medium"` and `"high"` add measurable latency per turn with diminishing returns on voice tasks.
---
## Summary
Patter's voice engine modes are selected at construction time: frozen engine marker objects expose a `kind` property that picks the speech-to-speech `StreamHandler` subclass at call start, and Pipeline mode applies when no such marker is supplied. The two speech-to-speech engines (OpenAI Realtime and ElevenLabs ConvAI) minimize latency by collapsing STT, LLM, and TTS into a single vendor-managed WebSocket session, while Pipeline mode pays extra round-trip latency in exchange for full provider choice, rich per-stage hooks, and local VAD/barge-in strategy control. OpenAI Realtime and Pipeline modes share the same barge-in gate constants and built-in tool injection, whereas ConvAI leaves barge-in handling to ElevenLabs. All three modes share the STT hallucination filter and speech-event observability surface, which makes it straightforward to switch architectures without reworking application logic.
Sources: [libraries/python/getpatter/stream_handler.py:380-430](libraries/python/getpatter/stream_handler.py)
---
## 04. STT & TTS Provider Catalog
> All supported speech-to-text (Deepgram, Whisper, OpenAI Transcribe, AssemblyAI, Cartesia, Soniox, Speechmatics) and text-to-speech (ElevenLabs, Cartesia, OpenAI, LMNT, Rime, Inworld) adapters — their configuration, streaming contracts, known limitations, and how to swap providers in pipeline mode.
- Page Markdown: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/04-stt-tts-provider-catalog.md
- Generated: 2026-05-27T19:16:26.163Z
### Source Files
- `libraries/python/getpatter/stt/deepgram.py`
- `libraries/python/getpatter/stt/openai_transcribe.py`
- `libraries/python/getpatter/stt/whisper.py`
- `libraries/python/getpatter/tts/elevenlabs.py`
- `libraries/python/getpatter/tts/cartesia.py`
- `libraries/python/getpatter/tts/openai.py`
- `libraries/python/getpatter/providers/base.py`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [libraries/python/getpatter/providers/base.py](libraries/python/getpatter/providers/base.py)
- [libraries/python/getpatter/stt/deepgram.py](libraries/python/getpatter/stt/deepgram.py)
- [libraries/python/getpatter/stt/assemblyai.py](libraries/python/getpatter/stt/assemblyai.py)
- [libraries/python/getpatter/stt/cartesia.py](libraries/python/getpatter/stt/cartesia.py)
- [libraries/python/getpatter/stt/openai_transcribe.py](libraries/python/getpatter/stt/openai_transcribe.py)
- [libraries/python/getpatter/stt/whisper.py](libraries/python/getpatter/stt/whisper.py)
- [libraries/python/getpatter/stt/soniox.py](libraries/python/getpatter/stt/soniox.py)
- [libraries/python/getpatter/stt/speechmatics.py](libraries/python/getpatter/stt/speechmatics.py)
- [libraries/python/getpatter/tts/elevenlabs.py](libraries/python/getpatter/tts/elevenlabs.py)
- [libraries/python/getpatter/tts/cartesia.py](libraries/python/getpatter/tts/cartesia.py)
- [libraries/python/getpatter/tts/openai.py](libraries/python/getpatter/tts/openai.py)
- [libraries/python/getpatter/tts/lmnt.py](libraries/python/getpatter/tts/lmnt.py)
- [libraries/python/getpatter/tts/rime.py](libraries/python/getpatter/tts/rime.py)
- [libraries/python/getpatter/tts/inworld.py](libraries/python/getpatter/tts/inworld.py)
- [libraries/python/getpatter/providers/deepgram_stt.py](libraries/python/getpatter/providers/deepgram_stt.py)
- [libraries/python/getpatter/providers/assemblyai_stt.py](libraries/python/getpatter/providers/assemblyai_stt.py)
- [libraries/python/getpatter/providers/cartesia_stt.py](libraries/python/getpatter/providers/cartesia_stt.py)
- [libraries/python/getpatter/providers/openai_transcribe_stt.py](libraries/python/getpatter/providers/openai_transcribe_stt.py)
- [libraries/python/getpatter/providers/soniox_stt.py](libraries/python/getpatter/providers/soniox_stt.py)
- [libraries/python/getpatter/providers/speechmatics_stt.py](libraries/python/getpatter/providers/speechmatics_stt.py)
- [libraries/python/getpatter/providers/elevenlabs_ws_tts.py](libraries/python/getpatter/providers/elevenlabs_ws_tts.py)
</details>
# STT & TTS Provider Catalog
Patter's voice pipeline is built around a pair of abstract interfaces — `STTProvider` and `TTSProvider` — that standardize the contract for any speech-to-text or text-to-speech backend. Concrete adapters wrap each vendor API behind this common interface, so swapping providers is a one-line constructor change at the call-site. All adapters live under `libraries/python/getpatter/stt/` and `libraries/python/getpatter/tts/`; the underlying implementations are in `libraries/python/getpatter/providers/`.
This page catalogs every supported STT and TTS adapter: its configuration parameters, transport model, streaming contract, telephony shortcuts, and known constraints. It also explains how the base interfaces are structured and how to swap providers inside pipeline mode.
---
## Base Interface Contract
Both families share a common lifecycle defined in `providers/base.py`:
```text
STTProvider                                          TTSProvider
─────────────────────────────────────────────────    ───────────────────────────────────────
connect() → None                                     synthesize(text) → AsyncIterator[bytes]
send_audio(chunk: bytes) → None                      close() → None
receive_transcripts() → AsyncIterator[Transcript]
close() → None
warmup() → None (optional, best-effort)
```
`warmup()` is a no-op by default. When `prewarm=True` (the agent default), Patter calls `warmup()` once per outbound call before the carrier reports `answered`, pre-heating DNS, TLS, and provider-edge state to save 200–500 ms of first-turn latency. Failures are swallowed and logged at DEBUG — the live call always proceeds.
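The best-effort behavior described above can be sketched as follows. This is a minimal illustration rather than Patter's actual code: `safe_warmup`, `FlakyProvider`, and the 2-second timeout are hypothetical, and only the `warmup()` method name comes from the interface.

```python
import asyncio
import logging

logger = logging.getLogger("warmup_sketch")


class FlakyProvider:
    """Hypothetical provider whose warmup always fails (for illustration)."""

    async def warmup(self) -> None:
        raise ConnectionError("provider edge unreachable")


async def safe_warmup(provider, timeout_s: float = 2.0) -> bool:
    """Run provider.warmup() without ever letting a failure escape.

    Returns True if warmup completed, False if it failed or timed out.
    """
    try:
        await asyncio.wait_for(provider.warmup(), timeout=timeout_s)
        return True
    except Exception as exc:  # swallow everything: the live call must proceed
        logger.debug("warmup failed (ignored): %r", exc)
        return False


warmed = asyncio.run(safe_warmup(FlakyProvider()))
```

Here `warmed` is `False`, yet no exception reaches the caller, which mirrors the "failures are swallowed" rule.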
`Transcript` is the normalized STT output type:
| Field | Type | Meaning |
|---|---|---|
| `text` | `str` | Transcribed text (stripped) |
| `is_final` | `bool` | Stable utterance, not a partial |
| `confidence` | `float` | Per-utterance confidence in `[0.0, 1.0]` |
| `speech_final` | `bool` | Faster VAD end-of-utterance hint (Deepgram) |
| `from_finalize` | `bool` | Result was triggered by a `Finalize` control frame |
| `event_type` | `Literal["Results", "UtteranceEnd", "SpeechStarted"]` | Event kind |
| `words` | `list[dict]` | Optional per-word timings/metadata |
| `request_id` | `str \| None` | Provider-side trace ID for cost reconciliation |
Sources: [libraries/python/getpatter/providers/base.py:18-61](libraries/python/getpatter/providers/base.py)
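
To make the lifecycle concrete, here is a small sketch of a provider that follows the `connect` / `send_audio` / `receive_transcripts` / `close` sequence. The `Transcript` dataclass is a local stand-in with only three of the fields listed above, and `ScriptedSTT` is a hypothetical class that replays canned results; neither is imported from `getpatter`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Transcript:
    """Local stand-in with a subset of the fields listed above."""
    text: str
    is_final: bool = False
    confidence: float = 1.0


class ScriptedSTT:
    """Hypothetical provider that replays canned transcripts (no real ASR)."""

    def __init__(self, script) -> None:
        self._script = list(script)
        self._queue = None
        self.bytes_received = 0

    async def connect(self) -> None:
        # Create the queue inside the running event loop.
        self._queue = asyncio.Queue()
        for item in self._script:
            self._queue.put_nowait(item)

    async def send_audio(self, chunk: bytes) -> None:
        self.bytes_received += len(chunk)

    async def receive_transcripts(self) -> AsyncIterator[Transcript]:
        while True:
            item = await self._queue.get()
            if item is None:  # sentinel pushed by close()
                return
            yield item

    async def close(self) -> None:
        self._queue.put_nowait(None)


async def collect_finals(stt: ScriptedSTT) -> list:
    """Drive the lifecycle and keep only stable (is_final) utterances."""
    await stt.connect()
    await stt.send_audio(b"\x00" * 320)
    await stt.close()
    return [t.text async for t in stt.receive_transcripts() if t.is_final]


finals = asyncio.run(collect_finals(ScriptedSTT([
    Transcript("hel"),
    Transcript("hello there", is_final=True),
])))
```

The partial `"hel"` is dropped because `is_final` is `False`; a barge-in detector would typically look at partials as well.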
---
## STT Providers
### Provider Summary
| Adapter module | `provider_key` | Transport | Default model | Default sample rate | Env var |
|---|---|---|---|---|---|
| `stt/deepgram.py` | `deepgram` | WebSocket (persistent) | `nova-3` | 16 kHz | `DEEPGRAM_API_KEY` |
| `stt/assemblyai.py` | `assemblyai` | WebSocket (persistent) | `universal-streaming-english` | 16 kHz | `ASSEMBLYAI_API_KEY` |
| `stt/cartesia.py` | `cartesia_stt` | WebSocket (persistent) | `ink-whisper` | 16 kHz | `CARTESIA_API_KEY` |
| `stt/openai_transcribe.py` | `openai_transcribe` | HTTP POST (buffered) | `gpt-4o-transcribe` | 16 kHz | `OPENAI_API_KEY` |
| `stt/whisper.py` | `whisper` | HTTP POST (buffered) | `whisper-1` | 16 kHz | `OPENAI_API_KEY` |
| `stt/soniox.py` | `soniox` | WebSocket (persistent) | `stt-rt-v4` | 16 kHz | `SONIOX_API_KEY` |
| `stt/speechmatics.py` | _(no provider_key)_ | SDK WebSocket | `adaptive` turn detection | 16 kHz | `SPEECHMATICS_API_KEY` |
---
### Deepgram
**Transport:** Persistent WebSocket to `wss://api.deepgram.com/v1/listen`.
Deepgram is the most feature-complete streaming adapter. It maintains a KeepAlive pump (JSON `{"type":"KeepAlive"}` every 4 s) to prevent the server closing idle sessions after ~10 s. On speech end, the SDK sends a `Finalize` control frame to force an immediate final transcript rather than waiting for Deepgram's own `utterance_end_ms` heuristic (~1 s). Graceful teardown uses the `Finalize → drain 100 ms → CloseStream` sequence.
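
The KeepAlive pump can be sketched roughly as below. This is an illustration under assumptions, not the adapter's source: `FakeSocket` stands in for a real WebSocket, and `keepalive_pump` and its `stop` event are hypothetical names. Only the frame body and the 4 s interval come from the description above.

```python
import asyncio
import json


class FakeSocket:
    """Hypothetical stand-in for a WebSocket; records sent text frames."""

    def __init__(self) -> None:
        self.sent = []

    async def send(self, message: str) -> None:
        self.sent.append(message)


async def keepalive_pump(ws, stop: asyncio.Event, interval_s: float = 4.0) -> None:
    """Send a KeepAlive frame every interval_s seconds until stop is set."""
    while not stop.is_set():
        await ws.send(json.dumps({"type": "KeepAlive"}))
        try:
            # Sleep for one interval, but wake immediately when stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            continue


async def demo() -> list:
    ws, stop = FakeSocket(), asyncio.Event()
    pump = asyncio.create_task(keepalive_pump(ws, stop, interval_s=0.01))
    await asyncio.sleep(0.05)  # the session sits idle for a short while
    stop.set()
    await pump
    return ws.sent


frames = asyncio.run(demo())
```

The demo shortens the interval to 10 ms so it finishes quickly; with the default of 4 s the pump stays well inside the ~10 s idle window.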
The adapter emits three `event_type` values: `Results` (normal transcripts), `SpeechStarted` (VAD), and `UtteranceEnd` (VAD). Both `is_final` and `speech_final` are surfaced so callers can gate on either signal independently.
`smart_format` (punctuation and numeral normalization) defaults to `False` in the provider implementation because it adds 50–150 ms to TTFT per transcript and is rarely useful for LLM pipelines; the pipeline wrapper defaults it to `True`. Pass `smart_format=True` explicitly when you need human-visible transcripts.
```python
from getpatter.stt import deepgram
stt = deepgram.STT() # reads DEEPGRAM_API_KEY
stt = deepgram.STT(api_key="dg_...", endpointing_ms=80)
stt_twilio = deepgram.STT.for_twilio(api_key="dg_...") # mulaw, 8 kHz
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `nova-3` | Also `nova-2`, `nova-2-phonecall`, `enhanced`, `base` |
| `encoding` | `linear16` | Also `mulaw`, `alaw`, `opus`, `flac` |
| `sample_rate` | `16000` | 8000, 16000, 24000, 44100, 48000 |
| `endpointing_ms` | `150` | Silence wait before endpoint decision |
| `utterance_end_ms` | `1000` | Deepgram enforces a minimum of 1000 ms |
| `smart_format` | `True` (pipeline wrapper) / `False` (provider) | Punctuation/numeral formatting |
| `interim_results` | `True` | Emit partial transcripts |
| `vad_events` | `True` | Emit `SpeechStarted` / `UtteranceEnd` frames |
Sources: [libraries/python/getpatter/providers/deepgram_stt.py:54-102](libraries/python/getpatter/providers/deepgram_stt.py), [libraries/python/getpatter/stt/deepgram.py:24-59](libraries/python/getpatter/stt/deepgram.py)
---
### AssemblyAI
**Transport:** Persistent WebSocket to `wss://streaming.assemblyai.com/v3/ws` (pure `aiohttp`, no vendor SDK).
AssemblyAI's adapter implements the v3 streaming protocol with coalescing buffering: because Twilio emits 20 ms frames (below AssemblyAI's 50 ms floor), frames are accumulated into a ~60 ms buffer before being forwarded. Sending frames below 50 ms triggers server error 3007 and stream closure.
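
The coalescing step can be illustrated with a small buffer class. This is a sketch under assumptions: `FrameCoalescer` is a hypothetical name and the real adapter may structure its buffer differently. The arithmetic assumes 8 kHz mu-law audio (1 byte per sample), so a 20 ms Twilio frame is 160 bytes and the 60 ms target is 480 bytes.

```python
class FrameCoalescer:
    """Accumulate small audio frames and release them in larger chunks."""

    def __init__(self, sample_rate: int = 8000, bytes_per_sample: int = 1,
                 target_ms: int = 60) -> None:
        # 8 kHz mu-law is 1 byte per sample, so 60 ms is 480 bytes.
        self._target_bytes = sample_rate * bytes_per_sample * target_ms // 1000
        self._buf = bytearray()

    def push(self, frame: bytes) -> list:
        """Add one frame; return zero or more chunks ready to forward."""
        self._buf.extend(frame)
        out = []
        while len(self._buf) >= self._target_bytes:
            out.append(bytes(self._buf[:self._target_bytes]))
            del self._buf[:self._target_bytes]
        return out

    def flush(self) -> bytes:
        """Return whatever is left (call at end of stream)."""
        rest, self._buf = bytes(self._buf), bytearray()
        return rest


coalescer = FrameCoalescer()
twilio_frame = b"\xff" * 160          # 20 ms of 8 kHz mu-law audio
sizes = [len(c) for _ in range(4) for c in coalescer.push(twilio_frame)]
leftover = len(coalescer.flush())
```

Four 20 ms frames yield one 480-byte chunk and 160 bytes of remainder. How the adapter treats a short final remainder (which would itself be under the 50 ms floor) is not covered on this page.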
Reconnect logic handles transient close codes 3005 and 3008 with one automatic retry. The adapter supports mid-session `UpdateConfiguration` (raise `min_turn_silence` while collecting digit strings) and `ForceEndpoint` (barge-in).
The `language` constructor argument is currently ignored — language behavior is driven by the `model` kwarg; a warning is emitted when a non-default value is supplied.
```python
from getpatter.stt import assemblyai
stt = assemblyai.STT()
stt = assemblyai.STT(api_key="...", model="universal-streaming-multilingual")
stt_twilio = assemblyai.STT.for_twilio(api_key="...") # pcm_mulaw, 8 kHz
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `universal-streaming-english` | Also `universal-streaming-multilingual`, `u3-rt-pro`, `whisper-rt` |
| `encoding` | `pcm_s16le` | Also `pcm_mulaw` |
| `sample_rate` | `16000` | 8000 or 16000 |
| `language` | `"en"` | Ignored — drives a warning if non-default |
Sources: [libraries/python/getpatter/providers/assemblyai_stt.py:58-110](libraries/python/getpatter/providers/assemblyai_stt.py), [libraries/python/getpatter/providers/assemblyai_stt.py:270-330](libraries/python/getpatter/providers/assemblyai_stt.py)
---
### Cartesia (STT)
**Transport:** Persistent WebSocket to `wss://api.cartesia.ai/stt/websocket` (pure `aiohttp`).
Cartesia's `ink-whisper` STT adapter emits interim and final transcripts via the `transcript` event type. A `finalize` text frame forces immediate utterance finalization — wired to the SDK's VAD `speech_end` event to convert Cartesia's otherwise conservative silence-based heuristic (2–7 s on PSTN audio) into a fast VAD-driven one. The keepalive loop sends WebSocket pings every 30 s.
The adapter supports connection parking: `open_parked_connection()` pre-opens a WS during the carrier ringing window; `adopt_websocket()` adopts it at call pickup, eliminating the TLS+WS-upgrade round-trip (~150–400 ms) on the first turn.
```python
from getpatter.stt import cartesia
stt = cartesia.STT() # reads CARTESIA_API_KEY
stt = cartesia.STT(api_key="...", language="es")
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `ink-whisper` | Only currently supported model |
| `encoding` | `pcm_s16le` | Only supported encoding |
| `sample_rate` | `16000` | 8000, 16000, 24000, 44100, 48000 |
| `language` | `"en"` | BCP-47 code |
Sources: [libraries/python/getpatter/providers/cartesia_stt.py:50-80](libraries/python/getpatter/providers/cartesia_stt.py), [libraries/python/getpatter/providers/cartesia_stt.py:200-240](libraries/python/getpatter/providers/cartesia_stt.py)
---
### OpenAI Transcribe (GPT-4o)
**Transport:** Buffered HTTP POST to OpenAI's `/v1/audio/transcriptions` endpoint.
`OpenAITranscribeSTT` subclasses `WhisperSTT` and reuses its buffering and transcription logic; the only differences are the default model (`gpt-4o-transcribe`) and an accepted-model whitelist that rejects `whisper-1`. The source describes it as "~10x faster than Whisper-1" and positions it for latency-sensitive pipelines.
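
The subclass-plus-whitelist pattern can be sketched as follows. Both class names are hypothetical stand-ins, and whether the real base class validates its model in this way is an assumption; only the model names and the rejection of `whisper-1` come from the description above.

```python
class BufferedSTTSketch:
    """Hypothetical base: holds the shared model validation."""

    DEFAULT_MODEL = "whisper-1"
    ALLOWED_MODELS = frozenset({"whisper-1"})

    def __init__(self, model=None, language="en"):
        self.model = model or self.DEFAULT_MODEL
        if self.model not in self.ALLOWED_MODELS:
            raise ValueError(
                f"{self.model!r} is not accepted by {type(self).__name__}; "
                f"choose one of {sorted(self.ALLOWED_MODELS)}"
            )
        self.language = language


class TranscribeSTTSketch(BufferedSTTSketch):
    """Same logic as the base; only the default and the whitelist change."""

    DEFAULT_MODEL = "gpt-4o-transcribe"
    ALLOWED_MODELS = frozenset({"gpt-4o-transcribe", "gpt-4o-mini-transcribe"})


default_model = TranscribeSTTSketch().model
try:
    TranscribeSTTSketch(model="whisper-1")
    rejected = False
except ValueError:
    rejected = True
```

Keeping the differences in two class attributes means the buffering and request code never needs to be duplicated in the subclass.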
```python
from getpatter.stt import openai_transcribe
stt = openai_transcribe.STT() # reads OPENAI_API_KEY
stt = openai_transcribe.STT(language="it")
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `gpt-4o-transcribe` | Also `gpt-4o-mini-transcribe`; `whisper-1` rejected |
| `language` | `"en"` | BCP-47 language hint |
Sources: [libraries/python/getpatter/providers/openai_transcribe_stt.py:1-60](libraries/python/getpatter/providers/openai_transcribe_stt.py), [libraries/python/getpatter/stt/openai_transcribe.py:18-40](libraries/python/getpatter/stt/openai_transcribe.py)
---
### Whisper (whisper-1)
**Transport:** Buffered HTTP POST (same endpoint as OpenAI Transcribe).
The original Whisper adapter buffers incoming PCM audio across the call turn and submits it as a single POST request when the utterance ends. This gives it higher latency than the streaming WebSocket providers; use `openai_transcribe.STT` for production pipelines unless you specifically need `whisper-1`.
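
A buffered adapter of this kind has to turn a turn's worth of PCM chunks into one uploadable payload. The sketch below shows one way to do that with the standard library. `UtteranceBuffer` is a hypothetical name, and packaging the audio as a WAV file is an assumption about the upload format, not something this page confirms.

```python
import io
import wave


class UtteranceBuffer:
    """Collect PCM16 mono chunks for one turn and package them as a WAV file."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self._sample_rate = sample_rate
        self._chunks = []

    def add(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def to_wav(self) -> bytes:
        """Return the buffered audio as WAV bytes and reset the buffer."""
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)          # 16-bit samples
            wav.setframerate(self._sample_rate)
            wav.writeframes(b"".join(self._chunks))
        self._chunks.clear()
        return out.getvalue()


buf = UtteranceBuffer()
for _ in range(50):                      # 50 x 20 ms = 1 s of silence
    buf.add(b"\x00\x00" * 320)
wav_bytes = buf.to_wav()
```

Because nothing is sent until `to_wav()` is called at end of turn, no interim transcripts are possible, which is the latency cost noted above.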
```python
from getpatter.stt import whisper
stt = whisper.STT() # reads OPENAI_API_KEY
stt = whisper.STT(language="it")
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `whisper-1` | OpenAI Whisper v1 |
| `language` | `"en"` | BCP-47 hint |
Sources: [libraries/python/getpatter/stt/whisper.py:1-42](libraries/python/getpatter/stt/whisper.py)
---
### Soniox
**Transport:** Persistent WebSocket to `wss://stt-rt.soniox.com/transcribe-websocket`.
Soniox operates on a token-level streaming protocol: `is_final` tokens are accumulated into segments and flushed when an `<end>` / `<fin>` endpoint token arrives. The adapter supports automatic language identification alongside language hints and optional speaker diarization. Model `stt-rt-v4` is the current default.
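
The accumulate-then-flush behavior can be sketched as below. `SegmentAssembler` is a hypothetical name, and the token shape (a dict with `text` and `is_final` keys) is an assumption made for illustration; only the `<end>` / `<fin>` endpoint tokens come from the description above.

```python
END_TOKENS = {"<end>", "<fin>"}


class SegmentAssembler:
    """Join final tokens into a segment; flush when an endpoint token arrives."""

    def __init__(self) -> None:
        self._final_parts = []

    def feed(self, tokens: list) -> list:
        """Consume one batch of tokens; return any completed segments."""
        segments = []
        for tok in tokens:
            if tok["text"] in END_TOKENS:
                text = "".join(self._final_parts).strip()
                self._final_parts.clear()
                if text:
                    segments.append(text)
            elif tok.get("is_final"):
                self._final_parts.append(tok["text"])
            # Non-final tokens are provisional and are skipped here.
        return segments


asm = SegmentAssembler()
first = asm.feed([
    {"text": "Hello", "is_final": True},
    {"text": " wor", "is_final": False},
])
second = asm.feed([
    {"text": " world", "is_final": True},
    {"text": "<end>", "is_final": True},
])
```

The first batch produces no segment because no endpoint token has arrived; the second batch completes `"Hello world"`.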
```python
from getpatter.stt import soniox
stt = soniox.STT() # reads SONIOX_API_KEY
stt = soniox.STT(language_hints=["en", "it"])
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `stt-rt-v4` | Also `stt-rt-v3`, `stt-rt-v2` |
| `language_hints` | `None` | List of BCP-47 hints for auto-detection |
| `language_hints_strict` | `False` | Restrict detection to hints |
| `sample_rate` | `16000` | 8000, 16000, 24000 |
| `enable_speaker_diarization` | `False` | |
| `enable_language_identification` | `True` | |
| `max_endpoint_delay_ms` | `500` | |
Sources: [libraries/python/getpatter/stt/soniox.py:1-56](libraries/python/getpatter/stt/soniox.py), [libraries/python/getpatter/providers/soniox_stt.py:1-55](libraries/python/getpatter/providers/soniox_stt.py)
---
### Speechmatics
**Transport:** SDK WebSocket via `speechmatics-voice[smart]` (optional dependency; lazy import).
Speechmatics is the only STT adapter that depends on a vendor SDK rather than a bare `aiohttp`/`websockets` transport. The dependency is imported lazily so users who don't install the `speechmatics` extra can still import other Patter components.
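
The lazy-import pattern can be sketched as follows. `lazy_import` and `LazySTT` are hypothetical names, and the stdlib `json` module stands in for the vendor SDK so the example runs without extra installs; the real adapter's import path is not shown on this page.

```python
import importlib


def lazy_import(module_name: str, extra: str):
    """Import module_name on first use; point at the pip extra if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this adapter. "
            f"Install it with: pip install 'getpatter[{extra}]'"
        ) from exc


class LazySTT:
    """Hypothetical adapter: defining this class never touches the SDK."""

    def __init__(self, sdk_module: str = "json") -> None:
        # "json" is a stdlib stand-in so the sketch runs anywhere; a real
        # adapter would name the vendor SDK module here.
        self._sdk = lazy_import(sdk_module, extra="speechmatics")


stt = LazySTT()
try:
    LazySTT(sdk_module="definitely_not_installed_sdk")
    hint = ""
except ImportError as err:
    hint = str(err)
```

The import only runs in the constructor, so a missing SDK surfaces as an actionable error when the adapter is instantiated rather than when the package is imported.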
Turn detection is configurable with four modes: `ADAPTIVE` (default), `FIXED`, `EXTERNAL`, and `SMART_TURN`. The adapter supports speaker diarization and partial transcripts.
```python
from getpatter.stt import speechmatics
from getpatter.stt.speechmatics import TurnDetectionMode
stt = speechmatics.STT() # reads SPEECHMATICS_API_KEY
stt = speechmatics.STT(turn_detection_mode=TurnDetectionMode.SMART_TURN)
```
**Install the optional dependency:**
```
pip install 'getpatter[speechmatics]'
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `language` | `"en"` | BCP-47 |
| `turn_detection_mode` | `ADAPTIVE` | `EXTERNAL`, `FIXED`, `ADAPTIVE`, `SMART_TURN` |
| `sample_rate` | `16000` | 8000, 16000, 44100 |
| `enable_diarization` | `False` | |
| `include_partials` | `True` | Emit interim transcripts |
Sources: [libraries/python/getpatter/stt/speechmatics.py:1-52](libraries/python/getpatter/stt/speechmatics.py), [libraries/python/getpatter/providers/speechmatics_stt.py:1-80](libraries/python/getpatter/providers/speechmatics_stt.py)
---
## TTS Providers
### Provider Summary
| Adapter module | `provider_key` | Transport | Default model | Default sample rate | Env var |
|---|---|---|---|---|---|
| `tts/elevenlabs.py` | `elevenlabs_ws` | WebSocket streaming per utterance | `eleven_flash_v2_5` | PCM 16 kHz | `ELEVENLABS_API_KEY` |
| `tts/cartesia.py` | `cartesia_tts` | HTTP streaming | `sonic-3` | PCM 16 kHz | `CARTESIA_API_KEY` |
| `tts/openai.py` | `openai_tts` | HTTP streaming | `tts-1` | PCM 16 kHz | `OPENAI_API_KEY` |
| `tts/lmnt.py` | `lmnt` | HTTP streaming | `blizzard` | PCM 16 kHz | `LMNT_API_KEY` |
| `tts/rime.py` | `rime` | HTTP streaming | `arcana` | PCM 16 kHz | `RIME_API_KEY` |
| `tts/inworld.py` | `inworld` | HTTP NDJSON streaming | `inworld-tts-2` | PCM 16 kHz | `INWORLD_API_KEY` |
---
### ElevenLabs
**Transport:** WebSocket streaming-input endpoint (`/v1/text-to-speech/{voice_id}/stream-input`), one WS per utterance. Saves ~50 ms/request vs the legacy HTTP REST endpoint by removing per-request setup time.
`auto_mode=True` (default) delegates chunk scheduling to ElevenLabs. `chunk_length_schedule` is accepted on the constructor but only takes effect when `auto_mode=False`. The `eleven_v3` model is **not** supported by the WS endpoint — use the HTTP REST variant (`ElevenLabsRestTTS`) for v3.
The `output_format` field is intentionally omitted from the default constructor path so the internal `_output_format_explicit` flag remains `False`, allowing `set_telephony_carrier()` to flip the format automatically from `pcm_16000` to `ulaw_8000` at call time when Twilio is detected (avoiding client-side resampling and the associated audio quality issue).
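
The explicit-versus-derived rule can be modeled in a few lines. `TTSFormatSketch` is a hypothetical class, and the `"twilio"` carrier string is an assumption; the `_output_format_explicit` flag, `set_telephony_carrier()`, and the two format names are taken from the description above.

```python
class TTSFormatSketch:
    """Hypothetical model of the explicit-vs-derived output format rule."""

    DEFAULT_FORMAT = "pcm_16000"

    def __init__(self, output_format=None):
        # Remember whether the caller chose a format themselves.
        self._output_format_explicit = output_format is not None
        self.output_format = output_format or self.DEFAULT_FORMAT

    def set_telephony_carrier(self, carrier: str) -> None:
        """Pick a carrier-native format unless the caller pinned one."""
        if self._output_format_explicit:
            return
        self.output_format = "ulaw_8000" if carrier == "twilio" else "pcm_16000"


auto = TTSFormatSketch()
auto.set_telephony_carrier("twilio")      # flips to ulaw_8000

pinned = TTSFormatSketch(output_format="pcm_16000")
pinned.set_telephony_carrier("twilio")    # stays pcm_16000
```

Tracking explicitness separately from the value matters because a caller who passes `pcm_16000` on purpose is indistinguishable from the default if only the value is stored.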
```python
from getpatter.tts import elevenlabs
tts = elevenlabs.TTS() # reads ELEVENLABS_API_KEY
tts = elevenlabs.TTS(voice_id="EXAVITQu4vr4xnSDxMaL", model_id="eleven_flash_v2_5")
tts_twilio = elevenlabs.TTS.for_twilio(api_key="...") # ulaw_8000
tts_telnyx = elevenlabs.TTS.for_telnyx(api_key="...") # pcm_16000
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `voice_id` | `EXAVITQu4vr4xnSDxMaL` | Katie (default); any ElevenLabs voice ID |
| `model_id` | `eleven_flash_v2_5` | Also `eleven_multilingual_v2`; NOT `eleven_v3` |
| `output_format` | _(carrier-derived)_ | `pcm_16000`, `ulaw_8000`, etc. |
| `language_code` | `None` | For multilingual models |
| `auto_mode` | `True` | ElevenLabs-managed chunking |
| `chunk_length_schedule` | `None` | Manual chunk scheduling (requires `auto_mode=False`) |
Sources: [libraries/python/getpatter/tts/elevenlabs.py:1-130](libraries/python/getpatter/tts/elevenlabs.py), [libraries/python/getpatter/providers/elevenlabs_ws_tts.py:1-80](libraries/python/getpatter/providers/elevenlabs_ws_tts.py)
---
### Cartesia (TTS)
**Transport:** HTTP streaming, ~90 ms TTFB quoted in the source docstring.
Default model is `sonic-3` (Cartesia's current GA model). Voice IDs from the prior `sonic-2` family remain compatible. Audio is returned as raw PCM bytes; `sample_rate` controls the output rate directly, making telephony configuration trivial: pass `sample_rate=8000` for Twilio (PCM 8 kHz) or leave `sample_rate=16000` for Telnyx.
```python
from getpatter.tts import cartesia
tts = cartesia.TTS() # reads CARTESIA_API_KEY
tts = cartesia.TTS(voice="f786b574-...", speed=1.2)
tts_twilio = cartesia.TTS.for_twilio() # PCM 8 kHz
tts_telnyx = cartesia.TTS.for_telnyx() # PCM 16 kHz
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `sonic-3` | GA model; `sonic-2` voice IDs still work |
| `voice` | `f786b574-daa5-4673-aa0c-cbe3e8534c02` | Katie (default) |
| `language` | `"en"` | BCP-47 |
| `sample_rate` | `16000` | 8000 or 16000 |
| `speed` | `None` | Optional string or float |
Sources: [libraries/python/getpatter/tts/cartesia.py:1-105](libraries/python/getpatter/tts/cartesia.py)
---
### OpenAI TTS
**Transport:** HTTP streaming via OpenAI's TTS endpoint.
Two model tiers: `tts-1` (default, lower latency) and `tts-1-hd` (higher quality). Six built-in voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
```python
from getpatter.tts import openai
tts = openai.TTS() # reads OPENAI_API_KEY
tts = openai.TTS(voice="nova", model="tts-1-hd")
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `tts-1` | Also `tts-1-hd` |
| `voice` | `alloy` | `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer` |
Sources: [libraries/python/getpatter/tts/openai.py:1-38](libraries/python/getpatter/tts/openai.py)
---
### LMNT
**Transport:** HTTP streaming.
Uses the `blizzard` model by default and the `leah` voice. Raw PCM is returned via the `format="raw"` default. Language can be explicitly set or left `None` for auto-detection.
```python
from getpatter.tts import lmnt
tts = lmnt.TTS() # reads LMNT_API_KEY
tts = lmnt.TTS(voice="leah", model="blizzard")
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `blizzard` | |
| `voice` | `leah` | |
| `language` | `None` | Optional BCP-47 |
| `format` | `raw` | Raw PCM output |
| `sample_rate` | `16000` | |
Sources: [libraries/python/getpatter/tts/lmnt.py:1-46](libraries/python/getpatter/tts/lmnt.py)
---
### Rime
**Transport:** HTTP streaming (Arcana / Mist model family).
```python
from getpatter.tts import rime
tts = rime.TTS() # reads RIME_API_KEY
tts = rime.TTS(speaker="astra", model="arcana")
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `arcana` | Also `mist` |
| `speaker` | `None` | Optional speaker ID |
| `lang` | `"eng"` | ISO 639-3 language code |
| `sample_rate` | `16000` | |
Sources: [libraries/python/getpatter/tts/rime.py:1-42](libraries/python/getpatter/tts/rime.py)
---
### Inworld
**Transport:** HTTP NDJSON streaming (`inworld-tts-2` model).
Inworld's adapter accepts a richer set of generation controls than other providers, including `temperature`, `speaking_rate`, and `delivery_mode`. Authentication uses `auth_token` (mapped from the `api_key` kwarg). The `audio_encoding` defaults to `PCM`.
```python
from getpatter.tts import inworld
tts = inworld.TTS() # reads INWORLD_API_KEY
tts = inworld.TTS(voice="Olivia", temperature=0.8, speaking_rate=1.1)
```
**Key parameters:**
| Parameter | Default | Notes |
|---|---|---|
| `model` | `inworld-tts-2` | |
| `voice` | `Ashley` | |
| `language` | `None` | Optional BCP-47 |
| `audio_encoding` | `PCM` | |
| `sample_rate` | `16000` | |
| `bitrate` | `64000` | |
| `temperature` | `None` | Generation temperature |
| `speaking_rate` | `1.0` | |
| `delivery_mode` | `None` | |
Sources: [libraries/python/getpatter/tts/inworld.py:1-55](libraries/python/getpatter/tts/inworld.py)
---
## Streaming Contracts and Transport Comparison
```text
STT Adapters: transport at a glance
────────────────────────────────────────────────────────────────────────────────────────
Provider           Transport        Interim results    Finalize control           Warmup
────────────────────────────────────────────────────────────────────────────────────────
Deepgram           WS (websockets)  Yes                Finalize + CloseStream     Yes
AssemblyAI         WS (aiohttp)     Yes                ForceEndpoint + Terminate  Yes
Cartesia           WS (aiohttp)     Yes                finalize text frame        Yes
Soniox             WS (aiohttp)     Yes (token-level)  <end>/<fin> tokens         Yes
Speechmatics       SDK WS           Yes                External / adaptive        Yes
Whisper            HTTP POST        No (batch)         N/A                        No
OpenAI Transcribe  HTTP POST        No (batch)         N/A                        No
────────────────────────────────────────────────────────────────────────────────────────
```
HTTP-backed adapters (Whisper, OpenAI Transcribe) buffer audio for the entire utterance and submit one POST request at end of turn. They do not produce interim results and have higher inherent latency. Streaming WebSocket adapters emit interim transcripts in real time, enabling faster barge-in detection.
All WebSocket adapters implement `warmup()` to pre-heat DNS, TLS, and provider-edge state during the carrier ringing window. Only Cartesia STT additionally supports connection parking (`open_parked_connection` / `adopt_websocket`), which keeps a fully opened socket ready and so removes the entire WS handshake from the first call turn instead of only pre-heating DNS/TLS.
Sources: [libraries/python/getpatter/providers/deepgram_stt.py:130-160](libraries/python/getpatter/providers/deepgram_stt.py), [libraries/python/getpatter/providers/assemblyai_stt.py:185-240](libraries/python/getpatter/providers/assemblyai_stt.py), [libraries/python/getpatter/providers/cartesia_stt.py:165-200](libraries/python/getpatter/providers/cartesia_stt.py)
---
## Telephony Shortcuts (`for_twilio` / `for_telnyx`)
Several adapters expose class-method shortcuts that pre-configure the right encoding and sample rate for telephony carriers, avoiding client-side resampling:
| Adapter | `for_twilio()` | `for_telnyx()` |
|---|---|---|
| `stt/deepgram` | mulaw, 8 kHz | — |
| `stt/assemblyai` | pcm_mulaw, 8 kHz | — |
| `tts/elevenlabs` | `ulaw_8000` output format | `pcm_16000` output format |
| `tts/cartesia` | PCM, 8 kHz | PCM, 16 kHz |
For ElevenLabs, `for_twilio()` explicitly sets `output_format="ulaw_8000"` and marks the format as caller-explicit so the carrier auto-flip hook does not override it at call time.
Sources: [libraries/python/getpatter/tts/elevenlabs.py:89-130](libraries/python/getpatter/tts/elevenlabs.py), [libraries/python/getpatter/tts/cartesia.py:64-105](libraries/python/getpatter/tts/cartesia.py)
---
## Swapping Providers in Pipeline Mode
Because every STT adapter satisfies `STTProvider` and every TTS adapter satisfies `TTSProvider`, swapping is a constructor-level change only:
```python
# `Agent` is assumed to be imported from the Patter SDK entry point.
# Before: Deepgram STT + ElevenLabs TTS
from getpatter.stt import deepgram
from getpatter.tts import elevenlabs
agent = Agent(
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    llm=...,
)
# After: Cartesia STT + Cartesia TTS (single-vendor, single API key)
from getpatter.stt import cartesia as cartesia_stt
from getpatter.tts import cartesia as cartesia_tts
agent = Agent(
    stt=cartesia_stt.STT(),
    tts=cartesia_tts.TTS(),
    llm=...,
)
```
The pipeline handler (`PipelineStreamHandler`) calls `connect()`, `send_audio()`, `receive_transcripts()`, and `close()` on whatever `STTProvider` instance is passed, and calls `synthesize()` / `close()` on the `TTSProvider` — no other code changes are required.
The `provider_key` class variable on each adapter is used for cost attribution and OTel metrics (`patter.stt.provider`, `patter.cost.stt_seconds`) so switching providers automatically updates dashboards without code changes beyond the constructor.
---
## Known Limitations
| Provider | Limitation |
|---|---|
| **AssemblyAI** | `language` kwarg is ignored (a warning is emitted for non-default values); language behavior is controlled by `model`. Chunk coalescing is mandatory: raw 20 ms Twilio frames trigger server error 3007. |
| **Cartesia STT** | Only `pcm_s16le` encoding is accepted (no mulaw support). |
| **OpenAI Transcribe** | `whisper-1` is explicitly rejected; use `whisper.STT` instead. No interim results. |
| **Whisper** | Batch-only (no interim results); highest latency of all STT options. |
| **Speechmatics** | Requires optional `pip install 'getpatter[speechmatics]'`; vendor SDK dependency not shared with other adapters. |
| **ElevenLabs WS** | `eleven_v3` model is not supported on the WebSocket endpoint — use `ElevenLabsRestTTS` for v3. |
| **ElevenLabs WS** | `optimize_streaming_latency` is deprecated by ElevenLabs and not exposed. |
---
## Summary
Patter ships seven STT adapters and six TTS adapters behind a unified `STTProvider` / `TTSProvider` interface. Streaming WebSocket adapters (Deepgram, AssemblyAI, Cartesia, Soniox, Speechmatics) support real-time interim transcripts and explicit finalization hooks for low-latency barge-in. HTTP-buffered adapters (Whisper, OpenAI Transcribe) trade streaming for simplicity. All WebSocket adapters implement `warmup()` for pre-call latency optimization, and telephony-specific `for_twilio()` / `for_telnyx()` shortcuts are available on the most common adapters to avoid client-side audio resampling. Because every adapter is a drop-in replacement at the constructor level, BYOK/BYOC provider selection requires no pipeline code changes beyond the instantiation line.
---
## 05. LLM Providers, Tool Calling & Guardrails
> The LLM backend adapters (OpenAI, Anthropic, Groq, Cerebras, Google Gemini), the tool decorator and Tool dataclass for in-process and webhook-dispatch tools, output guardrail rules, dynamic variable substitution, and the PatterTool integration for LangChain / OpenAI Assistants orchestrators.
- Page Markdown: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/pages/05-llm-providers-tool-calling-guardrails.md
- Generated: 2026-05-27T19:17:53.033Z
### Source Files
- `libraries/python/getpatter/llm/openai.py`
- `libraries/python/getpatter/llm/anthropic.py`
- `libraries/python/getpatter/llm/groq.py`
- `libraries/python/getpatter/tools/`
- `libraries/python/getpatter/integrations/patter_tool.py`
- `libraries/typescript/src/llm/`
- `libraries/typescript/src/llm-loop.ts`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [libraries/python/getpatter/llm/openai.py](libraries/python/getpatter/llm/openai.py)
- [libraries/python/getpatter/llm/anthropic.py](libraries/python/getpatter/llm/anthropic.py)
- [libraries/python/getpatter/llm/groq.py](libraries/python/getpatter/llm/groq.py)
- [libraries/typescript/src/llm/openai.ts](libraries/typescript/src/llm/openai.ts)
- [libraries/typescript/src/llm/anthropic.ts](libraries/typescript/src/llm/anthropic.ts)
- [libraries/typescript/src/llm/cerebras.ts](libraries/typescript/src/llm/cerebras.ts)
- [libraries/typescript/src/llm/google.ts](libraries/typescript/src/llm/google.ts)
- [libraries/typescript/src/llm-loop.ts](libraries/typescript/src/llm-loop.ts)
- [libraries/python/getpatter/tools/tool_decorator.py](libraries/python/getpatter/tools/tool_decorator.py)
- [libraries/python/getpatter/tools/tool_executor.py](libraries/python/getpatter/tools/tool_executor.py)
- [libraries/python/getpatter/tools/circuit_breaker.py](libraries/python/getpatter/tools/circuit_breaker.py)
- [libraries/python/getpatter/_public_api.py](libraries/python/getpatter/_public_api.py)
- [libraries/python/getpatter/models.py](libraries/python/getpatter/models.py)
- [libraries/python/getpatter/telephony/common.py](libraries/python/getpatter/telephony/common.py)
- [libraries/python/getpatter/stream_handler.py](libraries/python/getpatter/stream_handler.py)
- [libraries/python/getpatter/integrations/patter_tool.py](libraries/python/getpatter/integrations/patter_tool.py)
- [libraries/typescript/src/pipeline-hooks.ts](libraries/typescript/src/pipeline-hooks.ts)
- [docs/python-sdk/guardrails.mdx](docs/python-sdk/guardrails.mdx)
</details>
# LLM Providers, Tool Calling & Guardrails
This page describes the full backend intelligence layer of the Patter SDK: the pluggable LLM adapter system (OpenAI, Anthropic, Groq, Cerebras, Google Gemini), how tool calling is declared and executed (in-process handlers and webhook dispatch), the pipeline hook and guardrail system that intercepts and filters LLM output, dynamic variable substitution in system prompts, and the `PatterTool` integration that exposes a live Patter phone agent as a callable tool for external orchestrators such as LangChain, OpenAI Assistants, and Hermes Agent.
All five concepts are tightly coupled in the request loop: the `LLMLoop` (TypeScript) / `LLMProvider` + `ToolExecutor` (Python) drives the per-turn conversation, consults the tool registry when the model emits a tool call, runs pipeline hooks at each stage, evaluates guardrails against the final text, and substitutes per-call variables into the system prompt before the first send.
---
## LLM Provider System
### Architecture
Patter exposes a thin `LLMProvider` interface implemented by five concrete adapters. Any adapter can be passed wherever an LLM is accepted, making the system fully BYOK (bring-your-own-key).
```text
┌─────────────────────────────────────────────────────┐
│ LLMProvider interface │
│ stream(messages, tools?, opts?) → AsyncIterable │
│ warmup?() → Promise<void> │
└────────┬────────────────────────────────────────────┘
│
┌────────┴──────────────────────────────────┐
│ Concrete providers (Python / TypeScript) │
│ │
│ OpenAILLMProvider (openai.com) │
│ AnthropicLLMProvider (anthropic.com) │
│ GroqLLMProvider (groq.com) │
│ CerebrasLLMProvider (cerebras.ai) │
│ GoogleLLMProvider (googleapis.com) │
└────────────────────────────────────────────┘
```
The TypeScript `LLMProvider` interface lives in `llm-loop.ts` and requires a single `stream()` generator method plus an optional `warmup()` pre-call hook.
Sources: [libraries/typescript/src/llm-loop.ts:243-273]()
### Provider Reference Table
| Provider | Module (Python) | Module (TypeScript) | Default model | API style | Env var |
|---|---|---|---|---|---|
| **OpenAI** | `getpatter.llm.openai` | `getpatter/llm/openai` | `gpt-4o-mini` | Chat Completions SSE | `OPENAI_API_KEY` |
| **Anthropic** | `getpatter.llm.anthropic` | `getpatter/llm/anthropic` | `claude-haiku-4-5-20251001` | Messages API SSE | `ANTHROPIC_API_KEY` |
| **Groq** | `getpatter.llm.groq` | `getpatter/llm/groq` | `llama-3.3-70b-versatile` | OpenAI-compatible | `GROQ_API_KEY` |
| **Cerebras** | *(via providers layer)* | `getpatter/llm/cerebras` | `gpt-oss-120b` | OpenAI-compatible | `CEREBRAS_API_KEY` |
| **Google Gemini** | *(via providers layer)* | `getpatter/llm/google` | `gemini-2.5-flash` | Generative Language SSE | `GEMINI_API_KEY` / `GOOGLE_API_KEY` |
### Constructing a Provider
Each provider follows the same pattern: the public `LLM` class in `getpatter/llm/<provider>` is a thin subclass of the underlying provider implementation. The API key is read from a constructor argument first, then from the environment variable, and a `ValueError` is raised if neither is present.
**Python:**
```python
from getpatter.llm import openai, anthropic, groq
llm = openai.LLM() # reads OPENAI_API_KEY
llm = openai.LLM(api_key="sk-...", model="gpt-4o")
llm = anthropic.LLM(prompt_caching=False) # opt out of cache
llm = groq.LLM(api_key="gsk_...", model="llama-3.3-70b-versatile")
```
**TypeScript:**
```ts
import * as openai from "getpatter/llm/openai";
import * as cerebras from "getpatter/llm/cerebras";
import * as google from "getpatter/llm/google";
// Distinct names: redeclaring `const llm` in one scope is a compile error.
const openaiLlm = new openai.LLM({ apiKey: "sk-...", temperature: 0.4 });
const cerebrasLlm = new cerebras.LLM({ gzipCompression: true }); // gzip for large prompts
const googleLlm = new google.LLM({ model: "gemini-2.5-flash" });
```
Sources: [libraries/python/getpatter/llm/anthropic.py:28-63](), [libraries/typescript/src/llm/cerebras.ts:1-58]()
### Anthropic Prompt Caching
The Anthropic adapter enables prompt caching by default (`prompt_caching=True` / `promptCaching: true`). For voice agents with long system prompts, this saves ~100–400 ms TTFT and ~90% input-token cost on cached turns. Caching has no effect below Anthropic's minimum cacheable token threshold (~1024 tokens for Sonnet/Opus, ~2048 for Haiku).
### Streaming Protocol: `LLMChunk`
All providers stream chunks in a common shape. The `LLMLoop` accumulates these chunks across one or more iterations until a final text response is assembled or all tool calls are resolved.
```ts
interface LLMChunk {
type: 'text' | 'tool_call' | 'done' | 'usage';
// text
content?: string;
// tool_call
index?: number; // multiple tool calls share a stream; index groups them
id?: string;
name?: string;
arguments?: string; // JSON fragment; accumulated across chunks with same index
// usage
inputTokens?: number;
outputTokens?: number;
cacheReadInputTokens?: number;
cacheCreationInputTokens?: number;
}
```
Sources: [libraries/typescript/src/llm-loop.ts:218-242]()
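The accumulation rule for `tool_call` chunks (JSON fragments concatenated per `index`) can be sketched in Python as follows. This is a minimal illustration that assumes chunks arrive as plain dicts with the field names shown above; `accumulate_chunks` is a hypothetical helper, not a function exported by the SDK.

```python
import json
from collections import defaultdict


def accumulate_chunks(chunks):
    """Fold LLMChunk-shaped dicts into final text plus complete tool calls."""
    text_parts = []
    # One accumulator per tool-call index; argument fragments are concatenated.
    calls = defaultdict(lambda: {"id": None, "name": None, "arguments": ""})
    for chunk in chunks:
        kind = chunk["type"]
        if kind == "text":
            text_parts.append(chunk.get("content") or "")
        elif kind == "tool_call":
            call = calls[chunk.get("index", 0)]
            call["id"] = chunk.get("id") or call["id"]
            call["name"] = chunk.get("name") or call["name"]
            call["arguments"] += chunk.get("arguments") or ""
        elif kind == "done":
            break
    # Arguments are only valid JSON once every fragment has arrived.
    tool_calls = [
        {**call, "arguments": json.loads(call["arguments"] or "{}")}
        for _, call in sorted(calls.items())
    ]
    return "".join(text_parts), tool_calls
```

Parsing the arguments only after the stream ends matters because any single fragment is usually not valid JSON on its own.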
### Pre-call Warmup
Each provider implements an optional `warmup()` method that issues a lightweight HTTPS GET to its inference endpoint (e.g., `GET /models`). The SDK calls this once per outbound dial when `prewarm: true` is set (the default). Failures are swallowed at debug level — warmup is a latency optimisation, never a correctness gate.
---
## LLM Loop
`LLMLoop` (TypeScript) is the core turn engine. On each user utterance it builds the message list, calls `provider.stream()`, accumulates streaming chunks, dispatches any tool calls, and yields text tokens to the TTS layer. The loop runs up to **10 iterations** to handle multi-step tool chains.
```mermaid
sequenceDiagram
participant SH as StreamHandler
participant Loop as LLMLoop
participant Provider as LLMProvider
participant Exec as ToolExecutor
SH->>Loop: run(userText, history, callCtx)
loop ≤10 iterations
Loop->>Provider: stream(messages, openaiTools)
Provider-->>Loop: LLMChunk* (text | tool_call | usage)
alt tool_call chunks
Loop->>Exec: execute(toolDef, args, callCtx)
Exec-->>Loop: JSON string result
Note over Loop: append assistant+tool messages
else no tool calls
Loop-->>SH: yield text tokens
end
end
```
**Token billing fallback.** When a provider omits a `usage` chunk (observed on some Cerebras streaming variants), the loop falls back to a `chars/4` token estimate and logs a warning so operators notice the approximation.
Sources: [libraries/typescript/src/llm-loop.ts:452-530](), [libraries/typescript/src/llm-loop.ts:560-590]()
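The `chars/4` fallback can be illustrated with a short Python sketch. The rounding mode and the function names here are assumptions made for the example; only the divide-by-four heuristic comes from the description above.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about one token per four characters (rounded up here)."""
    return math.ceil(len(text) / 4)


def resolve_usage(usage_chunk, prompt_text, completion_text):
    """Return (input_tokens, output_tokens, estimated) for one turn."""
    if usage_chunk is not None:
        # The provider reported real counts; use them as-is.
        return usage_chunk["inputTokens"], usage_chunk["outputTokens"], False
    # No usage chunk arrived: fall back to the character-based estimate.
    return estimate_tokens(prompt_text), estimate_tokens(completion_text), True
```

Returning an `estimated` flag lets the caller log the approximation, in line with the warning behaviour described above.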
---
## Tool Calling
### The `@tool` Decorator
The Python `@tool` decorator (in `getpatter/tools/tool_decorator.py`) inspects a typed function's signature and Google-style docstring to build a complete `ToolDefinition` dict. No manual JSON Schema authoring is required.
```python
from getpatter import tool
@tool
async def get_weather(location: str, unit: str = "celsius") -> str:
"""Get the current weather for a location.
Args:
location: City name or zip code
unit: Temperature unit (celsius or fahrenheit)
"""
return f"Sunny, 22°{unit[0].upper()}"
# get_weather is now: {"name": "get_weather", "description": "...", "parameters": {...}, "handler": <adapter>}
```
**Type mapping:**
| Python type | JSON Schema type |
|---|---|
| `str` | `string` |
| `int` | `integer` |
| `float` | `number` |
| `bool` | `boolean` |
| `list` | `array` |
| `dict` / anything else | `object` |
| `Optional[X]` / `X \| None` | base type, not in `required` |
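A rough sketch of how such a mapping could be derived from a function signature is shown below. It is not the decorator's actual code: `build_parameters` is a hypothetical helper, and treating parameters with defaults as non-required is an assumption consistent with the `unit: str = "celsius"` example.

```python
import inspect
import types
from typing import Union, get_args, get_origin, get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array"}
# `X | None` produces types.UnionType on Python 3.10+; Optional[X] produces typing.Union.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def _unwrap_optional(annotation):
    """Return (base_type, is_optional) for Optional[X] / X | None annotations."""
    if get_origin(annotation) in _UNION_ORIGINS:
        members = [a for a in get_args(annotation) if a is not type(None)]
        if len(members) == 1:
            return members[0], True
    return annotation, False


def build_parameters(fn):
    """Build a JSON Schema 'parameters' object from a typed function signature."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        base, optional = _unwrap_optional(hints.get(name, object))
        base = get_origin(base) or base  # list[str] -> list
        properties[name] = {"type": _JSON_TYPES.get(base, "object")}
        if not optional and param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}
```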
The decorator wraps the user function in an `_adapter` coroutine that bridges the runtime call signature `handler(arguments: dict, call_context: dict)` and the user-written signature `fn(location, unit)`. This prevents the common `takes 1 positional argument but 2 were given` error.
Sources: [libraries/python/getpatter/tools/tool_decorator.py:105-178]()
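The bridging idea can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the SDK's `_adapter`: dropping unknown argument keys and accepting synchronous functions are choices made for this example, and `make_adapter` is a hypothetical name.

```python
import asyncio
import functools
import inspect


def make_adapter(fn):
    """Wrap fn(**kwargs) so it can be invoked as handler(arguments, call_context)."""
    accepted = set(inspect.signature(fn).parameters)

    @functools.wraps(fn)
    async def _adapter(arguments: dict, call_context: dict):
        # Pass only the keys the user function declares; call_context is not forwarded.
        kwargs = {key: value for key, value in arguments.items() if key in accepted}
        result = fn(**kwargs)
        if inspect.isawaitable(result):
            result = await result
        return result

    return _adapter


async def get_weather_sketch(location: str, unit: str = "celsius") -> str:
    return f"{location}:{unit}"


handler = make_adapter(get_weather_sketch)
print(asyncio.run(handler({"location": "Rome"}, {"call_id": "demo"})))
```

Because the runtime always calls the handler with two positional arguments, the wrapper is what keeps the user-facing signature free of SDK plumbing.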
### The `Tool` Dataclass
The public API also exposes a `Tool` frozen dataclass (declared in `_public_api.py`) for explicit construction without the decorator. Exactly one of `handler` or `webhook_url` must be provided.
```python
from getpatter import Tool, tool
# Decorator form
@tool
async def lookup_order(order_id: str) -> str:
"""Fetch order status."""
...
# Keyword constructor — webhook dispatch
transfer = Tool(
name="transfer_call",
description="Transfer to a live agent.",
parameters={"type": "object", "properties": {"department": {"type": "string"}}, "required": ["department"]},
webhook_url="https://api.example.com/transfer",
)
```
`Tool` also accepts a `strict: bool` flag that enables OpenAI strict-mode schema enforcement, and a `reassurance: str | dict` field for a spoken filler message during slow tool calls (currently honoured in Realtime mode only).
Sources: [libraries/python/getpatter/_public_api.py:30-75]()
### Tool Executor: Retry, Backoff & Circuit Breaker
Both the Python `ToolExecutor` and the TypeScript `DefaultToolExecutor` share the same resilience strategy:
1. **Retry with exponential backoff** — default 2 retries (3 total attempts), 500 ms base delay, capped at 5 s, with ±60 ms jitter.
2. **Per-tool circuit breaker** — `CLOSED → OPEN` after `failureThreshold` consecutive failures (default 5), stays `OPEN` for `cooldownMs` (default 30 000 ms), then probes once (`HALF_OPEN`).
3. **Structured fallback JSON** — all failure paths return `{"error": "...", "fallback": true}` so the LLM can acknowledge the failure gracefully instead of hanging.
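The retry schedule in step 1 can be sketched as a delay function. The doubling factor and applying the cap before the jitter are assumptions (the text above only says "exponential"); `backoff_delay_ms` is a hypothetical helper.

```python
import random


def backoff_delay_ms(attempt, base_ms=500, cap_ms=5000, jitter_ms=60, rng=random):
    """Delay in milliseconds before retry number `attempt` (1-based)."""
    exponential = min(cap_ms, base_ms * 2 ** (attempt - 1))
    # Jitter spreads out retries from many concurrent calls hitting the same tool.
    return max(0.0, exponential + rng.uniform(-jitter_ms, jitter_ms))
```

With the defaults and two retries, the waits would be roughly 500 ms and 1 000 ms, so the cap only comes into play when more retries are configured.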
```mermaid
stateDiagram-v2
[*] --> CLOSED
CLOSED --> OPEN : ≥5 consecutive failures
OPEN --> HALF_OPEN : cooldown (30 s) elapsed
HALF_OPEN --> CLOSED : probe succeeds
HALF_OPEN --> OPEN : probe fails
```
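The state machine above maps onto a small class. The following is a minimal sketch using the documented defaults, not the contents of `circuit_breaker.py`; the method names and the injectable `clock` are assumptions made so the example is testable.

```python
import time


class CircuitBreakerSketch:
    """CLOSED -> OPEN after N consecutive failures; one probe after the cooldown."""

    def __init__(self, failure_threshold=5, cooldown_ms=30_000, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_ms / 1000
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self.state = "CLOSED"

    def allow(self) -> bool:
        """Return True if a call may proceed right now."""
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.cooldown_s:
                return False
            self.state = "HALF_OPEN"  # cooldown elapsed: admit a single probe
            return True
        if self.state == "HALF_OPEN":
            return False  # a probe is already in flight
        return True

    def record_success(self):
        self._failures = 0
        self.state = "CLOSED"

    def record_failure(self):
        self._failures += 1
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()
```

A caller would check `allow()` before each tool invocation and return the structured fallback JSON immediately when it is `False`.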
**Webhook SSRF protection (Python):** `_validate_webhook_url()` blocks non-HTTP(S) schemes, loopback/private IP addresses (including literal IPs in the URL), and a hardcoded blocklist of dangerous hostnames such as `localhost`, `metadata.google.internal`, and `ip6-loopback`.
Sources: [libraries/python/getpatter/tools/tool_executor.py:93-140](), [libraries/python/getpatter/tools/circuit_breaker.py:1-30]()
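A simplified version of this kind of check is sketched below using only the standard library. It covers scheme, hostname blocklist, and literal IP addresses; it does not resolve DNS, so a hostname that points at a private address is outside the scope of this sketch. The function name and the three-entry blocklist are illustrative, not the SDK's full list.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative subset; the SDK's real blocklist may contain more entries.
BLOCKED_HOSTS = {"localhost", "metadata.google.internal", "ip6-loopback"}


def validate_webhook_url_sketch(url: str) -> str:
    """Return url unchanged if it looks safe to call, else raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = (parsed.hostname or "").lower()
    if not host or host in BLOCKED_HOSTS:
        raise ValueError(f"blocked host: {host!r}")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return url  # not a literal IP; the hostname passed the blocklist
    if ip.is_loopback or ip.is_private or ip.is_link_local:
        raise ValueError(f"blocked address: {ip}")
    return url
```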
**Response size guard:** Webhook responses larger than **1 MB** are rejected immediately and the circuit records a failure. This prevents oversized tool results from exhausting the LLM's context window.
---
## Pipeline Hooks
Pipeline hooks intercept data at every stage of the STT → LLM → TTS pipeline. They are the primary extensibility point for RAG injection, content moderation, cost control, and custom logging. All hooks are **fail-open**: an exception logs an error and passes the original value through unchanged.
| Hook | Stage | Can veto? | Return `null` means |
|---|---|---|---|
| `beforeSendToStt` | Before STT | Yes | Drop audio chunk |
| `afterTranscribe` | After STT | Yes | Skip LLM turn |
| `beforeLlm` | Before LLM | No (null = keep) | Keep original messages |
| `afterLlm.onChunk` | Per token (tier 1) | No | Keep original chunk |
| `afterLlm.onSentence` | Per sentence (tier 2) | Yes (empty = drop) | Keep original sentence |
| `afterLlm.onResponse` | Full response (tier 3) | No | Keep original text |
| `beforeSynthesize` | Before TTS | Yes | Skip TTS for sentence |
| `afterSynthesize` | After TTS | Yes | Drop audio chunk |
The three-tier `afterLlm` design avoids unnecessary buffering: tier 1 (`onChunk`) and tier 2 (`onSentence`) keep streaming; only tier 3 (`onResponse`) requires buffering the full response before yielding to TTS.
Sources: [libraries/typescript/src/pipeline-hooks.ts:48-175]()
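The fail-open and veto semantics in the table can be summarised in a small Python sketch. `run_hook` is a hypothetical helper written for illustration (the real hooks live in `pipeline-hooks.ts`), and it deliberately ignores the `onSentence` special case where an empty string means "drop".

```python
import logging

logger = logging.getLogger("pipeline_hooks_sketch")


def run_hook(hook, value, *, can_veto):
    """Apply one hook to value. Returns (new_value, dropped)."""
    if hook is None:
        return value, False
    try:
        result = hook(value)
    except Exception:
        # Fail-open: log the error and keep the original value flowing.
        logger.exception("hook raised; passing original value through")
        return value, False
    if result is None:
        # A veto-capable hook drops the item; otherwise None means "keep original".
        return (None, True) if can_veto else (value, False)
    return result, False
```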
---
## Output Guardrails
Guardrails intercept the final LLM text *before* it reaches text-to-speech. They are evaluated in declaration order; the first match short-circuits evaluation and substitutes the `replacement` string.
### `Guardrail` Dataclass
```python
import re

from getpatter import Guardrail, guardrail
# frozen dataclass form
g = Guardrail(
name="No medical advice",
blocked_terms=["diagnosis", "prescription", "dosage"],
replacement="That's a medical question I can't answer.",
)
# factory form (identical outcome)
g = guardrail(
name="No phone numbers",
check=lambda text: bool(re.search(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b", text)),
replacement="I can't share numbers directly.",
)
```
| Field | Type | Default | Behaviour |
|---|---|---|---|
| `name` | `str` | *required* | Logged when fired |
| `blocked_terms` | `list[str] \| None` | `None` | Case-insensitive substring scan |
| `check` | `Callable[[str], bool] \| None` | `None` | Called after `blocked_terms`; `True` = block |
| `replacement` | `str` | `"I'm sorry, I can't respond to that."` | Spoken instead of blocked response |
**Evaluation order in `stream_handler.py`:**
1. Check each `blocked_term` with `term.lower() in response_text.lower()`.
2. If no term matched, call `check(response_text)`.
3. On first match, log `WARNING "Guardrail '<name>' triggered on: <snippet>"` and return the replacement.
Sources: [libraries/python/getpatter/stream_handler.py:370-410](), [libraries/python/getpatter/models.py:31-50]()
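The evaluation order can be sketched as a standalone function. `GuardrailSketch` and `apply_guardrails` are stand-ins defined here so the example runs without the SDK; they mirror the field table above but are not the code in `stream_handler.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class GuardrailSketch:
    name: str
    blocked_terms: Optional[Sequence[str]] = None
    check: Optional[Callable[[str], bool]] = None
    replacement: str = "I'm sorry, I can't respond to that."


def apply_guardrails(text, guardrails):
    """Return the first triggered replacement, or the text unchanged."""
    lowered = text.lower()
    for rule in guardrails:
        # Step 1: case-insensitive substring scan over blocked terms.
        hit = any(term.lower() in lowered for term in (rule.blocked_terms or ()))
        # Step 2: the callable check runs only if no term matched.
        if not hit and rule.check is not None:
            hit = bool(rule.check(text))
        if hit:
            return rule.replacement  # Step 3: first match short-circuits.
    return text
```

Because the scan is substring-based, a blocked term such as `"sell"` would also match `"selling"`; terms should be chosen with that in mind.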
Multiple guardrails are evaluated in list order; the first to trigger wins:
```python
agent = phone.agent(
system_prompt="You are a financial advisor assistant.",
guardrails=[
guardrail(name="No stock tips", blocked_terms=["buy", "sell", "invest in"],
replacement="Please consult a licensed advisor."),
guardrail(name="No PII", check=lambda t: "SSN" in t.upper(),
replacement="I cannot share personal identification information."),
],
)
```
---
## Dynamic Variable Substitution
System prompts support `{key}` placeholder substitution, resolved per call by `_resolve_variables()` in `libraries/python/getpatter/telephony/common.py`. Before substitution, each value is sanitised (control characters stripped, length capped at 500 characters) to reduce the risk of prompt injection.
```python
agent = phone.agent(
system_prompt="You are calling {customer_name} about order #{order_id}.",
variables={"customer_name": "Alice", "order_id": "A-1234"},
)
```
At runtime each `{key}` is replaced with `str(value)` via a simple `str.replace` loop. Unresolved placeholders are left as-is.
Sources: [libraries/python/getpatter/telephony/common.py:17-31]()
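The substitution behaviour can be sketched as below. Which characters count as "control characters" is an assumption here (ASCII `0x00`-`0x1F` and `0x7F`), and both function names are hypothetical rather than the SDK's own.

```python
import re

# Assumption: "control characters" means the ASCII C0 range plus DEL.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitise_value(value, max_len=500):
    """Strip control characters and cap the length of one variable value."""
    return _CONTROL_CHARS.sub("", str(value))[:max_len]


def resolve_variables_sketch(template, variables):
    """Replace each {key} with its sanitised value; unknown placeholders stay as-is."""
    for key, value in variables.items():
        template = template.replace("{" + key + "}", sanitise_value(value))
    return template
```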
---
## `PatterTool` Integration
`PatterTool` (in `getpatter/integrations/patter_tool.py`) wraps a running `Patter` phone instance as a single callable tool consumable by external LLM orchestrators. The wire schema is shared across OpenAI, Anthropic, and Hermes Agent, so the same tool can be registered in any framework without changes.
### Exported Schemas
```python
tool = PatterTool(phone=phone, agent={"stt": stt, "llm": llm, "tts": tts})
tool.openai_schema() # → {"type": "function", "function": {"name": "make_phone_call", ...}}
tool.anthropic_schema() # → {"name": "make_phone_call", "input_schema": {...}}
tool.hermes_schema() # → {"name": "make_phone_call", "parameters": {...}}
```
### Wire Parameters
The fixed JSON Schema accepted by all three formats:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `to` | `string` (E.164) | Yes | Destination phone number |
| `goal` | `string` | No | Becomes the in-call system prompt |
| `first_message` | `string` | No | Spoken when callee answers |
| `max_duration_sec` | `integer` [5–1800] | No | Hard call timeout; default 180 s |
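An orchestrator-side check of these parameters might look like the sketch below. The E.164 pattern is the commonly used form (a `+` followed by up to 15 digits, no leading zero); whether `PatterTool` validates with this exact pattern is an assumption, and `validate_call_args` is a hypothetical helper.

```python
import re

E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")


def validate_call_args(args: dict) -> dict:
    """Validate make_phone_call arguments and fill in the documented default."""
    to = args.get("to")
    if not isinstance(to, str) or not E164_RE.match(to):
        raise ValueError("'to' must be an E.164 number such as +15550001234")
    max_duration = args.get("max_duration_sec", 180)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_duration, bool) or not isinstance(max_duration, int):
        raise ValueError("'max_duration_sec' must be an integer")
    if not 5 <= max_duration <= 1800:
        raise ValueError("'max_duration_sec' must be between 5 and 1800")
    return {
        "to": to,
        "goal": args.get("goal"),
        "first_message": args.get("first_message"),
        "max_duration_sec": max_duration,
    }
```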
### Execution Flow
```mermaid
sequenceDiagram
participant Orchestrator as LLM Orchestrator
participant PT as PatterTool
participant Phone as Patter (phone)
participant SSE as MetricsStore SSE
Orchestrator->>PT: execute(to, goal, ...)
PT->>Phone: call(to, overrideAgent)
SSE-->>PT: call_initiated {call_id}
Note over PT: Future<call_id> resolved
Phone-->>PT: on_call_end(data)
PT-->>Orchestrator: PatterToolResult {call_id, status, transcript, duration, cost}
```
`execute()` holds a dial lock so concurrent calls are serialised and each one captures exactly its own `call_initiated` SSE event. A configurable timeout (default 180 s) raises `TimeoutError` if the call does not complete.
Sources: [libraries/python/getpatter/integrations/patter_tool.py:100-185]()
### Hermes Agent Registration
```python
from tools.registry import registry
from getpatter.integrations import PatterTool
tool = PatterTool(phone=phone, agent={...})
tool.register_hermes(registry, toolset="patter")
# Hermes handler returns JSON string: result envelope or {"error": "..."}
```
`register_hermes` bridges Hermes' `handler(args: dict, **kw) -> Awaitable[str]` contract to `PatterTool.execute()`.
---
## Summary
Patter's LLM backend is built around a pluggable `LLMProvider` interface with five production-ready adapters (OpenAI, Anthropic, Groq, Cerebras, Google Gemini), all sharing the same streaming `LLMChunk` protocol. Tool calling is declared via the `@tool` decorator or the `Tool` dataclass and executed through a resilient `ToolExecutor` with exponential backoff, circuit breakers, SSRF protection, and structured fallback JSON. Pipeline hooks give fine-grained interception at every STT/LLM/TTS stage with fail-open semantics. Output guardrails apply keyword and callable checks against the final LLM text before synthesis. Dynamic `{variable}` substitution in system prompts is sanitised at the boundary to reduce the risk of prompt injection. Finally, `PatterTool` packages the entire phone-agent runtime as a schema-stable tool callable from OpenAI Assistants, Anthropic tool-use, or Hermes Agent with no SDK lock-in.
---
## 06. Dashboard, Observability, Tunneling & Deployment
> The built-in real-time monitoring dashboard, vendor-neutral OpenTelemetry tracing, call metrics and cost tracking, Cloudflare quick-tunnel vs. static webhook URL trade-offs, test mode (no phone required), Docker Compose setup, and the agent skills bundle for AI coding assistants — the complete operational surface for running Patter in development and production.
- Generated: 2026-05-27T19:16:04.957Z
### Source Files
- `libraries/python/getpatter/observability/`
- `libraries/python/getpatter/tunnel.py`
- `libraries/python/getpatter/tunnels/`
- `libraries/python/getpatter/test_mode.py`
- `dashboard-app/src/App.tsx`
- `Dockerfile`
- `docker-compose.yml`
<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:
- [libraries/python/getpatter/observability/__init__.py](libraries/python/getpatter/observability/__init__.py)
- [libraries/python/getpatter/observability/tracing.py](libraries/python/getpatter/observability/tracing.py)
- [libraries/python/getpatter/observability/metric_types.py](libraries/python/getpatter/observability/metric_types.py)
- [libraries/python/getpatter/observability/event_bus.py](libraries/python/getpatter/observability/event_bus.py)
- [libraries/python/getpatter/observability/attributes.py](libraries/python/getpatter/observability/attributes.py)
- [libraries/python/getpatter/tunnel.py](libraries/python/getpatter/tunnel.py)
- [libraries/python/getpatter/tunnels/__init__.py](libraries/python/getpatter/tunnels/__init__.py)
- [libraries/python/getpatter/test_mode.py](libraries/python/getpatter/test_mode.py)
- [libraries/python/getpatter/dashboard/routes.py](libraries/python/getpatter/dashboard/routes.py)
- [libraries/python/getpatter/dashboard/store.py](libraries/python/getpatter/dashboard/store.py)
- [libraries/python/getpatter/dashboard/auth.py](libraries/python/getpatter/dashboard/auth.py)
- [libraries/python/getpatter/dashboard/export.py](libraries/python/getpatter/dashboard/export.py)
- [libraries/python/getpatter/dashboard/persistence.py](libraries/python/getpatter/dashboard/persistence.py)
- [libraries/python/getpatter/dashboard/ui.py](libraries/python/getpatter/dashboard/ui.py)
- [libraries/python/getpatter/pricing.py](libraries/python/getpatter/pricing.py)
- [libraries/python/getpatter/server.py](libraries/python/getpatter/server.py)
- [dashboard-app/src/App.tsx](dashboard-app/src/App.tsx)
- [dashboard-app/src/hooks/useDashboardData.ts](dashboard-app/src/hooks/useDashboardData.ts)
- [Dockerfile](Dockerfile)
- [docker-compose.yml](docker-compose.yml)
- [skills-lock.json](skills-lock.json)
</details>
# Dashboard, Observability, Tunneling & Deployment
Patter's operational surface spans four concerns that work together during both development and production: a real-time web dashboard backed by a Server-Sent Events (SSE) feed, a layered observability stack (in-process event bus, typed metrics, and optional OpenTelemetry tracing), automatic HTTPS tunneling for webhook delivery, and a terminal test mode that lets developers iterate without a phone or SIM card. Together they form a self-contained stack that can run on a developer laptop via Docker Compose or be promoted to a hosted server without changing application code.
This page documents each subsystem in depth — its architecture, configuration surface, data flow, and operational trade-offs — covering both the Python SDK implementation and the React dashboard SPA.
---
## Real-Time Monitoring Dashboard
### Overview and Architecture
The dashboard is a Vite + React SPA compiled into a single self-contained HTML file (`ui.html`) by `vite-plugin-singlefile`. At runtime the file is loaded from the Python package via `importlib.resources` and served as the root route of the embedded FastAPI server.
```text
┌─────────────────────────────────────────────────────┐
│ Embedded FastAPI server (server.py) │
│ │
│ GET / → ui.html (SPA) │
│ GET /api/dashboard/calls → call list JSON │
│ GET /api/dashboard/calls/:id → single call JSON │
│ GET /api/dashboard/active → live calls JSON │
│ GET /api/dashboard/aggregates → aggregate stats │
│ GET /api/dashboard/events → SSE stream │
│ GET /api/dashboard/export/calls → CSV / JSON │
│ DELETE /api/dashboard/calls/:id → soft delete │
│ POST /api/dashboard/calls/delete → batch delete │
└─────────────────────────────────────────────────────┘
```
The SPA asset is shipped inside the wheel via `[tool.setuptools.package-data]` in `pyproject.toml` and regenerated by running `npm run build && npm run sync` inside `dashboard-app/`.
Sources: [libraries/python/getpatter/dashboard/ui.py:1-35](), [libraries/python/getpatter/dashboard/routes.py:1-30]()
### MetricsStore — In-Memory Call State
`MetricsStore` is the single source of truth for the dashboard's data layer. It is thread-safe, uses a `threading.Lock` for all mutations, and publishes events via asyncio queues so SSE subscribers never block the call path.
| Method | Description |
|---|---|
| `record_call_initiated(data)` | Pre-registers an outbound call before any media arrives |
| `record_call_start(data)` | Marks media stream start; upgrades `initiated` → `in-progress` |
| `record_turn(data)` | Appends a completed turn; publishes `turn_complete` |
| `record_call_end(data, metrics)` | Moves the call to history with final `CallMetrics`; publishes `call_end` |
| `update_call_status(call_id, status)` | Handles Twilio-style status callbacks; moves terminal statuses to history |
| `get_aggregates()` | Computes total calls, total cost, avg duration, avg p95 latency |
| `hydrate(log_root)` | Replays `metadata.json` and `transcript.jsonl` files from disk on restart |
| `delete_calls(call_ids)` | Soft-deletes calls: hidden from UI, on-disk files untouched |
The store keeps at most `max_calls` (default 500) completed calls in memory. Calls are stored oldest-first; all read paths (`get_calls`, `get_aggregates`, `get_calls_in_range`) return newest-first and filter soft-deleted IDs.
**Persistence across restarts.** At startup, `hydrate(log_root)` walks the `<log_root>/calls/YYYY/MM/DD/<call_id>/metadata.json` files written by `CallLogger` and reconstructs the in-memory call list. Soft-deleted IDs are persisted to `<log_root>/.deleted_call_ids.json` atomically via `os.replace`, so deletions survive restarts. A corrupt entry is skipped on its own with a `DEBUG` log rather than aborting hydration.
Sources: [libraries/python/getpatter/dashboard/store.py:50-120](), [libraries/python/getpatter/dashboard/store.py:380-430]()
### Server-Sent Events (SSE) Feed
The dashboard SPA opens a persistent `EventSource('/api/dashboard/events')` connection. The server keeps an `asyncio.Queue(maxsize=100)` per subscriber and broadcasts events from `MetricsStore._publish`.
**Event types published over SSE:**
| Event | Trigger |
|---|---|
| `call_initiated` | `record_call_initiated` — outbound dial pre-registered |
| `call_start` | `record_call_start` — media stream begins |
| `call_status` | `update_call_status` — Twilio status callback |
| `turn_complete` | `record_turn` — one conversation turn logged |
| `call_end` | `record_call_end` — call finalised with metrics |
| `calls_deleted` | `delete_calls` — one or more calls soft-deleted |
The server sends a `: keepalive\n\n` comment every 30 seconds to keep proxies from timing out the connection. If a subscriber's queue is full (100 events), the subscriber is dropped silently.
The SPA (`useDashboardData.ts`) reconnects with exponential backoff (1 s → 30 s cap, max 5 attempts) before falling back to polling every 5 seconds.
Sources: [libraries/python/getpatter/dashboard/routes.py:105-140](), [dashboard-app/src/hooks/useDashboardData.ts:14-55]()
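The per-subscriber queue with drop-on-full behaviour can be sketched in a few lines. This is an illustration of the pattern rather than the code in `routes.py` or `store.py`; the class and method names are hypothetical.

```python
import asyncio


class BroadcasterSketch:
    """Fan events out to bounded per-subscriber queues; drop subscribers that fall behind."""

    def __init__(self, maxsize=100):
        self._maxsize = maxsize
        self._subscribers = set()

    @property
    def subscriber_count(self):
        return len(self._subscribers)

    def subscribe(self):
        queue = asyncio.Queue(maxsize=self._maxsize)
        self._subscribers.add(queue)
        return queue

    def publish(self, event):
        # Iterate over a copy so a slow subscriber can be removed mid-loop.
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(event)  # never awaits, so the call path is not blocked
            except asyncio.QueueFull:
                self._subscribers.discard(queue)
```

Using `put_nowait` is what keeps publishing non-blocking: a stalled browser tab costs at most one full queue before it is disconnected and has to reconnect.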
### SPA Metrics and UI
The React SPA (`App.tsx`) renders four headline metric tiles (total calls in range, avg p95 latency, spend, active now), a call table with search, and a right-side detail panel showing live transcript and per-call metrics.
Key computed values:
- **Range filtering** — calls are bucketed into 1 h / 24 h / 7 d / All-time windows. Live calls (`status === 'live'`) are always shown regardless of range.
- **Sparklines** — computed by `computeSparkline` in `lib/mappers.ts` aligned to natural time boundaries (e.g. full hours).
- **SDK version pill** — sourced from `/api/dashboard/aggregates`'s `sdk_version` field, which reflects the installed `getpatter.__version__` at runtime.
- **Phone number masking** — the topbar reveals or hides phone numbers via a toggle stored in `useUiPrefs`.
Sources: [dashboard-app/src/App.tsx:1-120]()
### Dashboard Authentication
When `token` is non-empty, all dashboard routes require a valid bearer token, which is compared in constant time with `hmac.compare_digest`. Two delivery mechanisms are supported so browser navigation still works:
- `Authorization: Bearer <token>` header
- `?token=<token>` query parameter
An empty token disables authentication entirely (suitable for local development).
Sources: [libraries/python/getpatter/dashboard/auth.py:1-35]()
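A framework-independent sketch of the check is shown below. The function name, the precedence of the header over the query parameter, and the argument shapes are assumptions for illustration; only the use of `hmac.compare_digest` and the two delivery mechanisms come from the description above.

```python
import hmac


def is_authorised(expected_token, authorization_header=None, query_token=None):
    """Return True if the request carries the expected dashboard token."""
    if not expected_token:
        return True  # empty token: authentication disabled
    presented = None
    if authorization_header and authorization_header.startswith("Bearer "):
        presented = authorization_header[len("Bearer "):]
    elif query_token:
        presented = query_token
    if presented is None:
        return False
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```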
### Data Export
`GET /api/dashboard/export/calls` supports `?format=csv` and `?format=json` with optional `?from=<ISO8601>&to=<ISO8601>` date filtering. Soft-deleted calls are excluded from all exports.
CSV columns: `call_id`, `caller`, `callee`, `direction`, `started_at`, `ended_at`, `duration_s`, `cost_total`, `cost_stt`, `cost_tts`, `cost_llm`, `cost_telephony`, `avg_latency_ms`, `turns_count`, `provider_mode`.
Sources: [libraries/python/getpatter/dashboard/export.py:1-55](), [libraries/python/getpatter/dashboard/routes.py:150-190]()
---
## Observability
Patter's observability stack has three independent layers that can be used in any combination.
```text
┌───────────────────────────────────────────────────────────────────┐
│ Layer 1: In-process EventBus │
│ Synchronous + async handlers; fire-and-forget; never blocks call │
├───────────────────────────────────────────────────────────────────┤
│ Layer 2: Typed Metric Dataclasses │
│ Frozen, provider-neutral; fed into MetricsStore + exporters │
├───────────────────────────────────────────────────────────────────┤
│ Layer 3: OpenTelemetry Tracing (opt-in, PATTER_OTEL_ENABLED=1) │
│ Standard OTLP export; no PII in span attributes │
└───────────────────────────────────────────────────────────────────┘
```
### Layer 1 — In-Process EventBus
`EventBus` is a lightweight pub-sub emitter for pipeline-internal events. Handlers are fire-and-forget: exceptions are caught and logged so a misbehaving observer never disrupts the call.
```python
bus = EventBus()
unsub = bus.on("turn_ended", lambda payload: print(payload))
bus.emit("turn_ended", {"turn_index": 0})
unsub() # remove listener
```
Async callbacks are scheduled via `asyncio.create_task` (requires a running event loop). Sync callbacks are called inline.
**Supported event types** (`PatterEventType`):
| Event | When emitted |
|---|---|
| `turn_started` / `turn_ended` | Around each user/agent turn |
| `eou_metrics` | End-of-utterance timing captured |
| `interruption` | Barge-in detected |
| `llm_metrics` / `stt_metrics` / `tts_metrics` | Provider-stage metrics |
| `metrics_collected` | Full turn metrics aggregated |
| `call_ended` | Call teardown |
| `transcript_partial` / `transcript_final` | STT transcript events |
| `llm_chunk` / `tts_chunk` | Streaming token / audio chunk |
| `tool_call_started` | Tool invocation begins |
Sources: [libraries/python/getpatter/observability/event_bus.py:1-70]()
### Layer 2 — Typed Metric Dataclasses
All metric payloads are frozen dataclasses (no Pydantic dependency) defined in `metric_types.py`. They form the canonical observability surface consumed by the dashboard, EventBus handlers, and exporters.
| Type | Fields |
|---|---|
| `EOUMetrics` | `end_of_utterance_delay`, `transcription_delay`, `on_user_turn_completed_delay` (all ms) |
| `InterruptionMetrics` | `total_duration`, `detection_delay`, `num_interruptions`, `num_backchannels` (seconds) |
| `TTFBMetrics` | `processor`, `value`, `model` |
| `ProcessingMetrics` | `processor`, `value`, `model` |
| `LLMUsage` | `prompt_tokens`, `completion_tokens`, `total_tokens`, cached/creation/read tokens |
| `RealtimeUsage` | `session_duration_seconds`, `tokens_per_second`, `InputTokenDetails`, `OutputTokenDetails` |
All timestamps default to `time.time()` at dataclass construction, keeping the pipeline code free of explicit clock calls.
Sources: [libraries/python/getpatter/observability/metric_types.py:1-115]()
### Layer 3 — OpenTelemetry Tracing (Opt-In)
OTel tracing is disabled by default. It activates only when `PATTER_OTEL_ENABLED=1` is set **and** the `opentelemetry-sdk` package is installed. No telemetry is emitted without explicit opt-in.
**Enabling tracing:**
```bash
pip install "getpatter[tracing]"
export PATTER_OTEL_ENABLED=1
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```
```python
from getpatter.observability import init_tracing
init_tracing(service_name="my-agent", resource_attributes={"deployment.env": "prod"})
```
**Span hierarchy for a single call:**
```text
getpatter.call ← top-level call span (stream handler)
getpatter.endpoint ← silence-detected → LLM-dispatch window
getpatter.stt ← one STT inference
getpatter.llm ← one LLM completion
getpatter.tool ← one tool invocation
getpatter.tts ← one TTS synthesis
getpatter.bargein ← interrupt-detected → TTS-stopped window
```
**Privacy guarantee:** Only sizes and provider identifiers are recorded as span attributes. User utterances and tool payloads are never included.
**`start_span` context manager** is a zero-overhead no-op when tracing is disabled — callers do not need to guard against a disabled state:
```python
with start_span("getpatter.llm", attributes={"llm.provider": "openai"}) as span:
... # span is None when tracing is off
```
**`patter_call_scope`** binds a `call_id` and an optional `side` label to the current asyncio task tree using `ContextVar`, so spans from deeply nested provider code automatically inherit the call identity.
**`attach_span_exporter`** wires a custom `SpanExporter` into the global `TracerProvider` via `SimpleSpanProcessor`. The call is idempotent for the same exporter object: attaching it a second time has no further effect.
Sources: [libraries/python/getpatter/observability/tracing.py:1-160](), [libraries/python/getpatter/observability/attributes.py:1-120]()
### Cost Tracking
`pricing.py` maintains a versioned `DEFAULT_PRICING` table (version `2026.3`, last updated 2026-05-08). Each provider entry carries provider-level defaults plus an optional `models` dict for model-specific overrides. Lookup uses longest-prefix matching so versioned model IDs like `claude-haiku-4-5-20251001` resolve correctly against `claude-haiku-4-5`.
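Longest-prefix matching can be sketched as below. The table shape follows the description above, but the function name and every price in the usage example are made up for illustration and are not values from `DEFAULT_PRICING`.

```python
def lookup_model_pricing(provider_entry: dict, model_id: str) -> dict:
    """Pick the model override whose key is the longest prefix of model_id."""
    models = provider_entry.get("models", {})
    matches = [key for key in models if model_id.startswith(key)]
    if not matches:
        # No override matched: fall back to the provider-level defaults.
        return {k: v for k, v in provider_entry.items() if k != "models"}
    return models[max(matches, key=len)]


# Hypothetical prices, for illustration only.
example_entry = {
    "price": 1.0,
    "models": {
        "claude-haiku": {"price": 0.5},
        "claude-haiku-4-5": {"price": 0.8},
    },
}
```

With this table, `claude-haiku-4-5-20251001` matches both keys, and the longer `claude-haiku-4-5` entry wins.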
Cost fields flow through `CallMetrics.cost` into `MetricsStore` and are surfaced in:
- The dashboard's per-call detail panel and aggregate `total_cost`
- The CSV export columns `cost_total`, `cost_stt`, `cost_tts`, `cost_llm`, `cost_telephony`
- OTel `patter.cost.*` span attributes via `record_patter_attrs`
Operator overrides:
```python
Patter(pricing={
"elevenlabs": {"models": {"my_custom_model": {"price": 0.075}}}
})
```
Sources: [libraries/python/getpatter/pricing.py:1-50]()
---
## Tunneling
Patter needs a publicly reachable HTTPS URL to receive webhook callbacks from telephony carriers. Three tunnel strategies are available via the `getpatter.tunnels` module.
### Strategy Comparison
| Strategy | Class | Public URL | Account required | Stable URL | Process managed |
|---|---|---|---|---|---|
| Cloudflare Quick Tunnel | `CloudflareTunnel` | `*.trycloudflare.com` | No | No (random each run) | Yes (`cloudflared` subprocess) |
| Static / bring-your-own | `Static` | Any hostname | Depends | Yes | No |
| Ngrok (directive only) | `Ngrok` | `*.ngrok.io` | Yes | Yes (with reserved) | No (Phase 1a marker) |
### CloudflareTunnel — Automatic Setup
When `tunnel=True` or `tunnel=CloudflareTunnel()` is passed to the server, `start_tunnel(port)` spawns a `cloudflared tunnel --url http://localhost:<port>` subprocess, reads the assigned `*.trycloudflare.com` hostname from the subprocess output (both stdout and stderr are scanned), and registers an `atexit` handler to stop the process on exit.
```python
from getpatter.tunnels import CloudflareTunnel
phone = Patter(mode="local", tunnel=CloudflareTunnel())
```
Requirements: `cloudflared` binary on `PATH`.
```bash
# macOS
brew install cloudflared
# Debian/Ubuntu (requires Cloudflare's apt repository to be configured first)
sudo apt install cloudflared
```
The URL regex used to extract the hostname is `https://([a-zA-Z0-9._-]+\.trycloudflare\.com)`, applied to both stdout and stderr streams concurrently. A `TimeoutError` is raised if no URL appears within 30 seconds.
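To make the extraction step concrete, the sketch below applies the quoted regex to a list of log lines. `find_tunnel_hostname` is a hypothetical helper and the sample lines are invented for illustration; real `cloudflared` output and the SDK's stream handling will differ.

```python
import re
from typing import Iterable, Optional

# Pattern quoted in the text above.
TUNNEL_URL_RE = re.compile(r"https://([a-zA-Z0-9._-]+\.trycloudflare\.com)")


def find_tunnel_hostname(lines: Iterable[str]) -> Optional[str]:
    """Return the first trycloudflare.com hostname found in lines, or None."""
    for line in lines:
        match = TUNNEL_URL_RE.search(line)
        if match:
            # Group 1 is the hostname without the https:// scheme.
            return match.group(1)
    return None


# Invented log lines, for illustration only.
sample_output = [
    "INF Requesting new quick Tunnel on trycloudflare.com...",
    "INF |  https://quiet-river-example.trycloudflare.com  |",
]
hostname = find_tunnel_hostname(sample_output)
```

The first sample line mentions the domain without a scheme, so it does not match; only a full `https://` URL is accepted.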
**Trade-off:** The URL is ephemeral — it changes every time the process restarts. For production or long-lived development sessions use a `Static` hostname.
Sources: [libraries/python/getpatter/tunnel.py:1-115]()
### Static — Stable Webhook URL
`Static(hostname="...")` tells the server to use a pre-existing public hostname. The SDK will not spawn any subprocess. The hostname must already route HTTPS traffic to the local port (via ngrok, Cloudflare Tunnel with a named route, a reverse proxy, or a direct public IP).
```python
from getpatter.tunnels import Static
phone = Patter(mode="local", tunnel=Static(hostname="agent.example.com"))
```
`Static` validates that `hostname` is non-empty at construction time.
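Construction-time validation of this kind can be sketched with a dataclass `__post_init__` hook. `StaticSketch` and its `webhook_base_url` method are hypothetical stand-ins used only for illustration; the real `Static` class may validate differently and may not expose such a method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticSketch:
    """Hypothetical stand-in for a bring-your-own-hostname tunnel directive."""

    hostname: str

    def __post_init__(self) -> None:
        # Reject empty or whitespace-only hostnames as soon as the object is built.
        if not self.hostname or not self.hostname.strip():
            raise ValueError("hostname must be a non-empty string")

    def webhook_base_url(self) -> str:
        # Carriers call back over HTTPS, so the scheme is fixed here.
        return f"https://{self.hostname}"


base_url = StaticSketch(hostname="agent.example.com").webhook_base_url()
```

Failing at construction surfaces a misconfigured hostname before the server starts, instead of when the first carrier webhook fails to arrive.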
Sources: [libraries/python/getpatter/tunnels/__init__.py:40-60]()
### Ngrok (Planned)
`Ngrok` is currently a marker dataclass — it records an optional reserved `hostname` but does not launch a subprocess. Programmatic `ngrok` integration via the `ngrok` Python package is noted as a future addition. Users who already run `ngrok` manually should use `Static` with the public hostname.
Sources: [libraries/python/getpatter/tunnels/__init__.py:18-38]()
---
## Test Mode — No Phone Required
`TestSession.run(agent)` starts an interactive terminal REPL that simulates a phone call without requiring telephony, STT, or TTS. This is the recommended first step for iterating on agent logic.
```python
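import asyncio

# Patter, DeepgramSTT, and ElevenLabsTTS are assumed to be imported from the SDK.
async def main():
    phone = Patter(mode="local", phone_number="+15550001234")
    agent = phone.agent(
        system_prompt="You are helpful.",
        stt=DeepgramSTT(api_key="..."),
        tts=ElevenLabsTTS(api_key="..."),
    )
    await phone.test(agent)  # opens the interactive terminal REPL

asyncio.run(main())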
```
### Session Behaviour
On entry the REPL:
1. Generates a synthetic `call_id` (`test_<12 hex chars>`), caller (`+15550000001`), and callee (`+15550000002`).
2. Fires `on_call_start` if registered, accepting the same override dict as a real call.
3. Prints the agent's `first_message` if set.
4. Enters a `readline`-based REPL in a thread-pool executor (so the async event loop remains unblocked).
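Steps 1 and 4 can be sketched as follows. `make_test_call_id` and `read_line` are hypothetical helpers written for this example, not functions from `test_mode.py`; the sketch only shows the ID shape and the general technique of pushing a blocking prompt onto a thread-pool executor.

```python
import asyncio
import uuid
from typing import Callable


def make_test_call_id() -> str:
    """Build an ID shaped like test_<12 hex chars>."""
    return f"test_{uuid.uuid4().hex[:12]}"


async def read_line(prompt: str, reader: Callable[[str], str] = input) -> str:
    """Run a blocking line reader in the default thread-pool executor."""
    loop = asyncio.get_running_loop()
    # The event loop keeps servicing other tasks while the reader blocks.
    return await loop.run_in_executor(None, reader, prompt)


async def demo() -> str:
    # A stand-in reader keeps the demo non-interactive; omit it to use input().
    return await read_line("you> ", reader=lambda prompt: "hello")


call_id = make_test_call_id()
reply = asyncio.run(demo())
```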
**Built-in LLM fallback.** If no `on_message` handler is provided but `openai_key` is supplied, an `LLMLoop` is instantiated using `gpt-4o-mini`. Realtime models are silently swapped for a chat-completions model, because test mode has no audio path. System prompt variable substitution runs before the loop starts.
**REPL commands:**
| Command | Effect |
|---|---|
| `/quit` | End session immediately |
| `/hangup` | Simulate caller hangup |
| `/transfer <number>` | Simulate a transfer (agent-initiated) |
| `/history` | Print conversation history so far |
On exit the REPL fires `on_call_end` with the full transcript so end-of-call hooks are exercised.
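A REPL like this has to separate slash commands from ordinary caller text. The following is a minimal sketch of that dispatch step; `parse_repl_input` and its return convention are assumptions made for illustration, not the parser used in `test_mode.py`.

```python
from typing import Optional, Tuple

# Command names taken from the table above.
KNOWN_COMMANDS = {"/quit", "/hangup", "/transfer", "/history"}


def parse_repl_input(line: str) -> Tuple[str, Optional[str]]:
    """Classify one input line.

    Returns (name, argument): name is "say" for ordinary text, "unknown" for an
    unrecognised slash command, otherwise the command name without the slash.
    """
    text = line.strip()
    if not text.startswith("/"):
        return ("say", text)
    name, _, arg = text.partition(" ")
    if name not in KNOWN_COMMANDS:
        return ("unknown", text)
    return (name.lstrip("/"), arg.strip() or None)


transfer = parse_repl_input("/transfer +15550000003")
```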
Sources: [libraries/python/getpatter/test_mode.py:1-175]()
---
## Docker Compose Deployment
### Dockerfile
The provided `Dockerfile` uses `python:3.13-slim` as the base, installs `getpatter[local]` and `python-dotenv`, copies the working directory, exposes port 8000, and defaults to running `python/main.py`:
```dockerfile
FROM python:3.13-slim
WORKDIR /app
RUN pip install --no-cache-dir "getpatter[local]" python-dotenv
COPY . .
EXPOSE 8000
CMD ["python", "python/main.py"]
```
Override the default command (`CMD`) to run any agent script, assuming the image was built with the tag `patter`:
```bash
docker run patter python my_agent.py
```
Sources: [Dockerfile:1-14]()
### docker-compose.yml
The minimal Compose file builds the image from the local context, forwards port `8000`, loads environment from `.env`, and restarts unless manually stopped:
```yaml
services:
  patter:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    restart: unless-stopped
```
```bash
cp .env.example .env # fill in API keys
docker compose up --build
```
The dashboard will be accessible at `http://localhost:8000` once the agent starts. In this configuration the Cloudflare tunnel is optional — a `Static` hostname pointing to your public IP or a deployed ingress is preferred for production.
Sources: [docker-compose.yml:1-7]()
### Sequence: Development Startup
```mermaid
sequenceDiagram
participant Dev as Developer
participant Patter as EmbeddedServer
participant CF as cloudflared
participant Carrier as Telephony Carrier
participant Browser as Dashboard Browser
Dev->>Patter: python main.py (or docker compose up)
Patter->>CF: spawn cloudflared tunnel --url http://localhost:8000
CF-->>Patter: trycloudflare.com URL extracted from stderr
Patter->>Carrier: register webhook_url = https://<hostname>
Patter->>Browser: serve GET / → ui.html (SPA)
Browser->>Patter: EventSource /api/dashboard/events
Carrier->>Patter: POST /webhook (inbound call)
Patter->>Browser: SSE: call_start, turn_complete, call_end
```
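The last arrow in the diagram is a Server-Sent Events stream. As a rough sketch of how a client could consume it, the parser below follows the standard SSE framing (`event:` and `data:` fields, with a blank line ending each event). `parse_sse` is a hypothetical helper, and both the use of named events and the JSON payload fields are assumptions; the dashboard's actual wire format may differ.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, json_payload) pairs from Server-Sent Events lines."""
    event = "message"  # default event name in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            # A blank line terminates one event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())


# Hypothetical stream; the payload fields are assumptions.
stream = [
    "event: call_start",
    'data: {"call_id": "test_0123456789ab"}',
    "",
    "event: call_end",
    'data: {"call_id": "test_0123456789ab"}',
    "",
]
events = list(parse_sse(stream))
```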
---
## Agent Skills Bundle
The repository ships one locked skill for AI coding assistants:
| Skill | Source | Hash |
|---|---|---|
| `line-voice-agent` | `cartesia-ai/skills` (GitHub) | `554487...` |
The skill is declared in `skills-lock.json` and the resolved artefact lives under `.agents/skills/line-voice-agent/`. This provides AI coding assistants (such as Goose and pi) with context-aware guidance for building voice agents with the Cartesia Line SDK, tool calling, multi-agent handoffs, and real-time interruption handling — all within the Patter development environment.
Sources: [skills-lock.json:1-11]()
---
## Summary
Patter's operational layer is designed around three principles: **zero-cost defaults** (tracing is off unless `PATTER_OTEL_ENABLED=1` is set; the dashboard requires no external database), **bring-your-own infrastructure** (tunnel strategy, OTLP backend, pricing overrides, and metrics backends are all pluggable), and **rapid local iteration** (test mode collapses the telephony/STT/TTS stack to a terminal REPL with a one-liner). In production the same FastAPI server that handles webhooks also serves the React dashboard, exposes the SSE feed, and writes per-call JSONL logs — all within a single process that fits in a `python:3.13-slim` container.
---