# Technical Orientation

> What Patter is, its end-to-end call data flow, the Patter / agent / serve three-object model, SDK entry points in both Python and TypeScript, and how the rest of this reference is organized.

- Repository: PatterAI/Patter
- GitHub: https://github.com/PatterAI/Patter
- Human wiki: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc
- Complete Markdown: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/llms-full.txt

## Source Files

- `README.md`
- `libraries/python/getpatter/client.py`
- `libraries/python/getpatter/_public_api.py`
- `libraries/python/getpatter/models.py`
- `libraries/typescript/src/client.ts`
- `libraries/typescript/src/public-api.ts`
- `libraries/typescript/src/types.ts`

---

This page explains what Patter is, how audio flows through the system from a telephone call to an AI response and back, how the three top-level SDK objects relate to each other, and where to find each public entry point in both the Python and TypeScript SDKs. It is the starting point for the rest of this reference.

Patter is an open-source SDK, published as `getpatter` on both PyPI and npm, that connects AI agents to real phone numbers. You supply a telephony carrier (Twilio or Telnyx) and an AI engine or pipeline; Patter handles WebSocket media streaming, speech-to-text, text-to-speech, barge-in detection, tool dispatch, call recording, answering-machine detection (AMD), and an observability dashboard. The advertised 4-line quickstart reflects the actual shape of the API: construct a client, build an agent, and call `serve()`. The Python and TypeScript SDKs expose the same semantics and lifecycle hooks, with field names differing only in casing convention (`snake_case` in Python, `camelCase` in TypeScript), so a cross-runtime team can run the same agent in either language without rewriting business logic.

---

## The Patter / Agent / Serve Three-Object Model

Every Patter program follows the same three steps, in order: construct the client, derive an agent from it, and start the server.

```text
┌─────────────────────────────────────────────┐
│  Patter(carrier=..., phone_number=...)       │  ← 1. client: carrier + tunnel
│    .agent(engine=..., system_prompt=...)     │  ← 2. agent: AI config (frozen)
│    .serve(agent, tunnel=True, ...)           │  ← 3. serve: embedded HTTP/WS server
└─────────────────────────────────────────────┘
```

### 1. `Patter` — the client

`Patter` is the SDK's root object. It owns the carrier credentials, the phone number, tunnel wiring, prewarm caches, and the `asyncio.Future` pair (`tunnel_ready`, `ready`) that signal when the embedded server is safe for outbound calls.

```python
# Patter is defined in libraries/python/getpatter/client.py
from getpatter import Patter, Twilio

phone = Patter(
    carrier=Twilio(account_sid="AC...", auth_token="..."),
    phone_number="+15550001234",
    tunnel=True,           # CloudflareTunnel(), Static(hostname=...), or bool
)
```

`Patter.__init__` normalises the carrier via `_unpack_carrier` (resolves `Twilio` or `Telnyx` instances into a `LocalConfig`), then resolves the tunnel directive into a webhook hostname or defers it to `serve()`. Cloud/API-key mode is explicitly rejected with `NotImplementedError` — this release is local mode only.

Sources: [libraries/python/getpatter/client.py:152-264]()
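
The role of the `ready` future described above can be illustrated in isolation. The sketch below is not Patter's implementation: `FakeServer`, `start`, `place_call`, and the `example.invalid` hostname are all hypothetical. It only shows the general asyncio pattern of gating outbound work on a future that the server resolves once it is reachable.

```python
import asyncio


class FakeServer:
    """Hypothetical stand-in for an embedded server (not a getpatter class)."""

    def __init__(self) -> None:
        # Resolved once the server can accept carrier traffic.
        self.ready = asyncio.get_running_loop().create_future()

    async def start(self) -> None:
        await asyncio.sleep(0.01)  # pretend to bind a port and open a tunnel
        self.ready.set_result("example.invalid")  # assumed public hostname


async def place_call(server: FakeServer, to: str) -> str:
    # Wait until the server is reachable; otherwise the carrier's webhook
    # request could arrive before anything is listening.
    hostname = await server.ready
    return f"calling {to} via wss://{hostname}"


async def main() -> str:
    server = FakeServer()
    start_task = asyncio.create_task(server.start())
    result = await place_call(server, "+15550001234")
    await start_task
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Awaiting a future rather than polling a flag means the caller resumes as soon as the server signals readiness, with no busy loop.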

### 2. `Agent` — the frozen AI configuration

`agent()` is a factory method on `Patter` that returns a frozen `Agent` dataclass. `Agent` carries every parameter the server dispatches at call time: `system_prompt`, `voice`, `model`, `language`, `tools`, `guardrails`, `hooks`, `vad`, `stt`, `tts`, `llm`, `variables`, and latency-tuning knobs like `barge_in_threshold_ms` and `aggressive_first_flush`.

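```python
# Realtime mode (speech-to-speech via OpenAI)
from getpatter import OpenAIRealtime

# `phone` is the Patter client from step 1; `transfer_tool` and
# `profanity_rail` are assumed to be defined elsewhere in your program.
agent = phone.agent(
    engine=OpenAIRealtime(),
    system_prompt="You are a friendly receptionist.",
    first_message="Hello! How can I help?",
    tools=[transfer_tool],
    guardrails=[profanity_rail],
)
```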

The `provider` field is a closed literal union: `"openai_realtime" | "elevenlabs_convai" | "pipeline"`. Passing an `engine=` object sets `provider` implicitly, and passing `stt=` together with `tts=` implies `pipeline` mode. Combining `engine=` with `stt=`/`tts=` is rejected as a conflict.

Sources: [libraries/python/getpatter/models.py:108-210]()
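
The selection rules above can be sketched as a small resolver. This is a hypothetical helper, not the SDK's code: `resolve_provider` is an invented name, `engine` is modelled as a plain string rather than an engine marker object, and both the `ValueError` exception type and the rule that a lone `stt=` or `tts=` is an error are assumptions.

```python
from typing import Literal, Optional

ProviderMode = Literal["openai_realtime", "elevenlabs_convai", "pipeline"]


def resolve_provider(
    engine: Optional[str] = None,
    stt: Optional[object] = None,
    tts: Optional[object] = None,
) -> ProviderMode:
    """Hypothetical helper: pick a provider mode from the arguments given."""
    pipeline_requested = stt is not None or tts is not None
    if engine is not None and pipeline_requested:
        # Assumed exception type; the source only says this is a conflict.
        raise ValueError("engine= conflicts with stt=/tts=; pass one or the other")
    if pipeline_requested:
        if stt is None or tts is None:
            # Assumption: pipeline mode needs both halves.
            raise ValueError("pipeline mode needs both stt= and tts=")
        return "pipeline"
    if engine == "elevenlabs_convai":
        return "elevenlabs_convai"
    # Falls back to the documented default for `provider`.
    return "openai_realtime"
```

Resolving the mode once, at agent construction, keeps call-time dispatch to a simple three-way branch.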

### 3. `serve()` / `call()` — the embedded server

`serve()` starts a FastAPI/uvicorn server that registers WebSocket and webhook routes for inbound calls, applies tunnel auto-configuration, and blocks until the process exits. `call()` places an outbound call via the carrier REST API and coordinates prewarm tasks in parallel with the ring window.

```typescript
// TypeScript: same three steps as the Python SDK
import { Patter, Twilio, OpenAIRealtime } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });
const agent = phone.agent({ engine: new OpenAIRealtime(), systemPrompt: "..." });
await phone.serve({ agent, tunnel: true, dashboard: true });
```

`serve()` accepts `onCallStart`, `onCallEnd`, `onTranscript`, `onMetrics`, `recording`, `voicemailMessage`, `dashboard`, and `manageWebhook` options. `call()` accepts `to`, `agent`, `machineDetection`, `onMachineDetection`, `voicemailMessage`, and `ringTimeout`.

Sources: [libraries/typescript/src/types.ts:283-374]()

---

## End-to-End Call Data Flow

The sequence below applies to both inbound (carrier calls the webhook) and outbound (SDK calls the carrier REST API first):

```mermaid
sequenceDiagram
    participant Carrier as Carrier (Twilio / Telnyx)
    participant Server as EmbeddedServer / StreamHandler
    participant Provider as AI Provider
    participant CB as User Callbacks

    Carrier->>Server: HTTP POST /webhooks/twilio/answer (TwiML / Call Control)
    Server->>Carrier: WebSocket upgrade response (stream URL)
    Carrier-->>Server: WS frames: {event: "start", callSid, ...}
    Server->>Provider: Adopt parked WS or open new connection (STT/TTS/Realtime)
    Server->>CB: onCallStart(data)
    loop Each turn
        Carrier-->>Server: WS binary frame (mulaw 8kHz audio)
        Server->>Provider: PCM audio → STT transcript (pipeline) OR raw audio (Realtime)
        Provider-->>Server: LLM response text / speech tokens
        Server->>Provider: TTS synthesis → PCM → mulaw
        Server-->>Carrier: WS binary frame (mulaw 8kHz audio back)
        Server->>CB: onTranscript(data), onMetrics(data)
    end
    Carrier-->>Server: WS {event: "stop"}
    Server->>CB: onCallEnd({transcript, metrics})
```

Key implementation details:

- **Mulaw ↔ PCM transcoding** happens inside `StreamHandler`. Twilio delivers G.711 µ-law 8 kHz; Telnyx delivers PCM 16 kHz. The handler resamples before passing to STT adapters.
- **Prewarm** runs during the ringing window: `_park_provider_connections` opens STT/TTS/Realtime WebSockets before the callee answers, and `_spawn_prewarm_first_message` pre-renders the greeting to TTS bytes. Both are capped (`_PREWARM_CACHE_MAX = 200`, TTL 30 s) to bound memory and TTS cost on unanswered calls.
- **AMD (answering-machine detection)** is on by default. Twilio uses `MachineDetection=DetectMessageEnd` + `async_amd` (zero answer-latency for humans); Telnyx uses `answering_machine_detection=greeting_end`. The result fires `on_machine_detection(MachineDetectionResult)`.
- **Barge-in** is controlled by `barge_in_threshold_ms` (default 300 ms). Optional `barge_in_strategies` can defer cancellation until a per-strategy confirmation arrives within `barge_in_confirm_ms`.
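
To make the µ-law to PCM step concrete, here is the standard G.711 µ-law expansion written in plain Python. It is a sketch of the general technique, not `StreamHandler`'s code: the function names `ulaw_to_pcm16` and `decode_frame` are hypothetical, and resampling is left out.

```python
import struct

ULAW_BIAS = 0x84  # 132, the bias G.711 adds before compression


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


def decode_frame(payload: bytes) -> bytes:
    """Decode a mu-law payload into little-endian 16-bit PCM bytes."""
    samples = [ulaw_to_pcm16(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

A 20 ms frame at 8 kHz holds 160 µ-law bytes, so `decode_frame` returns 320 bytes of PCM for it.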

Sources: [libraries/python/getpatter/client.py:500-680]()
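
The prewarm caps quoted above (200 entries, 30 s TTL) describe a bounded cache with expiry. The sketch below shows one way such a cache could behave; `PrewarmCache`, `put`, and `take` are hypothetical names, oldest-first eviction is an assumption, and the injected clock exists only to make the example testable.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional

PREWARM_CACHE_MAX = 200   # entry cap quoted above
PREWARM_TTL_S = 30.0      # time-to-live quoted above


class PrewarmCache:
    """Hypothetical bounded TTL cache keyed by call ID."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: "OrderedDict[str, tuple[float, bytes]]" = OrderedDict()

    def put(self, call_id: str, audio: bytes) -> None:
        self._items.pop(call_id, None)
        self._items[call_id] = (self._clock() + PREWARM_TTL_S, audio)
        while len(self._items) > PREWARM_CACHE_MAX:
            self._items.popitem(last=False)  # evict the oldest entry

    def take(self, call_id: str) -> Optional[bytes]:
        """Return and remove the entry, or None if missing or expired."""
        entry = self._items.pop(call_id, None)
        if entry is None:
            return None
        expires_at, audio = entry
        return audio if self._clock() < expires_at else None

    def __len__(self) -> int:
        return len(self._items)
```

Bounding both size and age means that a burst of unanswered outbound calls cannot hold rendered greetings in memory indefinitely.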

---

## Provider Modes

Patter dispatches at call time to one of three provider modes, selected by the `provider` field on `Agent` / `AgentOptions`.

| Mode | `provider` value | Audio path | Typical latency |
|---|---|---|---|
| OpenAI Realtime | `"openai_realtime"` | Speech → OpenAI Realtime API (bidirectional WS, native audio) | Lowest |
| ElevenLabs ConvAI | `"elevenlabs_convai"` | Speech → ElevenLabs Conversational AI (managed) | Low |
| Pipeline | `"pipeline"` | STT → LLM → TTS (sequential, BYOC) | Configurable |

Pipeline mode is the only mode that uses `STTConfig`, `TTSConfig`, `PipelineHooks`, `vad`, `audio_filter`, `background_audio`, `llm`, `text_transforms`, and `prewarm_first_message`. Realtime and ConvAI modes route audio directly to the provider's own WebSocket, so `PipelineHooks` and `prewarm_first_message` have no effect there: depending on the option, they are either ignored silently or trigger a warning.

```python
# Pipeline mode example
from getpatter import Patter, Twilio, DeepgramSTT, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+1...")
agent = phone.agent(
    system_prompt="...",
    stt=DeepgramSTT(api_key="..."),
    tts=ElevenLabsTTS(api_key="...", voice="rachel"),
)
```

Sources: [libraries/python/getpatter/models.py:22-29](), [libraries/python/getpatter/models.py:108-133]()

---

## SDK Entry Points

Both SDKs are published as `getpatter` and expose an identical high-level surface.

### Python SDK

**Installation**: `pip install getpatter`

The top-level `__init__.py` re-exports the public symbols from the following source modules:

| Symbol | Source module | Purpose |
|---|---|---|
| `Patter` | `getpatter.client` | Root SDK client |
| `Twilio`, `Telnyx` | `getpatter.carriers.*` | Carrier credentials |
| `OpenAIRealtime`, `ElevenLabsConvAI` | `getpatter.engines.*` | Engine markers |
| `Agent`, `CallMetrics`, `PipelineHooks`, `STTConfig`, `TTSConfig`, `CallControl`, `MachineDetectionResult` | `getpatter.models` | Runtime types (all frozen dataclasses) |
| `Tool`, `Guardrail`, `tool()`, `guardrail()` | `getpatter._public_api` | Declarative tool + guardrail factories |
| `DeepgramSTT`, `ElevenLabsTTS`, `CartesiaTTS`, … | `getpatter.stt.*`, `getpatter.tts.*` | Pipeline-mode adapters |

**`Tool`** is a frozen dataclass requiring exactly one of `handler` (callable) or `webhook_url` (string). The `tool()` factory accepts decorator form (`@tool`) or keyword-constructor form:

```python
# tool() is defined in libraries/python/getpatter/_public_api.py
from getpatter import tool

# `crm` and `my_handler` are placeholders for your own code.
@tool
async def lookup_account(phone: str) -> str:
    """Look up the account for a caller."""
    return await crm.find(phone)

# or explicit form:
t = tool(name="transfer_call", description="Transfer the call.", handler=my_handler)
```
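
The "exactly one of `handler` or `webhook_url`" invariant can be sketched with a frozen dataclass. `ToolSketch` is a hypothetical stand-in, not the SDK's `Tool`; the `ValueError` exception type and the error message are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ToolSketch:
    """Hypothetical stand-in for a Tool-like record (not the SDK class)."""

    name: str
    description: str = ""
    handler: Optional[Callable[..., Any]] = None
    webhook_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one dispatch target: a local callable XOR a remote URL.
        if (self.handler is None) == (self.webhook_url is None):
            raise ValueError(
                f"tool {self.name!r} needs exactly one of handler or webhook_url"
            )
```

Validating in `__post_init__` rejects a misconfigured tool at construction time rather than in the middle of a call.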

**`Guardrail`** checks LLM output before TTS. A match replaces the response with `replacement` (default: `"I'm sorry, I can't respond to that."`):

```python
from getpatter import guardrail
rail = guardrail("profanity", blocked_terms=["badword"], replacement="Let me rephrase that.")
```
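
The replace-on-match behaviour can be sketched as a pure function applied to LLM output before synthesis. `apply_blocked_terms` is a hypothetical name, and the case-insensitive substring match is an assumption; the SDK's actual matching rules may differ.

```python
from typing import Sequence

DEFAULT_REPLACEMENT = "I'm sorry, I can't respond to that."


def apply_blocked_terms(
    text: str,
    blocked_terms: Sequence[str],
    replacement: str = DEFAULT_REPLACEMENT,
) -> str:
    """Return `replacement` if any blocked term occurs in `text`.

    Matching is a case-insensitive substring test, which is an
    assumption made for this sketch.
    """
    lowered = text.lower()
    if any(term.lower() in lowered for term in blocked_terms):
        return replacement
    return text
```

Because the check runs on text before TTS, a blocked response is never synthesized or sent to the caller.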

Sources: [libraries/python/getpatter/_public_api.py:31-90]()

**Key `Agent` fields** (all optional except `system_prompt`):

| Field | Type | Default | Notes |
|---|---|---|---|
| `system_prompt` | `str` | required | Supports `{variable}` placeholders |
| `provider` | `ProviderMode` | `"openai_realtime"` | Set implicitly by `engine=` |
| `voice` | `str` | `"alloy"` | Provider-specific voice name/ID |
| `model` | `str` | `"gpt-4o-mini-realtime-preview"` | LLM / Realtime model ID |
| `tools` | `list[dict] \| None` | `None` | Accepts `Tool` instances |
| `guardrails` | `list[Guardrail] \| None` | `None` | Output filters |
| `hooks` | `PipelineHooks \| None` | `None` | Pipeline-mode stage hooks |
| `prewarm` | `bool` | `True` | Warm provider connections during ring |
| `prewarm_first_message` | `bool` | `False` on the raw dataclass; `True` when built via the `agent()` factory in pipeline mode | Pre-render greeting TTS |
| `barge_in_threshold_ms` | `int` | `300` | ms of speech before interrupt |
| `disable_phone_preamble` | `bool` | `False` | Skip phone-friendly system prefix |

Sources: [libraries/python/getpatter/models.py:108-265]()

### TypeScript SDK

**Installation**: `npm install getpatter`

The TypeScript SDK mirrors the Python surface. Types live in `types.ts`; the runtime class and factory functions live in `client.ts` and `public-api.ts`.

| Symbol | Source file | Purpose |
|---|---|---|
| `Patter` (class) | `src/client.ts` | Root SDK client |
| `Twilio`, `Telnyx` | `src/telephony/twilio.ts`, `src/telephony/telnyx.ts` | Carrier credentials |
| `OpenAIRealtime`, `ElevenLabsConvAI` | `src/engines/openai.ts`, `src/engines/elevenlabs.ts` | Engine markers |
| `Tool` (class), `Guardrail` (class), `tool()`, `guardrail()` | `src/public-api.ts` | Tool + guardrail factories |
| `AgentOptions`, `ServeOptions`, `LocalCallOptions`, `PipelineHooks`, `MachineDetectionResult` | `src/types.ts` | TypeScript interfaces |

The TypeScript `Tool` class validates in its constructor that exactly one of `handler` or `webhookUrl` is provided, matching the Python `Tool.__post_init__` invariant. `Guardrail` exposes `blockedTerms`, `check`, and `replacement`. Both classes implement the internal `ToolDefinition` / `Guardrail` interface contracts so they drop in as plain objects anywhere the SDK accepts them.

```typescript
// Tool and Guardrail are defined in libraries/typescript/src/public-api.ts
import { Tool, Guardrail } from "getpatter";

// `crm` is a placeholder for your own CRM client.
const t = new Tool({
  name: "lookup_account",
  description: "Look up a CRM account.",
  handler: async (args) => JSON.stringify(await crm.find(args.phone)),
});

const rail = new Guardrail({ name: "profanity", blockedTerms: ["badword"] });
```

Sources: [libraries/typescript/src/public-api.ts:1-126](), [libraries/typescript/src/types.ts:175-280]()

**`ServeOptions` key fields** (TypeScript):

| Field | Type | Notes |
|---|---|---|
| `agent` | `AgentOptions` | Required |
| `port` | `number` | Default 8000 |
| `tunnel` | `boolean` | Auto-start Cloudflare tunnel |
| `dashboard` | `boolean` | Serve built-in UI at `/dashboard` |
| `recording` | `boolean` | Enable carrier-side recording |
| `onCallStart / onCallEnd / onTranscript / onMetrics` | `CallEventHandler` | Lifecycle callbacks |
| `onMessage` | `PipelineMessageHandler \| string` | Pipeline custom LLM or webhook URL |
| `manageWebhook` | `boolean` | Auto-configure carrier webhook on startup |

---

## Observability and Metrics

`CallMetrics` (Python dataclass / TS interface) is delivered to the `onCallEnd` callback and stored in the dashboard. It includes `LatencyBreakdown` per turn (with `stt_ms`, `llm_ms`, `tts_ms`, `agent_response_ms`, `endpoint_ms`, `bargein_ms`, `llm_ttft_ms`) and a `CostBreakdown` (STT, TTS, LLM, telephony in USD). Percentile summaries (`latency_p50`, `latency_p90`, `latency_p95`, `latency_p99`) are computed across all turns of the call.

`agent_response_ms` is the user-perceived latency metric, computed as `endpoint_ms + llm_ttft_ms + tts_ms`. It excludes the time the caller spent speaking, so it isolates system-controlled latency and is the natural number to track against a p95 SLO.
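
A short worked example makes the arithmetic concrete. The sketch below uses the field names quoted above, but `TurnLatency` and `percentile` are hypothetical, the turn values are made up for illustration, and the nearest-rank percentile method is an assumption about how the summaries could be computed.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TurnLatency:
    """Hypothetical per-turn record using the field names quoted above."""

    endpoint_ms: float
    llm_ttft_ms: float
    tts_ms: float

    @property
    def agent_response_ms(self) -> float:
        return self.endpoint_ms + self.llm_ttft_ms + self.tts_ms


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Illustrative numbers only, not measurements.
turns = [
    TurnLatency(endpoint_ms=200, llm_ttft_ms=350, tts_ms=150),
    TurnLatency(endpoint_ms=180, llm_ttft_ms=400, tts_ms=120),
    TurnLatency(endpoint_ms=250, llm_ttft_ms=600, tts_ms=200),
]
responses = [t.agent_response_ms for t in turns]
```

With these three turns the per-turn values are 700, 700, and 1050 ms, so the nearest-rank p50 is 700 ms and the p95 is 1050 ms: a single slow turn dominates the tail even when the median looks healthy.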

Tracing uses vendor-neutral OpenTelemetry. No external collector is required for the built-in dashboard.

Sources: [libraries/python/getpatter/models.py:295-385]()

---

## Reference Organization

The rest of this wiki is organized by functional area:

| Area | What it covers |
|---|---|
| **Carriers** | Twilio vs. Telnyx configuration, AMD, DTMF, transfer, recording parity |
| **Provider Modes** | OpenAI Realtime, ElevenLabs ConvAI, Pipeline (STT + LLM + TTS selection) |
| **Tools & Guardrails** | `Tool` + `tool()`, `Guardrail` + `guardrail()`, webhook vs. handler dispatch, MCP servers |
| **Pipeline Hooks** | `PipelineHooks` stage contract (`before_send_to_stt` → `after_transcribe` → `before_llm` → `after_llm` → `before_synthesize` → `after_synthesize`) |
| **Tunneling** | Cloudflare Quick Tunnel, ngrok, `Static(hostname=...)`, production webhook patterns |
| **Latency & Prewarm** | `prewarm`, `prewarm_first_message`, parked provider WebSockets, `aggressive_first_flush` |
| **Observability** | `CallMetrics`, `LatencyBreakdown`, `CostBreakdown`, dashboard, OpenTelemetry |
| **Outbound Calls** | `call()`, AMD, voicemail drop, `ring_timeout`, `MachineDetectionResult` |
| **Speech Events** | `on_user_speech_started/ended`, `on_agent_speech_started/ended`, `on_llm_token`, `on_audio_out` |
| **Testing** | `phone.test(agent)` — local playback without a carrier |

---

Patter's design keeps every AI provider, telephony carrier, STT, TTS, and LLM component pluggable. No Patter-hosted backend is required in this release — all media and model calls flow directly from your infrastructure to the carriers and provider APIs you configure.
