# LLM Providers, Tool Calling & Guardrails

> The LLM backend adapters (OpenAI, Anthropic, Groq, Cerebras, Google Gemini), the tool decorator and Tool dataclass for in-process and webhook-dispatch tools, output guardrail rules, dynamic variable substitution, and the PatterTool integration for LangChain / OpenAI Assistants orchestrators.

- Repository: PatterAI/Patter
- GitHub: https://github.com/PatterAI/Patter
- Human wiki: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc
- Complete Markdown: https://grok-wiki.com/public/wiki/patterai-patter-57d14e233afc/llms-full.txt

## Source Files

- `libraries/python/getpatter/llm/openai.py`
- `libraries/python/getpatter/llm/anthropic.py`
- `libraries/python/getpatter/llm/groq.py`
- `libraries/python/getpatter/tools/`
- `libraries/python/getpatter/integrations/patter_tool.py`
- `libraries/typescript/src/llm/`
- `libraries/typescript/src/llm-loop.ts`

---

<details>
<summary>Relevant source files</summary>
The following files were used as context for generating this wiki page:

- [libraries/python/getpatter/llm/openai.py](libraries/python/getpatter/llm/openai.py)
- [libraries/python/getpatter/llm/anthropic.py](libraries/python/getpatter/llm/anthropic.py)
- [libraries/python/getpatter/llm/groq.py](libraries/python/getpatter/llm/groq.py)
- [libraries/typescript/src/llm/openai.ts](libraries/typescript/src/llm/openai.ts)
- [libraries/typescript/src/llm/anthropic.ts](libraries/typescript/src/llm/anthropic.ts)
- [libraries/typescript/src/llm/cerebras.ts](libraries/typescript/src/llm/cerebras.ts)
- [libraries/typescript/src/llm/google.ts](libraries/typescript/src/llm/google.ts)
- [libraries/typescript/src/llm-loop.ts](libraries/typescript/src/llm-loop.ts)
- [libraries/python/getpatter/tools/tool_decorator.py](libraries/python/getpatter/tools/tool_decorator.py)
- [libraries/python/getpatter/tools/tool_executor.py](libraries/python/getpatter/tools/tool_executor.py)
- [libraries/python/getpatter/tools/circuit_breaker.py](libraries/python/getpatter/tools/circuit_breaker.py)
- [libraries/python/getpatter/_public_api.py](libraries/python/getpatter/_public_api.py)
- [libraries/python/getpatter/models.py](libraries/python/getpatter/models.py)
- [libraries/python/getpatter/telephony/common.py](libraries/python/getpatter/telephony/common.py)
- [libraries/python/getpatter/stream_handler.py](libraries/python/getpatter/stream_handler.py)
- [libraries/python/getpatter/integrations/patter_tool.py](libraries/python/getpatter/integrations/patter_tool.py)
- [libraries/typescript/src/pipeline-hooks.ts](libraries/typescript/src/pipeline-hooks.ts)
- [docs/python-sdk/guardrails.mdx](docs/python-sdk/guardrails.mdx)
</details>

# LLM Providers, Tool Calling & Guardrails

This page describes the full backend intelligence layer of the Patter SDK: the pluggable LLM adapter system (OpenAI, Anthropic, Groq, Cerebras, Google Gemini), how tool calling is declared and executed (in-process handlers and webhook dispatch), the pipeline hook and guardrail system that intercepts and filters LLM output, dynamic variable substitution in system prompts, and the `PatterTool` integration that exposes a live Patter phone agent as a callable tool for external orchestrators such as LangChain, OpenAI Assistants, and Hermes Agent.

All five concepts are tightly coupled in the request loop: the `LLMLoop` (TypeScript) / `LLMProvider` + `ToolExecutor` (Python) drives the per-turn conversation, consults the tool registry when the model emits a tool call, runs pipeline hooks at each stage, evaluates guardrails against the final text, and substitutes per-call variables into the system prompt before the first send.

---

## LLM Provider System

### Architecture

Patter exposes a thin `LLMProvider` interface implemented by five concrete adapters. Any adapter can be passed wherever an LLM is accepted, making the system fully BYOK (bring-your-own-key).

```text
  ┌─────────────────────────────────────────────────────┐
  │                  LLMProvider interface               │
  │  stream(messages, tools?, opts?) → AsyncIterable     │
  │  warmup?() → Promise<void>                          │
  └────────┬───────────────────────────────────────────-┘
           │
  ┌────────┴──────────────────────────────────┐
  │  Concrete providers (Python / TypeScript)  │
  │                                            │
  │  OpenAILLMProvider      (openai.com)        │
  │  AnthropicLLMProvider   (anthropic.com)     │
  │  GroqLLMProvider        (groq.com)          │
  │  CerebrasLLMProvider    (cerebras.ai)       │
  │  GoogleLLMProvider      (googleapis.com)    │
  └────────────────────────────────────────────┘
```

The TypeScript `LLMProvider` interface lives in `llm-loop.ts` and requires a single `stream()` generator method plus an optional `warmup()` pre-call hook.

Sources: [libraries/typescript/src/llm-loop.ts:243-273]()

### Provider Reference Table

| Provider | Module (Python) | Module (TypeScript) | Default model | API style | Env var |
|---|---|---|---|---|---|
| **OpenAI** | `getpatter.llm.openai` | `getpatter/llm/openai` | `gpt-4o-mini` | Chat Completions SSE | `OPENAI_API_KEY` |
| **Anthropic** | `getpatter.llm.anthropic` | `getpatter/llm/anthropic` | `claude-haiku-4-5-20251001` | Messages API SSE | `ANTHROPIC_API_KEY` |
| **Groq** | `getpatter.llm.groq` | `getpatter/llm/groq` | `llama-3.3-70b-versatile` | OpenAI-compatible | `GROQ_API_KEY` |
| **Cerebras** | *(via providers layer)* | `getpatter/llm/cerebras` | `gpt-oss-120b` | OpenAI-compatible | `CEREBRAS_API_KEY` |
| **Google Gemini** | *(via providers layer)* | `getpatter/llm/google` | `gemini-2.5-flash` | Generative Language SSE | `GEMINI_API_KEY` / `GOOGLE_API_KEY` |

### Constructing a Provider

Each provider follows the same pattern: the public `LLM` class in `getpatter/llm/<provider>` is a thin subclass of the underlying provider implementation. The API key is read from a constructor argument first, then from the environment variable, and a `ValueError` is raised if neither is present.

**Python:**
```python
from getpatter.llm import openai, anthropic, groq

llm = openai.LLM()                                   # reads OPENAI_API_KEY
llm = openai.LLM(api_key="sk-...", model="gpt-4o")
llm = anthropic.LLM(prompt_caching=False)            # opt out of cache
llm = groq.LLM(api_key="gsk_...", model="llama-3.3-70b-versatile")
```

**TypeScript:**
```ts
import * as openai   from "getpatter/llm/openai";
import * as cerebras from "getpatter/llm/cerebras";
import * as google   from "getpatter/llm/google";

const llm = new openai.LLM({ apiKey: "sk-...", temperature: 0.4 });
const llm = new cerebras.LLM({ gzipCompression: true });          // gzip for large prompts
const llm = new google.LLM({ model: "gemini-2.5-flash" });
```

Sources: [libraries/python/getpatter/llm/anthropic.py:28-63](), [libraries/typescript/src/llm/cerebras.ts:1-58]()

### Anthropic Prompt Caching

The Anthropic adapter enables prompt caching by default (`prompt_caching=True` / `promptCaching: true`). For voice agents with long system prompts, this saves ~100–400 ms TTFT and ~90% input-token cost on cached turns. Caching has no effect below Anthropic's minimum cacheable token threshold (~1024 tokens for Sonnet/Opus, ~2048 for Haiku).

### Streaming Protocol: `LLMChunk`

All providers stream chunks in a common shape. The `LLMLoop` accumulates these chunks across one or more iterations until a final text response is assembled or all tool calls are resolved.

```ts
interface LLMChunk {
  type: 'text' | 'tool_call' | 'done' | 'usage';
  // text
  content?: string;
  // tool_call
  index?: number;      // multiple tool calls share a stream; index groups them
  id?: string;
  name?: string;
  arguments?: string;  // JSON fragment; accumulated across chunks with same index
  // usage
  inputTokens?: number;
  outputTokens?: number;
  cacheReadInputTokens?: number;
  cacheCreationInputTokens?: number;
}
```

Sources: [libraries/typescript/src/llm-loop.ts:218-242]()

### Pre-call Warmup

Each provider implements an optional `warmup()` method that issues a lightweight HTTPS GET to its inference endpoint (e.g., `GET /models`). The SDK calls this once per outbound dial when `prewarm: true` is set (the default). Failures are swallowed at debug level — warmup is a latency optimisation, never a correctness gate.

---

## LLM Loop

`LLMLoop` (TypeScript) is the core turn engine. On each user utterance it builds the message list, calls `provider.stream()`, accumulates streaming chunks, dispatches any tool calls, and yields text tokens to the TTS layer. The loop runs up to **10 iterations** to handle multi-step tool chains.

```mermaid
sequenceDiagram
    participant SH as StreamHandler
    participant Loop as LLMLoop
    participant Provider as LLMProvider
    participant Exec as ToolExecutor
    SH->>Loop: run(userText, history, callCtx)
    loop ≤10 iterations
        Loop->>Provider: stream(messages, openaiTools)
        Provider-->>Loop: LLMChunk* (text | tool_call | usage)
        alt tool_call chunks
            Loop->>Exec: execute(toolDef, args, callCtx)
            Exec-->>Loop: JSON string result
            Note over Loop: append assistant+tool messages
        else no tool calls
            Loop-->>SH: yield text tokens
        end
    end
```

**Token billing fallback.** When a provider omits a `usage` chunk (observed on some Cerebras streaming variants), the loop falls back to a `chars/4` token estimate and logs a warning so operators notice the approximation.

Sources: [libraries/typescript/src/llm-loop.ts:452-530](), [libraries/typescript/src/llm-loop.ts:560-590]()

---

## Tool Calling

### The `@tool` Decorator

The Python `@tool` decorator (in `getpatter/tools/tool_decorator.py`) inspects a typed function's signature and Google-style docstring to build a complete `ToolDefinition` dict. No manual JSON Schema authoring is required.

```python
from getpatter import tool

@tool
async def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for a location.

    Args:
        location: City name or zip code
        unit: Temperature unit (celsius or fahrenheit)
    """
    return f"Sunny, 22°{unit[0].upper()}"
# get_weather is now: {"name": "get_weather", "description": "...", "parameters": {...}, "handler": <adapter>}
```

**Type mapping:**

| Python type | JSON Schema type |
|---|---|
| `str` | `string` |
| `int` | `integer` |
| `float` | `number` |
| `bool` | `boolean` |
| `list` | `array` |
| `dict` / anything else | `object` |
| `Optional[X]` / `X \| None` | base type, not in `required` |

The decorator wraps the user function in an `_adapter` coroutine that bridges the runtime call signature `handler(arguments: dict, call_context: dict)` and the user-written signature `fn(location, unit)`. This prevents the common `takes 1 positional argument but 2 were given` error.

Sources: [libraries/python/getpatter/tools/tool_decorator.py:105-178]()

### The `Tool` Dataclass

The public API also exposes a `Tool` frozen dataclass (declared in `_public_api.py`) for explicit construction without the decorator. Exactly one of `handler` or `webhook_url` must be provided.

```python
from getpatter import Tool, tool

# Decorator form
@tool
async def lookup_order(order_id: str) -> str:
    """Fetch order status."""
    ...

# Keyword constructor — webhook dispatch
transfer = Tool(
    name="transfer_call",
    description="Transfer to a live agent.",
    parameters={"type": "object", "properties": {"department": {"type": "string"}}, "required": ["department"]},
    webhook_url="https://api.example.com/transfer",
)
```

`Tool` also accepts a `strict: bool` flag that enables OpenAI strict-mode schema enforcement, and a `reassurance: str | dict` field for a spoken filler message during slow tool calls (currently honoured in Realtime mode only).

Sources: [libraries/python/getpatter/_public_api.py:30-75]()

### Tool Executor: Retry, Backoff & Circuit Breaker

Both the Python `ToolExecutor` and the TypeScript `DefaultToolExecutor` share the same resilience strategy:

1. **Retry with exponential backoff** — default 2 retries (3 total attempts), 500 ms base delay, capped at 5 s, with ±60 ms jitter.
2. **Per-tool circuit breaker** — `CLOSED → OPEN` after `failureThreshold` consecutive failures (default 5), stays `OPEN` for `cooldownMs` (default 30 000 ms), then probes once (`HALF_OPEN`).
3. **Structured fallback JSON** — all failure paths return `{"error": "...", "fallback": true}` so the LLM can acknowledge the failure gracefully instead of hanging.

```mermaid
stateDiagram-v2
    [*] --> CLOSED
    CLOSED --> OPEN : ≥5 consecutive failures
    OPEN --> HALF_OPEN : cooldown (30 s) elapsed
    HALF_OPEN --> CLOSED : probe succeeds
    HALF_OPEN --> OPEN : probe fails
```

**Webhook SSRF protection (Python):** `_validate_webhook_url()` blocks non-HTTP(S) schemes, loopback/private IP addresses (including literal IPs in the URL), and a hardcoded blocklist of dangerous hostnames such as `localhost`, `metadata.google.internal`, and `ip6-loopback`.

Sources: [libraries/python/getpatter/tools/tool_executor.py:93-140](), [libraries/python/getpatter/tools/circuit_breaker.py:1-30]()

**Response size guard:** Webhook responses larger than **1 MB** are rejected immediately and the circuit records a failure. This prevents oversized tool results from exhausting the LLM's context window.

---

## Pipeline Hooks

Pipeline hooks intercept data at every stage of the STT → LLM → TTS pipeline. They are the primary extensibility point for RAG injection, content moderation, cost control, and custom logging. All hooks are **fail-open**: an exception logs an error and passes the original value through unchanged.

| Hook | Stage | Can veto? | Return `null` means |
|---|---|---|---|
| `beforeSendToStt` | Before STT | Yes | Drop audio chunk |
| `afterTranscribe` | After STT | Yes | Skip LLM turn |
| `beforeLlm` | Before LLM | No (null = keep) | Keep original messages |
| `afterLlm.onChunk` | Per token (tier 1) | No | Keep original chunk |
| `afterLlm.onSentence` | Per sentence (tier 2) | Yes (empty = drop) | Keep original sentence |
| `afterLlm.onResponse` | Full response (tier 3) | No | Keep original text |
| `beforeSynthesize` | Before TTS | Yes | Skip TTS for sentence |
| `afterSynthesize` | After TTS | Yes | Drop audio chunk |

The three-tier `afterLlm` design avoids unnecessary buffering: tier 1 (`onChunk`) and tier 2 (`onSentence`) keep streaming; only tier 3 (`onResponse`) requires buffering the full response before yielding to TTS.

Sources: [libraries/typescript/src/pipeline-hooks.ts:48-175]()

---

## Output Guardrails

Guardrails intercept the final LLM text *before* it reaches text-to-speech. They are evaluated in declaration order; the first match short-circuits evaluation and substitutes the `replacement` string.

### `Guardrail` Dataclass

```python
from getpatter import Guardrail, guardrail

# frozen dataclass form
g = Guardrail(
    name="No medical advice",
    blocked_terms=["diagnosis", "prescription", "dosage"],
    replacement="That's a medical question I can't answer.",
)

# factory form (identical outcome)
g = guardrail(
    name="No phone numbers",
    check=lambda text: bool(__import__("re").search(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b", text)),
    replacement="I can't share numbers directly.",
)
```

| Field | Type | Default | Behaviour |
|---|---|---|---|
| `name` | `str` | *required* | Logged when fired |
| `blocked_terms` | `list[str] \| None` | `None` | Case-insensitive substring scan |
| `check` | `Callable[[str], bool] \| None` | `None` | Called after `blocked_terms`; `True` = block |
| `replacement` | `str` | `"I'm sorry, I can't respond to that."` | Spoken instead of blocked response |

**Evaluation order in `stream_handler.py`:**
1. Check each `blocked_term` with `term.lower() in response_text.lower()`.
2. If no term matched, call `check(response_text)`.
3. On first match, log `WARNING "Guardrail '<name>' triggered on: <snippet>"` and return the replacement.

Sources: [libraries/python/getpatter/stream_handler.py:370-410](), [libraries/python/getpatter/models.py:31-50]()

Multiple guardrails are evaluated in list order; the first to trigger wins:

```python
agent = phone.agent(
    system_prompt="You are a financial advisor assistant.",
    guardrails=[
        guardrail(name="No stock tips", blocked_terms=["buy", "sell", "invest in"],
                  replacement="Please consult a licensed advisor."),
        guardrail(name="No PII", check=lambda t: "SSN" in t.upper(),
                  replacement="I cannot share personal identification information."),
    ],
)
```

---

## Dynamic Variable Substitution

System prompts support `{key}` placeholder substitution resolved per call via `_resolve_variables()` in `libraries/python/getpatter/telephony/common.py`. Values are sanitised (control characters stripped, capped at 500 chars) to prevent prompt injection before substitution.

```python
agent = phone.agent(
    system_prompt="You are calling {customer_name} about order #{order_id}.",
    variables={"customer_name": "Alice", "order_id": "A-1234"},
)
```

At runtime each `{key}` is replaced with `str(value)` via a simple `str.replace` loop. Unresolved placeholders are left as-is.

Sources: [libraries/python/getpatter/telephony/common.py:17-31]()

---

## `PatterTool` Integration

`PatterTool` (in `getpatter/integrations/patter_tool.py`) wraps a running `Patter` phone instance as a single callable tool consumable by external LLM orchestrators. The wire schema is shared across OpenAI, Anthropic, and Hermes Agent, so the same tool can be registered in any framework without changes.

### Exported Schemas

```python
tool = PatterTool(phone=phone, agent={"stt": stt, "llm": llm, "tts": tts})

tool.openai_schema()    # → {"type": "function", "function": {"name": "make_phone_call", ...}}
tool.anthropic_schema() # → {"name": "make_phone_call", "input_schema": {...}}
tool.hermes_schema()    # → {"name": "make_phone_call", "parameters": {...}}
```

### Wire Parameters

The fixed JSON Schema accepted by all three formats:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `to` | `string` (E.164) | Yes | Destination phone number |
| `goal` | `string` | No | Becomes the in-call system prompt |
| `first_message` | `string` | No | Spoken when callee answers |
| `max_duration_sec` | `integer` [5–1800] | No | Hard call timeout; default 180 s |

### Execution Flow

```mermaid
sequenceDiagram
    participant Orchestrator as LLM Orchestrator
    participant PT as PatterTool
    participant Phone as Patter (phone)
    participant SSE as MetricsStore SSE
    Orchestrator->>PT: execute(to, goal, ...)
    PT->>Phone: call(to, overrideAgent)
    SSE-->>PT: call_initiated {call_id}
    Note over PT: Future<call_id> resolved
    Phone-->>PT: on_call_end(data)
    PT-->>Orchestrator: PatterToolResult {call_id, status, transcript, duration, cost}
```

`execute()` holds a dial lock so concurrent calls are serialised and each one captures exactly its own `call_initiated` SSE event. A configurable timeout (default 180 s) raises `TimeoutError` if the call does not complete.

Sources: [libraries/python/getpatter/integrations/patter_tool.py:100-185]()

### Hermes Agent Registration

```python
from tools.registry import registry
from getpatter.integrations import PatterTool

tool = PatterTool(phone=phone, agent={...})
tool.register_hermes(registry, toolset="patter")
# Hermes handler returns JSON string: result envelope or {"error": "..."}
```

`register_hermes` bridges Hermes' `handler(args: dict, **kw) -> Awaitable[str]` contract to `PatterTool.execute()`.

---

## Summary

Patter's LLM backend is built around a pluggable `LLMProvider` interface with five production-ready adapters (OpenAI, Anthropic, Groq, Cerebras, Google Gemini), all sharing the same streaming `LLMChunk` protocol. Tool calling is declared via the `@tool` decorator or the `Tool` dataclass and executed through a resilient `ToolExecutor` with exponential backoff, circuit breakers, SSRF protection, and structured fallback JSON. Pipeline hooks give fine-grained interception at every STT/LLM/TTS stage with fail-open semantics. Output guardrails apply keyword and callable checks against the final LLM text before synthesis. Dynamic `{variable}` substitution in system prompts is sanitized at the boundary to prevent prompt injection. Finally, `PatterTool` packages the entire phone-agent runtime as a schema-stable tool callable from OpenAI Assistants, Anthropic tool-use, or Hermes Agent with no SDK lock-in.
