# Quickstart

> First successful extraction: `he config init`, `he parse` with a preset template, `he search` / `he show`, and the equivalent Python `Template.create` + `feed_text` path using the Tesla biography example.

- Repository: yifanfeng97/Hyper-Extract
- GitHub: https://github.com/yifanfeng97/Hyper-Extract
- Human docs: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf
- Complete Markdown: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf/llms-full.txt

## Source Files

- `README.md`
- `hyperextract/cli/README.md`
- `examples/en/tesla.md`
- `hyperextract/templates/presets/general/biography_graph.yaml`
- `hyperextract/cli/cli.py`
- `hyperextract/__init__.py`

---


Hyper-Extract turns unstructured text into a queryable Knowledge Abstract (KA) on disk. The shortest path is: configure LLM and embedder clients, run `he parse` with a preset YAML template such as `general/biography_graph` over `examples/en/tesla.md`, then query the KA with `he search` or visualize it with `he show`. The Python SDK exposes the same lifecycle through `Template.create` and `feed_text`.

<Note>
Requires Python 3.11+, an LLM with structured output (`json_schema` or function calling), and an OpenAI-compatible embedder for semantic search. See [Installation](/installation) for package setup.
</Note>

## Prerequisites

| Requirement | Details |
| --- | --- |
| Package | `hyperextract` installed via `uv tool install hyperextract` (CLI) or `uv pip install hyperextract` (Python) |
| API access | LLM and embedder credentials, or a local vLLM deployment |
| Sample input | `examples/en/tesla.md` — Nikola Tesla biography in English |
| Template | `general/biography_graph` — `temporal_graph` preset for biographies |

## End-to-end workflow

```text
he config init          →  ~/.he/config.toml (LLM + embedder)
        │
        ▼
he parse tesla.md       →  ./output/  (data.json, metadata.json, index/)
  -t general/biography_graph
  -l en
        │
        ├─► he show ./output/     (OntoSight graph)
        └─► he search ./output/   (semantic retrieval)
```

The `he parse` command calls `Template.create`, ingests text with `feed_text`, writes the KA with `dump`, and builds a FAISS index by default.

## Step 1: Configure providers

Run `he config init` once. Configuration is stored at `~/.he/config.toml`. Environment variables (`OPENAI_API_KEY`, `OPENAI_BASE_URL`) override file settings when set.
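The override rule can be pictured as a small lookup: take the environment value when it is set, otherwise fall back to the file value. The sketch below is only an illustration. `resolve_setting` is a hypothetical helper, and the `api_key` / `base_url` keys in `file_settings` are assumed names rather than the documented layout of `~/.he/config.toml`.

```python
# Hypothetical stand-in for values loaded from ~/.he/config.toml;
# the real key names and nesting may differ.
file_settings = {
    "api_key": "key-from-file",
    "base_url": "https://api.openai.com/v1",
}


def resolve_setting(file_value, env, env_name):
    """Return the environment value when it is set and non-empty, else the file value."""
    return env.get(env_name) or file_value


# Simulated environment in which only the API key is overridden.
env = {"OPENAI_API_KEY": "key-from-env"}

api_key = resolve_setting(file_settings["api_key"], env, "OPENAI_API_KEY")
base_url = resolve_setting(file_settings["base_url"], env, "OPENAI_BASE_URL")

print(api_key)
print(base_url)
```

Passing `os.environ` as `env` would apply the same rule to the real process environment.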

<Steps>
<Step title="Choose a provider preset">

<Tabs>
<Tab title="OpenAI">

```bash
he config init -k YOUR_OPENAI_API_KEY
```

Sets provider `openai`, LLM `gpt-4o-mini`, embedder `text-embedding-3-small`.

</Tab>
<Tab title="Bailian">

```bash
he config init -p bailian -k YOUR_BAILIAN_API_KEY
```

Uses Bailian defaults: `qwen3.6-plus` (LLM) and `text-embedding-v4` (embedder).

</Tab>
<Tab title="Local vLLM">

```bash
he config llm -p vllm -u http://localhost:8000/v1 -k dummy -m Qwen3.5-9B
he config embedder -p vllm -u http://localhost:8001/v1 -k dummy -m bge-m3
```

vLLM requires explicit `base_url` values for LLM and embedder endpoints.

</Tab>
</Tabs>

</Step>
<Step title="Verify configuration">

```bash
he config show
```

Confirm both LLM and embedder rows show a model and API key (or `dummy` for local vLLM).

</Step>
</Steps>

<ParamField body="--api-key" type="string" required>
API key applied to both LLM and embedder in quick-init mode (`-k` / `--api-key`).
</ParamField>

<ParamField body="--provider" type="string">
Provider preset: `openai`, `bailian`, or `vllm`. Omit for OpenAI defaults when only `--api-key` is supplied.
</ParamField>

<ParamField body="--base-url" type="string">
Custom OpenAI-compatible endpoint. Used with `--provider` or standalone OpenAI init.
</ParamField>

## Step 2: Extract with the CLI

Parse the Tesla biography into a temporal knowledge graph. Knowledge templates require `--lang`; method templates (`-m`) default to English and ignore `--lang`.

```bash
he parse examples/en/tesla.md \
  -t general/biography_graph \
  -o ./output/ \
  -l en
```

<ParamField body="-t, --template" type="string" required>
Template ID. `general/biography_graph` resolves to the biography temporal-graph preset.
</ParamField>

<ParamField body="-o, --output" type="string" required>
Output KA directory. Must be empty unless `--force` is passed.
</ParamField>

<ParamField body="-l, --lang" type="string" required>
Language code (`en` or `zh`). Required for knowledge templates.
</ParamField>

<ParamField body="--no-index" type="boolean">
Skip FAISS index build. Search and chat require a later `he build-index` run.
</ParamField>

<ParamField body="-f, --force" type="boolean">
Overwrite a non-empty output directory.
</ParamField>
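When `he parse` is driven from a script, it can help to assemble the arguments and check the output directory up front. The following is a minimal sketch that uses only the flags listed above. `build_parse_argv` and `output_dir_is_usable` are hypothetical helpers, and treating a missing directory as usable is an assumption about the CLI, not confirmed behavior.

```python
from pathlib import Path
import tempfile


def output_dir_is_usable(output: Path, force: bool = False) -> bool:
    """Mirror the documented rule: the directory must be empty unless force is set.

    Assumption: a directory that does not exist yet is also acceptable.
    """
    if force or not output.exists():
        return True
    return not any(output.iterdir())


def build_parse_argv(source, template, output, lang, force=False, no_index=False):
    """Assemble the `he parse` argument list from the documented flags."""
    argv = ["he", "parse", str(source), "-t", template, "-o", str(output), "-l", lang]
    if force:
        argv.append("--force")
    if no_index:
        argv.append("--no-index")
    return argv


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output"
    out.mkdir()
    (out / "data.json").write_text("{}", encoding="utf-8")  # make the directory non-empty

    usable_without_force = output_dir_is_usable(out)
    usable_with_force = output_dir_is_usable(out, force=True)

argv = build_parse_argv("examples/en/tesla.md", "general/biography_graph", "./output/", "en")
print(usable_without_force, usable_with_force)
print(argv)
```

The resulting list can be handed to `subprocess.run(argv, check=True)` once `he` is on the `PATH`.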

<RequestExample>

```bash
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en
```

</RequestExample>

<ResponseExample>

```text
Input: examples/en/tesla.md
Output: ./output/
Template: general/biography_graph
Language: en
Build Index: Yes

Template resolved: Biography Graph Template
Success! Knowledge extracted to output

What's next?
  he show ./output/                    # Visualize knowledge graph
  he feed ./output/ <new_document>     # Append more documents
  he search ./output/ "keyword"        # Semantic search
  he talk ./output/ -i                 # Interactive chat
```

</ResponseExample>

### Output layout

:::files
./output/
├── data.json       # Extracted entities and relations
├── metadata.json   # Template ID, language, timestamps
└── index/          # FAISS vector store (when index is built)
:::
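A quick way to confirm that a parse produced this layout is to test for the three artifacts by name. The sketch below relies only on the file and directory names shown above. `missing_artifacts` is a hypothetical helper, and it does not inspect file contents.

```python
from pathlib import Path
import tempfile

EXPECTED_FILES = ("data.json", "metadata.json")
INDEX_DIR = "index"


def missing_artifacts(ka_dir: Path) -> list[str]:
    """List the expected KA artifacts that are absent from ka_dir."""
    missing = [name for name in EXPECTED_FILES if not (ka_dir / name).is_file()]
    if not (ka_dir / INDEX_DIR).is_dir():
        missing.append(INDEX_DIR + "/")
    return missing


# Simulate a KA directory written without an index.
with tempfile.TemporaryDirectory() as tmp:
    ka_dir = Path(tmp)
    (ka_dir / "data.json").write_text("{}", encoding="utf-8")
    (ka_dir / "metadata.json").write_text("{}", encoding="utf-8")
    report = missing_artifacts(ka_dir)

print(report)
```

An empty list means all three artifacts are present. A report containing only `index/` is what a `--no-index` run would be expected to leave behind.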

`general/biography_graph` produces a `temporal_graph` with entities (`name`, `type`, `description`) and relations (`source`, `target`, `type`, `time`, `description`). Relation identifiers follow `{source}|{type}|{target}`; the `time` field captures biographical dates.
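To make the identifier pattern concrete, here is a small sketch that models a relation with the five fields listed above and derives `{source}|{type}|{target}` from them. The `Relation` class is a hypothetical illustration, not the library's own type, and the sample values are invented for the example rather than taken from real extraction output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Illustrative relation record using the field names from the template."""

    source: str
    target: str
    type: str
    time: str
    description: str

    @property
    def identifier(self) -> str:
        # Follows the documented pattern {source}|{type}|{target}.
        return f"{self.source}|{self.type}|{self.target}"


# Sample values are made up for illustration.
rel = Relation(
    source="Nikola Tesla",
    target="George Westinghouse",
    type="partnered_with",
    time="1888",
    description="Illustrative sample relation.",
)

print(rel.identifier)
```

Because `time` is not part of the pattern, two relations that differ only in their date would share an identifier in this sketch.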

## Step 3: Search the Knowledge Abstract

Semantic search requires a built index. `he parse` builds one by default.

```bash
he search ./output/ "What are Tesla's major achievements?" -n 5
```

<ParamField body="query" type="string" required>
Natural-language search string.
</ParamField>

<ParamField body="-n, --top-k" type="integer">
Number of results to return. Default: `3`.
</ParamField>

<RequestExample>

```bash
he search ./output/ "Who was Tesla's main business partner?" -n 3
```

</RequestExample>

<ResponseExample>

```text
Knowledge Abstract: ./output/
Query: Who was Tesla's main business partner?
Top K: 3

Found 3 result(s):

Result 1:
{
  "name": "George Westinghouse",
  "type": "person",
  "description": "Founder of Westinghouse Electric Company..."
}
```

</ResponseExample>

<Tip>
Run `he info ./output/` to inspect node/edge counts and whether the index exists before searching.
</Tip>

## Step 4: Visualize with OntoSight

```bash
he show ./output/
```

`he show` loads the KA from disk, recreates the template instance, and opens an interactive graph in the browser. Entity labels use `{name}`; relation labels use `{type}@{time}` per the template `display` block.
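The label patterns read like ordinary format strings, so their effect can be previewed with `str.format`. This is a sketch under that assumption. How the library actually renders labels is not shown here, and the sample records are invented for illustration.

```python
# Label patterns as described for the template's display block.
ENTITY_LABEL = "{name}"
RELATION_LABEL = "{type}@{time}"

# Made-up records using the documented field names.
entity = {"name": "Nikola Tesla", "type": "person", "description": "Illustrative entity."}
relation = {
    "source": "Nikola Tesla",
    "target": "George Westinghouse",
    "type": "partnered_with",
    "time": "1888",
    "description": "Illustrative relation.",
}

# Extra keys are ignored by str.format, so whole records can be passed in.
entity_label = ENTITY_LABEL.format(**entity)
relation_label = RELATION_LABEL.format(**relation)

print(entity_label)
print(relation_label)
```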

## Python equivalent

The SDK mirrors the CLI path. `Template.create` reads `~/.he/config.toml` when `llm_client` and `embedder` are omitted. Use `feed_text` to ingest into the current instance; this is the same call `he parse` makes internally.

<CodeGroup>

```python title="feed_text (matches CLI parse)"
from pathlib import Path
from hyperextract import Template

ka = Template.create("general/biography_graph", language="en")

text = Path("examples/en/tesla.md").read_text(encoding="utf-8")
ka.feed_text(text)

ka.build_index()
ka.dump("./output/")
ka.show()
```

```python title="parse (one-shot, new instance)"
from pathlib import Path
from hyperextract import Template

ka = Template.create("general/biography_graph", language="en")

text = Path("examples/en/tesla.md").read_text(encoding="utf-8")
result = ka.parse(text)          # returns a new instance; ka is unchanged

result.build_index()
result.dump("./output/")
result.show()
```

```python title="search and chat"
from hyperextract import Template

ka = Template.create("general/biography_graph", language="en")
ka.load("./output/")

results = ka.search("What were Tesla's major inventions?", top_k=5)
for item in results:
    print(item)

response = ka.chat("Summarize Tesla's War of Currents")
print(response.content)
```

</CodeGroup>

| Method | Behavior |
| --- | --- |
| `feed_text(text)` | Merges extracted data into the current instance. Supports chaining. |
| `parse(text)` | Returns a new instance without modifying the caller. Use for previews or branches. |
| `build_index()` | Builds FAISS index required for `search` and `chat`. |
| `dump(path)` | Writes `data.json`, `metadata.json`, and `index/`. |
| `load(path)` | Restores a saved KA from disk. |
| `show()` | Opens OntoSight visualization. |
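The difference between `feed_text` and `parse` is easiest to see on a toy object. The class below is a stand-in written for this page, not the library's implementation. It stores raw strings instead of running extraction, and it assumes the instance returned by `parse` holds only the newly parsed text.

```python
class ToyKA:
    """Toy stand-in that records fed text instead of extracting knowledge."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def feed_text(self, text: str) -> "ToyKA":
        # Mutates the current instance and returns it, which allows chaining.
        self.items.append(text)
        return self

    def parse(self, text: str) -> "ToyKA":
        # Leaves the caller untouched and returns a separate instance.
        fresh = ToyKA()
        fresh.items.append(text)
        return fresh


ka = ToyKA()
ka.feed_text("chapter 1").feed_text("chapter 2")  # chained, in place

preview = ka.parse("chapter 3")  # ka is not modified

print(ka.items)
print(preview.items)
```

In this model `feed_text` suits incremental ingestion into one KA, while `parse` suits trying a document out without touching accumulated state.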

<Warning>
Pass `language="en"` (or `"zh"`) when creating knowledge templates. Method templates such as `method/light_rag` always use English prompts regardless of the `language` argument.
</Warning>

### Optional: explicit clients

Override global config with programmatic clients for mixed cloud/local deployments:

```python
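from pathlib import Path

from hyperextract import Template, create_client

# Build both clients explicitly instead of reading ~/.he/config.toml.
llm, embedder = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)

ka = Template.create(
    "general/biography_graph",
    language="en",
    llm_client=llm,
    embedder=embedder,
)

# read_text sets the encoding explicitly and closes the file after reading.
text = Path("examples/en/tesla.md").read_text(encoding="utf-8")
ka.feed_text(text)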
```
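The client strings in that example appear to follow a `provider:model@base_url` shape. The parser below is a sketch inferred from those two strings only. `parse_client_spec` and `ClientSpec` are hypothetical names, the grammar the library really accepts may be broader or stricter, and treating the `@base_url` part as optional is an assumption.

```python
from typing import NamedTuple, Optional


class ClientSpec(NamedTuple):
    provider: str
    model: str
    base_url: Optional[str]


def parse_client_spec(spec: str) -> ClientSpec:
    """Split 'provider:model@base_url' into its parts (inferred format)."""
    # Split on the first colon only, so colons inside the URL are preserved.
    provider, sep, rest = spec.partition(":")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider:model[@base_url]', got {spec!r}")
    model, _, base_url = rest.partition("@")
    if not model:
        raise ValueError(f"missing model name in {spec!r}")
    return ClientSpec(provider, model, base_url or None)


llm_spec = parse_client_spec("vllm:Qwen3.5-9B@http://localhost:8000/v1")
embedder_spec = parse_client_spec("vllm:bge-m3@http://localhost:8001/v1")

print(llm_spec)
print(embedder_spec)
```

Reading the strings this way makes it clear that the LLM and the embedder can point at different endpoints, which is what a mixed cloud/local deployment needs.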

## Verification checklist

<Check>
`he config show` reports LLM and embedder models with valid credentials.
</Check>

<Check>
`he info ./output/` shows non-zero node/edge counts and index status **Built**.
</Check>

<Check>
`he search ./output/ "Tesla coil"` returns entity or relation matches.
</Check>

<Check>
`he show ./output/` opens a graph with Tesla, Edison, Westinghouse, and dated relations.
</Check>

## Common failures

| Symptom | Fix |
| --- | --- |
| `No API key found` / config validation error | Run `he config init` or set `OPENAI_API_KEY` |
| `--lang is required for knowledge templates` | Add `-l en` or `-l zh` to `he parse` |
| `Output directory already exists and is not empty` | Use `-f` or choose a new `-o` path |
| `search` fails on missing index | Re-run parse without `--no-index`, or run `he build-index ./output/` |
| `Template not found` | List presets with `he list template -q biography` |

Set `HYPER_EXTRACT_LOG_LEVEL=DEBUG` for extraction-stage logging (`feed_text`, index build, template resolution).
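When the CLI is launched from Python, the variable can be set for a single run without changing the parent process. The helper below is a hypothetical sketch. Only the variable name and the `DEBUG` value come from this page.

```python
import os


def debug_env(base=None):
    """Return a copy of the environment with debug logging enabled."""
    env = dict(os.environ if base is None else base)
    env["HYPER_EXTRACT_LOG_LEVEL"] = "DEBUG"
    return env


# A small explicit base keeps the example deterministic.
env = debug_env({"PATH": "/usr/bin"})
print(env["HYPER_EXTRACT_LOG_LEVEL"])

# Pass the copy to a single invocation, for example:
# subprocess.run(["he", "info", "./output/"], env=env, check=True)
```

Because the function copies its input, `os.environ` itself is left unchanged.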

## Next

<CardGroup>
<Card title="Tesla biography recipe" href="/tesla-biography-recipe">
Full CLI and Python walkthrough for `examples/en/tesla.md` with expected artifacts and sample queries.
</Card>
<Card title="Configure providers" href="/configure-providers">
Per-service `he config llm` / `he config embedder`, environment variables, and `create_client()` patterns.
</Card>
<Card title="Search, chat, and visualize" href="/search-chat-visualize">
`he talk`, `he info`, interactive modes, and `AutoType.show()` details.
</Card>
<Card title="Knowledge Abstracts" href="/knowledge-abstracts">
On-disk KA model, lifecycle methods, and incremental updates via `he feed`.
</Card>
</CardGroup>
