# Quickstart

> First successful extraction: `he config init`, `he parse` with a preset template, `he search` / `he show`, and the equivalent Python `Template.create` + `feed_text` path using the Tesla biography example.

- Repository: yifanfeng97/Hyper-Extract
- GitHub: https://github.com/yifanfeng97/Hyper-Extract
- Human docs: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf
- Complete Markdown: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf/llms-full.txt

## Source Files

- `README.md`
- `hyperextract/cli/README.md`
- `examples/en/tesla.md`
- `hyperextract/templates/presets/general/biography_graph.yaml`
- `hyperextract/cli/cli.py`
- `hyperextract/__init__.py`

---

Hyper-Extract turns unstructured text into a queryable Knowledge Abstract (KA) on disk. The shortest path is: configure LLM and embedder clients, run `he parse` with a preset YAML template such as `general/biography_graph` over `examples/en/tesla.md`, then query the KA with `he search` or visualize it with `he show`. The Python SDK exposes the same lifecycle through `Template.create` and `feed_text`.

Requires Python 3.11+, an LLM with structured output (`json_schema` or function calling), and an OpenAI-compatible embedder for semantic search. See [Installation](/installation) for package setup.
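Before configuring providers, a quick preflight check can confirm the interpreter requirement stated above. The sketch below is illustrative only: the helper name `preflight` is hypothetical and not part of Hyper-Extract, and it checks nothing beyond the Python version and whether a package named `hyperextract` is importable.

```python
import importlib.util
import sys


def preflight(min_version=(3, 11), package="hyperextract"):
    """Return a dict describing whether the environment meets the stated requirements.

    Hypothetical helper for illustration; it is not part of Hyper-Extract.
    """
    return {
        # Compare only (major, minor) against the documented minimum.
        "python_ok": sys.version_info[:2] >= tuple(min_version),
        # find_spec returns None when a top-level package is not installed.
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    status = preflight()
    print(f"Python >= 3.11: {status['python_ok']}")
    print(f"hyperextract importable: {status['package_found']}")
```

If either value is `False`, the installation page is the place to start; LLM and embedder credentials are checked separately in Step 1.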
## Prerequisites

| Requirement | Details |
| --- | --- |
| Package | `hyperextract` installed via `uv tool install hyperextract` (CLI) or `uv pip install hyperextract` (Python) |
| API access | LLM and embedder credentials, or a local vLLM deployment |
| Sample input | `examples/en/tesla.md`: Nikola Tesla biography in English |
| Template | `general/biography_graph`: `temporal_graph` preset for biographies |

## End-to-end workflow

```text
he config init          →  ~/.he/config.toml (LLM + embedder)
       │
       ▼
he parse tesla.md       →  ./output/ (data.json, metadata.json, index/)
  -t general/biography_graph -l en
       │
       ├─► he show ./output/      (OntoSight graph)
       └─► he search ./output/    (semantic retrieval)
```

The `he parse` command calls `Template.create`, ingests text with `feed_text`, writes the KA with `dump`, and builds a FAISS index by default.

## Step 1: Configure providers

Run `he config init` once. Configuration is stored at `~/.he/config.toml`. Environment variables (`OPENAI_API_KEY`, `OPENAI_BASE_URL`) override file settings when set.

```bash
he config init -k YOUR_OPENAI_API_KEY
```

Sets provider `openai`, LLM `gpt-4o-mini`, embedder `text-embedding-3-small`.

```bash
he config init -p bailian -k YOUR_BAILIAN_API_KEY
```

Uses Bailian defaults: `qwen3.6-plus` (LLM) and `text-embedding-v4` (embedder).

```bash
he config llm -p vllm -u http://localhost:8000/v1 -k dummy -m Qwen3.5-9B
he config embedder -p vllm -u http://localhost:8001/v1 -k dummy -m bge-m3
```

vLLM requires explicit `base_url` values for LLM and embedder endpoints.

```bash
he config show
```

Confirm that both the LLM and embedder rows show a model and API key (or `dummy` for local vLLM).

Options for quick-init mode:

- `-k` / `--api-key`: API key applied to both LLM and embedder in quick-init mode.
- `-p` / `--provider`: Provider preset: `openai`, `bailian`, or `vllm`. Omit for OpenAI defaults when only `--api-key` is supplied.
- Base URL: Custom OpenAI-compatible endpoint. Used with `--provider` or standalone OpenAI init.
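The precedence rule from Step 1 (environment variables override `~/.he/config.toml` when set) can be pictured with a small sketch. This is not Hyper-Extract's own resolution code: the function `resolve_setting` is hypothetical, and treating an empty variable as unset is an assumption made for illustration.

```python
import os


def resolve_setting(env_name, file_value, environ=None):
    """Pick the effective value for one setting.

    Hypothetical illustration of "environment overrides file". An empty
    environment value is treated as unset (an assumption, not confirmed
    by the Hyper-Extract source).
    """
    environ = os.environ if environ is None else environ
    env_value = environ.get(env_name)
    return env_value if env_value else file_value


if __name__ == "__main__":
    # file_value stands in for whatever ~/.he/config.toml holds for this key.
    api_key = resolve_setting("OPENAI_API_KEY", file_value="key-from-config-file")
    source = "environment" if os.environ.get("OPENAI_API_KEY") else "config file"
    print(f"API key comes from: {source}")
```

In practice this means an exported `OPENAI_API_KEY` in the shell can mask the key written by `he config init`, which is worth checking when `he config show` looks correct but requests still fail.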
## Step 2: Extract with the CLI

Parse the Tesla biography into a temporal knowledge graph. Knowledge templates require `--lang`; method templates (`-m`) default to English and ignore `--lang`.

```bash
he parse examples/en/tesla.md \
  -t general/biography_graph \
  -o ./output/ \
  -l en
```

- `-t`: Template ID. `general/biography_graph` resolves to the biography temporal-graph preset.
- `-o`: Output KA directory. Must be empty unless `--force` is passed.
- `-l` / `--lang`: Language code (`en` or `zh`). Required for knowledge templates.
- `--no-index`: Skip FAISS index build. Search and chat require a later `he build-index` run.
- `-f` / `--force`: Overwrite a non-empty output directory.

```bash
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en
```

```text
Input: examples/en/tesla.md
Output: ./output/
Template: general/biography_graph
Language: en
Build Index: Yes

Template resolved: Biography Graph Template

Success! Knowledge extracted to output

What's next?
  he show ./output/                # Visualize knowledge graph
  he feed ./output/                # Append more documents
  he search ./output/ "keyword"    # Semantic search
  he talk ./output/ -i             # Interactive chat
```

### Output layout

:::files
./output/
├── data.json        # Extracted entities and relations
├── metadata.json    # Template ID, language, timestamps
└── index/           # FAISS vector store (when index is built)
:::

`general/biography_graph` produces a `temporal_graph` with entities (`name`, `type`, `description`) and relations (`source`, `target`, `type`, `time`, `description`). Relation identifiers follow `{source}|{type}|{target}`; the `time` field captures biographical dates.

## Step 3: Search the Knowledge Abstract

Semantic search requires a built index. `he parse` builds one by default.

```bash
he search ./output/ "What are Tesla's major achievements?" -n 5
```

- Query (positional): Natural-language search string.
- `-n`: Number of results to return. Default: `3`.

```bash
he search ./output/ "Who was Tesla's main business partner?" -n 3
```

```text
Knowledge Abstract: ./output/
Query: Who was Tesla's main business partner?
Top K: 3

Found 3 result(s):

Result 1:
{
  "name": "George Westinghouse",
  "type": "person",
  "description": "Founder of Westinghouse Electric Company..."
}
```

Run `he info ./output/` to inspect node/edge counts and whether the index exists before searching.

## Step 4: Visualize with OntoSight

```bash
he show ./output/
```

Loads the KA from disk, recreates the template instance, and opens an interactive graph in the browser. Entity labels use `{name}`; relation labels use `{type}@{time}` per the template `display` block.

## Python equivalent

The SDK mirrors the CLI path. `Template.create` reads `~/.he/config.toml` when `llm_client` and `embedder` are omitted. Use `feed_text` to ingest into the current instance; this is the same call `he parse` makes internally.

```python title="feed_text (matches CLI parse)"
from pathlib import Path

from hyperextract import Template

# Create the template instance; clients come from ~/.he/config.toml.
ka = Template.create("general/biography_graph", language="en")
text = Path("examples/en/tesla.md").read_text(encoding="utf-8")

ka.feed_text(text)    # extract and merge into this instance
ka.build_index()      # required for search and chat
ka.dump("./output/")  # write data.json, metadata.json, index/
ka.show()             # open the OntoSight visualization
```

```python title="parse (one-shot, new instance)"
from pathlib import Path

from hyperextract import Template

ka = Template.create("general/biography_graph", language="en")
text = Path("examples/en/tesla.md").read_text(encoding="utf-8")

result = ka.parse(text)  # returns a new instance; ka is unchanged
result.build_index()
result.dump("./output/")
result.show()
```

```python title="search and chat"
from hyperextract import Template

# Recreate the template, then restore the saved KA from disk.
ka = Template.create("general/biography_graph", language="en")
ka.load("./output/")

results = ka.search("What were Tesla's major inventions?", top_k=5)
for item in results:
    print(item)

response = ka.chat("Summarize Tesla's War of Currents")
print(response.content)
```

| Method | Behavior |
| --- | --- |
| `feed_text(text)` | Merges extracted data into the current instance. Supports chaining. |
| `parse(text)` | Returns a new instance without modifying the caller. Use for previews or branches. |
| `build_index()` | Builds the FAISS index required for `search` and `chat`. |
| `dump(path)` | Writes `data.json`, `metadata.json`, and `index/`. |
| `load(path)` | Restores a saved KA from disk. |
| `show()` | Opens the OntoSight visualization. |

Pass `language="en"` (or `"zh"`) when creating knowledge templates. Method templates such as `method/light_rag` always use English prompts regardless of the `language` argument.

### Optional: explicit clients

Override global config with programmatic clients for mixed cloud/local deployments:

```python
from pathlib import Path

from hyperextract import Template, create_client

# Build LLM and embedder clients that point at local vLLM endpoints.
llm, embedder = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)

ka = Template.create(
    "general/biography_graph",
    language="en",
    llm_client=llm,
    embedder=embedder,
)
ka.feed_text(Path("examples/en/tesla.md").read_text(encoding="utf-8"))
```

## Verification checklist

- `he config show` reports LLM and embedder models with valid credentials.
- `he info ./output/` shows non-zero node/edge counts and index status **Built**.
- `he search ./output/ "Tesla coil"` returns entity or relation matches.
- `he show ./output/` opens a graph with Tesla, Edison, Westinghouse, and dated relations.

## Common failures

| Symptom | Fix |
| --- | --- |
| `No API key found` / config validation error | Run `he config init` or set `OPENAI_API_KEY` |
| `--lang is required for knowledge templates` | Add `-l en` or `-l zh` to `he parse` |
| `Output directory already exists and is not empty` | Use `-f` or choose a new `-o` path |
| `search` fails on missing index | Re-run parse without `--no-index`, or run `he build-index ./output/` |
| `Template not found` | List presets with `he list template -q biography` |

Set `HYPER_EXTRACT_LOG_LEVEL=DEBUG` for extraction-stage logging (`feed_text`, index build, template resolution).

## Next

- Full CLI and Python walkthrough for `examples/en/tesla.md` with expected artifacts and sample queries.
- Per-service `he config llm` / `he config embedder`, environment variables, and `create_client()` patterns.
- `he talk`, `he info`, interactive modes, and `AutoType.show()` details.
- On-disk KA model, lifecycle methods, and incremental updates via `he feed`.
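Before moving on, the output layout described in Step 2 can be checked without calling the CLI. The sketch below only tests for the three artifacts this page names (`data.json`, `metadata.json`, `index/`). The function name `check_ka_layout` is hypothetical, and the script makes no assumption about what those files contain.

```python
from pathlib import Path


def check_ka_layout(output_dir):
    """Report which of the documented KA artifacts exist under output_dir.

    Hypothetical helper; it inspects the filesystem only and does not
    import or call Hyper-Extract.
    """
    root = Path(output_dir)
    return {
        "data.json": (root / "data.json").is_file(),
        "metadata.json": (root / "metadata.json").is_file(),
        "index/": (root / "index").is_dir(),
    }


if __name__ == "__main__":
    for name, present in check_ka_layout("./output/").items():
        print(f"{name:<14} {'found' if present else 'missing'}")
```

A missing `index/` is expected after `he parse --no-index`; running `he build-index ./output/` creates it.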