Agent-readable docs

Hyper-Extract Documentation

Reference for the Hyper-Extract LLM knowledge extraction framework: CLI (`he`), Python API (`Template`, AutoTypes, `create_client`), YAML templates, extraction methods, and Knowledge Abstract lifecycle.

Pages

Overview
What Hyper-Extract exposes (CLI `he`, Python `Template` API, 8 AutoTypes, 80+ YAML presets, 9 extraction methods), runtime assumptions (Python 3.11+, structured LLM output), and the shortest path from install to a queryable Knowledge Abstract.

Installation
Install via `uv tool install hyperextract` or `uv pip install hyperextract`, Python version constraints, optional provider extras (`anthropic`, `google`, `all`), and first-run configuration prerequisites.

Quickstart
First successful extraction: `he config init`, `he parse` with a preset template, `he search` / `he show`, and the equivalent Python `Template.create` + `feed_text` path using the Tesla biography example.

Knowledge Abstracts
The on-disk Knowledge Abstract (KA) model: `data.json`, `metadata.json`, and `index/` layout; lifecycle methods (`parse`, `feed_text`, `dump`, `load`, `build_index`); and incremental evolution via `he feed`.

Auto-Types
Eight strongly-typed extraction primitives (`AutoModel`, `AutoList`, `AutoSet`, `AutoGraph`, `AutoHypergraph`, `AutoTemporalGraph`, `AutoSpatialGraph`, `AutoSpatioTemporalGraph`): structure, merge behavior, indexing, and when to pick each type.

Templates vs methods
Domain YAML templates (`general/biography_graph`, `finance/earnings_summary`, etc.) versus algorithm-driven method templates (`method/light_rag`, `method/atom`); language requirements (`--lang` for templates, English-only for methods); and selection criteria.

Provider system
BYOC/BYOK provider model: `openai`, `bailian`, and `vllm` presets; `provider:model@url` shorthand; `CompatibleEmbeddings` for non-OpenAI endpoints; and verified model compatibility requirements (`json_schema` / function calling).

Configure providers
Set up LLM and embedder clients via `he config init`, per-service `he config llm` / `he config embedder`, environment variables, or programmatic `create_client()` for mixed cloud and local vLLM deployments.

Extract and evolve knowledge
Run `he parse` (single file, directory of `.md`/`.txt`, or stdin), choose templates interactively or by ID, control indexing with `--no-index`, append documents with `he feed`, and rebuild indexes with `he build-index`.

Search, chat, and visualize
Query Knowledge Abstracts with `he search` and `he talk` (single query or `-i` interactive mode), inspect stats via `he info`, and render graphs through OntoSight with `he show` or `AutoType.show()`.

Create custom templates
Author domain YAML templates: type selection, field and identifier design, multilingual `language` blocks, merge strategies, and validation workflow per the design guide and preset base templates.

Use extraction methods
Invoke algorithm templates via `he parse -m light_rag` or `Template.create("method/hyper_rag")`; direct method classes (`Light_RAG`, `Atom`, etc.); and method-specific kwargs such as `observation_time` for temporal extractors.

Template design skills
Agent-assisted template authoring with `hyperextract-skills`: brainstorm requirements, record/graph designers, yaml-validator rules, template-optimizer fixes, and multilingual conversion workflows.

CLI reference
Complete `he` command surface: `parse`, `feed`, `build-index`, `search`, `talk`, `show`, `info`, `list template`, `list method`, `config` subcommands, flags, defaults, exit conditions, and input/output contracts.

Python API reference
Exported SDK: `Template.create/get/list`, `BaseAutoType` lifecycle (`parse`, `feed_text`, `search`, `chat`, `dump`, `load`, `build_index`, `show`), `create_client` / `create_llm` / `create_embedder` / `get_client`, and logging helpers.

Configuration reference
`~/.he/config.toml` schema for `[llm]` and `[embedder]`, provider presets and default models, environment variable precedence (`OPENAI_API_KEY`, `OPENAI_BASE_URL`, `HYPER_EXTRACT_LOG_LEVEL`, `HYPER_EXTRACT_LOG_FILE`), and validation rules.

Template schema reference
YAML template fields (`language`, `name`, `type`, `tags`, `description`, `output`, `guideline`, `identifiers`, `options`, `display`), valid autotypes and field types, merge strategies, and identifier patterns.

Extraction methods reference
Registered methods (`graph_rag`, `light_rag`, `hyper_rag`, `hypergraph_rag`, `cog_rag`, `itext2kg`, `itext2kg_star`, `kg_gen`, `atom`): autotype output, descriptions, registry API, and constructor kwargs.

Tesla biography recipe
End-to-end CLI and Python workflow using `examples/en/tesla.md` with `general/biography_graph`: parse, visualize, semantic search, and Q&A with expected artifacts under the output directory.

Method demos
Runnable scripts under `examples/en/methods/` for each extraction engine: instantiate method classes, `feed_text`, `chat`, and `show` with LangChain clients and dotenv configuration.

Troubleshooting
Common failure modes: missing API keys, vLLM `base_url` requirements, `--lang` required for knowledge templates, empty output directory conflicts, missing `data.json` or index for `search`/`talk`, template resolution errors, and debug logging via `HYPER_EXTRACT_LOG_LEVEL`.

Contributing
Development setup with `uv`, running `pytest` and coverage, CI matrix (Python 3.11–3.12, Ubuntu/macOS), lint workflow, optional integration tests, and how to add templates or register new extraction methods.

Complete Markdown

The complete agent-readable Markdown files are published separately from this HTML page.