# Installation

> Install via `uv tool install hyperextract` or `uv pip install hyperextract`, Python version constraints, optional provider extras (`anthropic`, `google`, `all`), and first-run configuration prerequisites.

- Repository: yifanfeng97/Hyper-Extract
- GitHub: https://github.com/yifanfeng97/Hyper-Extract
- Human docs: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf
- Complete Markdown: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf/llms-full.txt

## Source Files

- `pyproject.toml`
- `README.md`
- `.python-version`
- `.env.example`
- `hyperextract/cli/__main__.py`

---

Hyper-Extract ships as the PyPI package `hyperextract`. Choose a **CLI install** when you want the `he` command available globally, or a **library install** when you import `Template` and `create_client` in Python. Both paths require **Python 3.11+**, and extraction commands fail until both an LLM and an embedder are configured.

## Requirements

| Requirement | Details |
| --- | --- |
| Python | **3.11 or newer** (`requires-python = ">=3.11"`); CI tests 3.11 and 3.12 |
| Package manager | [uv](https://docs.astral.sh/uv/) recommended; `pip` and `pipx` also work |
| LLM capability | Models must support structured output (`json_schema` or function calling) |
| Credentials | An API key for cloud providers, or a local vLLM endpoint configured with `base_url` |

The core package includes `langchain-openai` and works with any OpenAI-compatible endpoint (OpenAI, Bailian, vLLM, proxies). Optional extras add native LangChain integrations for Anthropic and Google.

## Install the CLI

Use this path when you want `he` on your PATH for parse, search, talk, and config workflows.
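Whichever install path you choose, the interpreter must satisfy `requires-python = ">=3.11"`. The minimal sketch below checks this with only the standard library; `meets_requirement` is a hypothetical helper, not part of Hyper-Extract. Because `uv tool install` can select or download its own interpreter, the check matters most for the library install, where your active environment's Python is used.

```python
import sys

# Minimum version taken from requires-python = ">=3.11" in pyproject.toml.
MIN_PYTHON = (3, 11)


def meets_requirement(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version satisfies the minimum."""
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_requirement():
        print(f"Python {current} satisfies >=3.11")
    else:
        sys.exit(f"Python {current} is too old; hyperextract needs 3.11+")
```

Run it with the Python you plan to install into, for example `python check_python.py`.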
With uv:

```bash
uv tool install hyperextract
```

Install with all optional provider integrations:

```bash
uv tool install 'hyperextract[all]'
```

With pipx:

```bash
pipx install hyperextract
```

With extras:

```bash
pipx install 'hyperextract[all]'
```

Verify the installation:

```bash
he --version
```

Expected output (the version number depends on the release you installed):

```
Hyper-Extract CLI version 0.2.0
```

Running `he` with no subcommand prints the command overview and exits, which confirms that Typer command registration and Rich rendering work in your environment.

```bash
he
```

## Install as a Python library

Use this path for notebooks, services, or scripts that call `Template.create()`, `feed_text()`, and `create_client()` directly.

With uv:

```bash
uv pip install hyperextract
```

To include an extra, pick one:

```bash
uv pip install 'hyperextract[anthropic]'
uv pip install 'hyperextract[google]'
uv pip install 'hyperextract[all]'
```

With pip:

```bash
pip install hyperextract
```

With extras:

```bash
pip install 'hyperextract[anthropic,google]'
```

Verify the import:

```python
import hyperextract

print(hyperextract.__version__)
```

List the bundled templates to confirm the presets load:

```python
from hyperextract import Template

print(len(Template.list()), "templates available")
```

## Optional provider extras

Extras install additional LangChain provider packages. They are **not required** for OpenAI-compatible endpoints, which the default install already supports.
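To see which provider integrations an environment already has, you can probe for the packages each extra installs without importing them. This sketch assumes the import names `langchain_anthropic` and `langchain_google_genai` for the packages in the extras table, plus `langchain_openai` from the base install; `installed_providers` is a hypothetical helper. Run it with the interpreter of the environment where `hyperextract` lives: a `uv tool` install keeps its own isolated environment, so the system `python` may report packages as missing.

```python
import importlib.util

# Maps each extra to the import name of the LangChain package it provides.
# langchain-openai ships with the base install; the others come from extras.
PROVIDER_MODULES = {
    "core (OpenAI-compatible)": "langchain_openai",
    "anthropic": "langchain_anthropic",
    "google": "langchain_google_genai",
}


def installed_providers(modules=PROVIDER_MODULES):
    """Return {label: bool} telling whether each provider module is importable."""
    # find_spec locates a top-level module without importing it,
    # and returns None when the module is not installed.
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for label, present in installed_providers().items():
        print(f"{label:<26} {'installed' if present else 'missing'}")
```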
| Extra | Installs | When to use |
| --- | --- | --- |
| `anthropic` | `langchain-anthropic>=0.3.0` | Native Anthropic Claude clients |
| `google` | `langchain-google-genai>=2.1.0` | Native Google Gemini clients |
| `all` | Both `anthropic` and `google` | Multi-provider projects or CI parity |

```bash title="CLI with all extras"
uv tool install 'hyperextract[all]'
```

```bash title="Library with Anthropic only"
uv pip install 'hyperextract[anthropic]'
```

```bash title="Library with all extras"
uv pip install 'hyperextract[all]'
```

## What the default install includes

The base `hyperextract` package bundles extraction, indexing, the CLI, and OpenAI-compatible client support:

- **AutoTypes and template engine**: eight knowledge structures and 80+ YAML presets
- **CLI**: the `he` command, built on Typer and Rich (`typer`, `rich`, `tomli-w`)
- **Semantic search**: FAISS CPU backend (`faiss-cpu`)
- **LangChain stack**: `langchain`, `langchain-community`, `langchain-openai`
- **Visualization**: OntoSight integration (`ontosight`, `ontomem`)
- **Utilities**: `structlog`, `python-dotenv`, `semhash`

Bailian and vLLM need no separate install step when you reach them through OpenAI-compatible URLs.

## First-run configuration

Extraction commands (`he parse`, `he feed`, `he search`, `he talk`, `he show`, `he build-index`) call `validate_config()` before running. Without valid LLM and embedder settings, the CLI exits with an error.

Run the interactive setup:

```bash
he config init
```

The wizard walks through provider selection (`openai`, `bailian`, `vllm`, or custom), LLM model, embedder model, API key, and base URL, then writes `~/.he/config.toml`.

For a quick OpenAI setup, pass the key directly:

```bash
he config init -k YOUR_OPENAI_API_KEY
```

This uses the OpenAI defaults: `gpt-4o-mini` for the LLM and `text-embedding-3-small` for the embedder.

To start from a provider preset, pass `-p` with your provider's key (run the line for your provider). The preset applies default models and base URLs from the provider registry:

```bash
he config init -p openai -k YOUR_OPENAI_API_KEY
he config init -p bailian -k YOUR_BAILIAN_API_KEY
```
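A provider preset can be thought of as a lookup of default models and a base URL, with explicitly supplied values taking priority. The sketch below illustrates that idea using the defaults documented on this page; `PRESETS` and `apply_preset` are hypothetical names, not Hyper-Extract's provider registry, and the real `he config` commands may resolve values differently. For simplicity it applies one `base_url` to both sections, whereas `he config llm` and `he config embedder` can point at different endpoints.

```python
# Hypothetical preset table; values are the defaults documented on this page,
# not read from Hyper-Extract itself.
PRESETS = {
    "openai": {
        "llm": "gpt-4o-mini",
        "embedder": "text-embedding-3-small",
        "base_url": "https://api.openai.com/v1",
    },
    "bailian": {
        "llm": "qwen3.6-plus",
        "embedder": "text-embedding-v4",
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    },
    # vLLM has no fixed endpoint or models, so the caller must supply them.
    "vllm": {"llm": None, "embedder": None, "base_url": None},
}


def apply_preset(provider, api_key, base_url=None, llm=None, embedder=None):
    """Build [llm] and [embedder] settings; explicit values override the preset."""
    if provider not in PRESETS:
        raise ValueError(f"unknown provider: {provider!r}")
    preset = PRESETS[provider]
    url = base_url or preset["base_url"]
    if url is None:
        # Reflects the documented rule that vLLM needs an explicit base_url.
        raise ValueError(f"{provider} provider requires base_url")
    settings = {}
    for section, override in (("llm", llm), ("embedder", embedder)):
        model = override or preset[section]
        if model is None:
            raise ValueError(f"{provider} provider requires an explicit {section} model")
        settings[section] = {
            "provider": provider,
            "model": model,
            "api_key": api_key,
            "base_url": url,
        }
    return settings


if __name__ == "__main__":
    for section, values in apply_preset("bailian", "YOUR_BAILIAN_API_KEY").items():
        print(section, values["model"], values["base_url"])
```

The returned dictionary mirrors the `provider`, `model`, `api_key`, and `base_url` fields that `~/.he/config.toml` stores per section.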
Alternatively, supply credentials through environment variables, using `.env.example` in the repository as a starting point:

```bash
export OPENAI_API_KEY=sk-your-api-key-here
export OPENAI_BASE_URL=https://api.openai.com/v1
```

Environment variables fill gaps when `~/.he/config.toml` omits a key or URL.

Check the result:

```bash
he config show
```

This prints the provider, model, masked API key, and base URL for both the LLM and embedder services. Once configuration is valid, continue to [Quickstart](/quickstart) for a full workflow.

### Configuration file location

Hyper-Extract stores CLI settings at:

```
~/.he/config.toml
```

The file contains `[llm]` and `[embedder]` sections with `provider`, `model`, `api_key`, and `base_url` fields. See [Configuration reference](/configuration-reference) for the full schema and precedence rules.

### Provider-specific prerequisites

**OpenAI**: requires a valid API key, set with `he config init -k ...` or `OPENAI_API_KEY`. The base URL defaults to `https://api.openai.com/v1`; override it with `-u` or `OPENAI_BASE_URL` to route through a proxy. Verified models include `gpt-4o`, `gpt-4o-mini`, and `gpt-5`.

**Bailian**: configure with the provider preset:

```bash
he config init -p bailian -k YOUR_BAILIAN_API_KEY
```

Preset defaults: `qwen3.6-plus` (LLM), `text-embedding-v4` (embedder), and base URL `https://dashscope.aliyuncs.com/compatible-mode/v1`.

**vLLM**: both services require an explicit `base_url`. The API key can be a placeholder such as `dummy`.

```bash
he config llm -p vllm -u http://localhost:8000/v1 -k dummy -m Qwen3.5-9B
he config embedder -p vllm -u http://localhost:8001/v1 -k dummy -m bge-m3
```

Or configure programmatically:

```python
from hyperextract import create_client

# Each spec below follows the pattern "provider:model@base_url".
# vLLM does not need a real key, so a placeholder is passed.
llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)
```

### Optional debug logging

Set these environment variables before running `he` commands:

- Log level for structlog output. Values: `DEBUG`, `INFO`, `WARNING`, `ERROR`. Default: `WARNING`.
- Optional file path for persistent log output.
## Development installation

To modify the source or run tests locally, clone the repository and install it in editable mode:

```bash
git clone https://github.com/yifanfeng97/hyper-extract.git
cd hyper-extract
uv pip install -e ".[all]"
uv pip install --group dev pytest pytest-cov
```

CI validates against Python **3.11** and **3.12** on Ubuntu and macOS. See [Contributing](/contributing) for the full development workflow.

## Troubleshooting installation

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `command not found: he` | CLI not on PATH | Re-run `uv tool install hyperextract`, or add the uv tools bin directory to PATH (`uv tool update-shell` does this) |
| `LLM API key is not configured` | First-run setup skipped | Run `he config init` or export `OPENAI_API_KEY` |
| `vLLM provider requires base_url` | vLLM preset without a URL | Pass `-u http://host:port/v1` to both `he config llm` and `he config embedder` |
| `No module named 'langchain_anthropic'` | Anthropic extra not installed | `uv pip install 'hyperextract[anthropic]'` |
| Extraction produces empty or invalid JSON | Model lacks structured output support | Switch to a verified model; see [Provider system](/provider-system) |

For more failure modes and fixes, see [Troubleshooting](/troubleshooting).

## Related pages

- [Quickstart](/quickstart): run `he config init`, parse a document, and query the resulting Knowledge Abstract.
- Deep setup for mixed cloud and local vLLM deployments.
- What Hyper-Extract exposes and the shortest path from install to a queryable Knowledge Abstract.
- [Configuration reference](/configuration-reference): the full `~/.he/config.toml` schema and environment variable precedence.