# Installation

> Install via `uv tool install hyperextract` or `uv pip install hyperextract`, Python version constraints, optional provider extras (`anthropic`, `google`, `all`), and first-run configuration prerequisites.

- Repository: yifanfeng97/Hyper-Extract
- GitHub: https://github.com/yifanfeng97/Hyper-Extract
- Human docs: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf
- Complete Markdown: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf/llms-full.txt

## Source Files

- `pyproject.toml`
- `README.md`
- `.python-version`
- `.env.example`
- `hyperextract/cli/__main__.py`

---


Hyper-Extract ships as the PyPI package `hyperextract`. Choose a **CLI install** when you want the `he` command available globally, or a **library install** when you import `Template` and `create_client` from Python. Both paths require **Python 3.11+**, and extraction commands succeed only after an LLM and an embedder are configured.

## Requirements

| Requirement | Details |
| --- | --- |
| Python | **3.11 or later** (`requires-python = ">=3.11"`); CI tests 3.11 and 3.12 |
| Package manager | [uv](https://docs.astral.sh/uv/) recommended; `pip` / `pipx` also supported |
| LLM capability | Models must support structured output (`json_schema` or function calling) |
| Credentials | An API key for cloud providers, or a `base_url` for local vLLM endpoints |

The core package includes `langchain-openai` and works with any OpenAI-compatible endpoint (OpenAI, Bailian, vLLM, proxies). Optional extras add native LangChain integrations for Anthropic and Google.
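
As a quick pre-install check, the Python requirement above can be verified from the interpreter you plan to use. This is a minimal sketch; the helper name `python_is_supported` is illustrative and not part of Hyper-Extract.

```python
import sys

# Minimum version taken from the requirements table (requires-python = ">=3.11").
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) pair meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```

Run it with the same interpreter that `uv`, `pip`, or `pipx` will install into, since a machine can have several Python versions side by side.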

## Install the CLI

Use this path when you want `he` on your PATH for parse, search, talk, and config workflows.

<Tabs>
<Tab title="uv (recommended)">

```bash
uv tool install hyperextract
```

Install with all optional provider integrations:

```bash
uv tool install 'hyperextract[all]'
```

</Tab>
<Tab title="pipx">

```bash
pipx install hyperextract
```

With extras:

```bash
pipx install 'hyperextract[all]'
```

</Tab>
</Tabs>

<Steps>
<Step title="Verify the CLI">

```bash
he --version
```

Expected output (the version number matches the release you installed):

```
Hyper-Extract CLI version 0.2.0
```

</Step>
<Step title="Confirm the entry point">

Running `he` with no subcommand prints the command overview and exits. This confirms Typer registration and Rich rendering work in your environment.

```bash
he
```

</Step>
</Steps>
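
The same check can be scripted, for example in a provisioning or CI step. The sketch below assumes only that the command is named `he` and accepts `--version`, as shown above; the helper `cli_version` is a hypothetical name, not something Hyper-Extract provides.

```python
import shutil
import subprocess


def cli_version(command="he"):
    """Return the output of `<command> --version`, or None if it is not on PATH."""
    executable = shutil.which(command)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = cli_version()
    print(version if version is not None else "he is not on PATH")
```

A `None` result means the shell cannot find `he`, which usually points to a PATH issue and not a failed install.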

## Install as a Python library

Use this path for notebooks, services, or scripts that call `Template.create()`, `feed_text()`, and `create_client()` directly.

<Tabs>
<Tab title="uv">

```bash
uv pip install hyperextract
```

With a specific extra:

```bash
uv pip install 'hyperextract[anthropic]'
uv pip install 'hyperextract[google]'
uv pip install 'hyperextract[all]'
```

</Tab>
<Tab title="pip">

```bash
pip install hyperextract
```

With extras:

```bash
pip install 'hyperextract[anthropic,google]'
```

</Tab>
</Tabs>

<Steps>
<Step title="Verify the import">

```python
import hyperextract
print(hyperextract.__version__)
```

</Step>
<Step title="Smoke-test the API surface">

```python
from hyperextract import Template
print(len(Template.list()), "templates available")
```

</Step>
</Steps>
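
If `import hyperextract` fails, it can help to confirm whether the distribution is installed in the active environment at all. The following sketch uses only the standard library's `importlib.metadata`; the function name `installed_version` is illustrative.

```python
from importlib import metadata


def installed_version(distribution="hyperextract"):
    """Return the installed version string, or None when the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "hyperextract is not installed here")
```

A `None` result typically means the package was installed into a different environment than the one running the script.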

## Optional provider extras

Extras install additional LangChain provider packages. They are **not required** for OpenAI-compatible endpoints, which the default install already supports.

| Extra | Installs | When to use |
| --- | --- | --- |
| `anthropic` | `langchain-anthropic>=0.3.0` | Native Anthropic Claude clients |
| `google` | `langchain-google-genai>=2.1.0` | Native Google Gemini clients |
| `all` | Both `anthropic` and `google` | Multi-provider projects or CI parity |

<CodeGroup>
```bash title="CLI with all extras"
uv tool install 'hyperextract[all]'
```

```bash title="Library with Anthropic only"
uv pip install 'hyperextract[anthropic]'
```

```bash title="Library with all extras"
uv pip install 'hyperextract[all]'
```
</CodeGroup>
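
To see which extras are present in an environment, one option is to probe for the provider modules without importing them. This is a sketch under the assumption that the extras expose the import names `langchain_anthropic` and `langchain_google_genai`, which are the usual module names for those two packages; `available_extras` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the top-level module it installs.
EXTRA_MODULES = {
    "anthropic": "langchain_anthropic",
    "google": "langchain_google_genai",
}


def available_extras(modules=EXTRA_MODULES):
    """Map each extra to True when its module can be located, else False."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        print(f"{extra}: {'installed' if present else 'missing'}")
```

`find_spec` locates a module without executing it, so the check stays fast and has no side effects.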

## What the default install includes

The base `hyperextract` package bundles extraction, indexing, CLI, and OpenAI-compatible client support:

- **AutoTypes and template engine** — eight knowledge structures and 80+ YAML presets
- **CLI** — `he` command via Typer + Rich (`typer`, `rich`, `tomli-w`)
- **Semantic search** — FAISS CPU backend (`faiss-cpu`)
- **LangChain stack** — `langchain`, `langchain-community`, `langchain-openai`
- **Visualization** — OntoSight integration (`ontosight`, `ontomem`)
- **Utilities** — `structlog`, `python-dotenv`, `semhash`

No separate install step is needed for Bailian or vLLM when using OpenAI-compatible URLs.

## First-run configuration

Extraction commands (`he parse`, `he feed`, `he search`, `he talk`, `he show`, `he build-index`) call `validate_config()` before running. Without valid LLM and embedder settings, the CLI exits with an error.

<Steps>
<Step title="Choose a configuration method">

<Tabs>
<Tab title="Interactive (recommended)">

```bash
he config init
```

Walks through provider selection (`openai`, `bailian`, `vllm`, or custom), LLM model, embedder model, API key, and base URL. Writes `~/.he/config.toml`.

</Tab>
<Tab title="Quick setup (OpenAI)">

```bash
he config init -k YOUR_OPENAI_API_KEY
```

Defaults to OpenAI: `gpt-4o-mini` (LLM) and `text-embedding-3-small` (embedder).

</Tab>
<Tab title="Quick setup (provider preset)">

```bash
he config init -p openai -k YOUR_OPENAI_API_KEY
he config init -p bailian -k YOUR_BAILIAN_API_KEY
```

Applies preset models and base URLs from the provider registry.

</Tab>
<Tab title="Environment variables">

Copy `.env.example` as a starting point:

```bash
export OPENAI_API_KEY=sk-your-api-key-here
export OPENAI_BASE_URL=https://api.openai.com/v1
```

Environment variables fill gaps when `~/.he/config.toml` omits a key or URL.

</Tab>
</Tabs>

</Step>
<Step title="Validate configuration">

```bash
he config show
```

Confirms provider, model, masked API key, and base URL for both LLM and embedder services.

</Step>
<Step title="Run your first extraction">

Once configuration is valid, proceed to a full workflow in [Quickstart](/quickstart).

</Step>
</Steps>

### Configuration file location

Hyper-Extract stores CLI settings at:

```
~/.he/config.toml
```

The file contains `[llm]` and `[embedder]` sections with `provider`, `model`, `api_key`, and `base_url` fields. See [Configuration reference](/configuration-reference) for the full schema and precedence rules.
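
To illustrate how the file and environment variables can fit together, here is a rough sketch that reads the two sections and fills a missing `api_key` or `base_url` from the environment. It is not the library's `validate_config()`; the helper names and the choice of `OPENAI_API_KEY` and `OPENAI_BASE_URL` as the only fallbacks are assumptions based on the environment-variable example above.

```python
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".he" / "config.toml"

# Assumption: these two variables act as fallbacks. The real lookup may differ.
ENV_FALLBACKS = {"api_key": "OPENAI_API_KEY", "base_url": "OPENAI_BASE_URL"}


def resolve_section(section, env=os.environ):
    """Fill a missing api_key or base_url from the environment."""
    resolved = dict(section)
    for field, variable in ENV_FALLBACKS.items():
        if not resolved.get(field) and env.get(variable):
            resolved[field] = env[variable]
    return resolved


def load_config(path=CONFIG_PATH, env=os.environ):
    """Read the [llm] and [embedder] sections, applying env fallbacks."""
    import tomllib  # standard library from Python 3.11 onward

    data = tomllib.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return {
        name: resolve_section(data.get(name, {}), env)
        for name in ("llm", "embedder")
    }
```

Values in the file take priority here, and the environment only fills gaps. For the authoritative view of what the CLI resolved, use `he config show`.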

### Provider-specific prerequisites

<AccordionGroup>
<Accordion title="OpenAI">

<ParamField body="api_key" type="string" required>
Valid OpenAI API key. Set via `he config init -k ...` or `OPENAI_API_KEY`.
</ParamField>

<ParamField body="base_url" type="string">
Defaults to `https://api.openai.com/v1`. Override with `-u` or `OPENAI_BASE_URL` for proxies.
</ParamField>

Verified models include `gpt-4o`, `gpt-4o-mini`, and `gpt-5`.

</Accordion>
<Accordion title="Bailian (Alibaba Cloud)">

```bash
he config init -p bailian -k YOUR_BAILIAN_API_KEY
```

Preset defaults: `qwen3.6-plus` (LLM), `text-embedding-v4` (embedder), base URL `https://dashscope.aliyuncs.com/compatible-mode/v1`.

</Accordion>
<Accordion title="Local vLLM">

vLLM requires an explicit `base_url` for both services. The API key can be `dummy`.

```bash
he config llm -p vllm -u http://localhost:8000/v1 -k dummy -m Qwen3.5-9B
he config embedder -p vllm -u http://localhost:8001/v1 -k dummy -m bge-m3
```

Or configure programmatically:

```python
from hyperextract import create_client

llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)
```

</Accordion>
</AccordionGroup>
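
The client strings in the vLLM example appear to follow a `provider:model@base_url` shape. The sketch below splits such a string into its parts to make that shape concrete. The format is inferred from that one example, and `ClientSpec` and `parse_client_spec` are hypothetical names; the parsing inside `create_client` may differ.

```python
from typing import NamedTuple, Optional


class ClientSpec(NamedTuple):
    provider: str
    model: str
    base_url: Optional[str]


def parse_client_spec(spec: str) -> ClientSpec:
    """Split 'provider:model@base_url' into parts; the '@base_url' part is optional."""
    # Split on the first '@' so colons inside the URL are left untouched.
    head, at_sign, base_url = spec.partition("@")
    provider, colon, model = head.partition(":")
    if not colon or not provider or not model:
        raise ValueError(f"expected 'provider:model[@base_url]', got {spec!r}")
    return ClientSpec(provider, model, base_url if at_sign else None)


if __name__ == "__main__":
    print(parse_client_spec("vllm:Qwen3.5-9B@http://localhost:8000/v1"))
```

Splitting on `@` before `:` matters because the URL itself contains colons.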

### Optional debug logging

Set these environment variables before running `he` commands:

<ParamField body="HYPER_EXTRACT_LOG_LEVEL" type="string">
Log level for structlog output. Values: `DEBUG`, `INFO`, `WARNING`, `ERROR`. Default: `WARNING`.
</ParamField>

<ParamField body="HYPER_EXTRACT_LOG_FILE" type="string">
Optional file path for persistent log output.
</ParamField>
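
When `he` is launched from a Python script, these variables can be set for that one process without changing the shell's environment. A minimal sketch, where `debug_env` and `run_he_debug` are illustrative helpers and only the two variable names come from this page:

```python
import os
import subprocess


def debug_env(log_file=None, base=None):
    """Return a copy of the environment with Hyper-Extract debug logging enabled."""
    env = dict(os.environ if base is None else base)
    env["HYPER_EXTRACT_LOG_LEVEL"] = "DEBUG"
    if log_file is not None:
        env["HYPER_EXTRACT_LOG_FILE"] = str(log_file)
    return env


def run_he_debug(args, log_file=None):
    """Run `he <args>` with DEBUG logging; requires the CLI to be installed."""
    return subprocess.run(["he", *args], env=debug_env(log_file), check=False)
```

For example, `run_he_debug(["config", "show"], log_file="he-debug.log")` runs the validation command with debug output written to a file.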

## Development installation

To modify source or run tests locally, clone the repository and install in editable mode:

```bash
git clone https://github.com/yifanfeng97/hyper-extract.git
cd hyper-extract
uv pip install -e ".[all]"
uv pip install --group dev pytest pytest-cov
```

CI validates against Python **3.11** and **3.12** on Ubuntu and macOS. See [Contributing](/contributing) for the full development workflow.

## Troubleshooting installation

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `command not found: he` | CLI not on PATH | Re-run `uv tool install hyperextract`, then run `uv tool update-shell` to add the uv tools bin directory to PATH |
| `LLM API key is not configured` | Missing first-run setup | Run `he config init` or export `OPENAI_API_KEY` |
| `vLLM provider requires base_url` | vLLM preset without URL | Set `-u http://host:port/v1` on both `he config llm` and `he config embedder` |
| `No module named 'langchain_anthropic'` | Anthropic extra not installed | `uv pip install 'hyperextract[anthropic]'` |
| Extraction produces empty or invalid JSON | Model lacks structured output | Switch to a verified model; see [Provider system](/provider-system) |

More failure modes and fixes: [Troubleshooting](/troubleshooting).

## Related pages

<CardGroup cols={2}>
<Card title="Quickstart" href="/quickstart">
Run `he config init`, parse a document, and query the resulting Knowledge Abstract.
</Card>
<Card title="Configure providers" href="/configure-providers">
Deep setup for mixed cloud and local vLLM deployments.
</Card>
<Card title="Overview" href="/overview">
What Hyper-Extract exposes and the shortest path from install to a queryable Knowledge Abstract.
</Card>
<Card title="Configuration reference" href="/configuration-reference">
Full `~/.he/config.toml` schema and environment variable precedence.
</Card>
</CardGroup>
