# Search, chat, and visualize

> Query Knowledge Abstracts with `he search` and `he talk` (single query or `-i` interactive mode), inspect stats via `he info`, and render graphs through OntoSight with `he show` or `AutoType.show()`.

- Repository: yifanfeng97/Hyper-Extract
- GitHub: https://github.com/yifanfeng97/Hyper-Extract
- Human docs: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf
- Complete Markdown: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf/llms-full.txt

## Source Files

- `hyperextract/cli/cli.py`
- `hyperextract/types/base.py`
- `hyperextract/types/graph.py`
- `hyperextract/cli/utils.py`
- `hyperextract/types/hypergraph.py`

---


Hyper-Extract exposes four exploration commands on a Knowledge Abstract (KA) directory: `he info` for metadata and counts, `he search` and `he talk` for semantic retrieval and Q&A over the vector index, and `he show` for OntoSight visualization. The Python SDK mirrors the same surface through `BaseAutoType.search()`, `chat()`, and `show()` on template-backed AutoType instances.

<Note>
`he search` and `he talk` require a populated `index/` directory. `he show` and `he info` only require `data.json`. LLM and embedder configuration is validated for search, talk, and show, but not for `he info`.
</Note>

## Prerequisites

Before querying or chatting, ensure the KA is ready:

<Steps>
<Step title="Create or load a Knowledge Abstract">

Run `he parse` (or `he feed` to append) so the output directory contains `data.json` and `metadata.json`. See [Extract and evolve](/extract-and-evolve).

</Step>
<Step title="Build the search index">

`he parse` builds the index by default. If you used `--no-index`, build the index now. After appending data with `he feed`, rebuild it, adding `--force` if an index already exists:

```bash
he build-index ./output/
```

</Step>
<Step title="Configure providers">

Search and talk use the embedder for retrieval and the LLM for chat answers. Initialize configuration with `he config init` or set environment variables. See [Configure providers](/configure-providers).

</Step>
<Step title="Verify readiness">

```bash
he info ./output/
```

Confirm that `Nodes` is non-zero (and `Edges` too, for graph types) and that `Index` shows `Built`.

</Step>
</Steps>

## Inspect with `he info`

`he info` prints KA metadata and statistics without loading the full AutoType or calling an LLM.

```bash
he info ./output/
```

<ResponseExample>

```text
Knowledge Abstract Info

Path          ./output/
Template      general/biography_graph
Language      en
Created       2024-01-15 10:30:00
Updated       2024-01-15 10:35:22
Nodes         25
Edges         32
Index         Built
```

</ResponseExample>

| Field | Meaning |
|-------|---------|
| `Template` | Preset or custom template ID from `metadata.json` |
| `Language` | Processing language (`en` or `zh`) |
| `Nodes` | Entity/item count from `data.json` (`nodes`, `entities`, or list length) |
| `Edges` | Relationship count (`edges`, `relations`, or `0` for non-graph types) |
| `Index` | `Built` when `index/` exists and is non-empty; otherwise `Not Built` |

Use `he info` to confirm extraction succeeded, monitor growth after `he feed`, and check whether `he build-index` is needed before search or talk.
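
The counting rules in the table can be sketched in plain Python. This is a minimal approximation and not the CLI's implementation: the function name `summarize_ka` is hypothetical, and it assumes `data.json` is either a JSON object with the keys listed above or a top-level JSON list.

```python
import json
from pathlib import Path


def summarize_ka(ka_path):
    """Approximate the Nodes / Edges / Index fields reported by `he info`.

    Hypothetical helper: it follows the field rules documented above and
    does not call into Hyper-Extract.
    """
    ka_dir = Path(ka_path)
    data = json.loads((ka_dir / "data.json").read_text(encoding="utf-8"))

    if isinstance(data, list):
        # Assumption: non-graph KAs may serialize as a bare list of items.
        nodes, edges = len(data), 0
    else:
        node_items = data.get("nodes") or data.get("entities") or []
        edge_items = data.get("edges") or data.get("relations") or []
        nodes, edges = len(node_items), len(edge_items)

    # "Built" only when index/ exists and contains at least one entry.
    index_dir = ka_dir / "index"
    built = index_dir.is_dir() and any(index_dir.iterdir())
    return {"Nodes": nodes, "Edges": edges, "Index": "Built" if built else "Not Built"}
```

A check like this can gate a scripted pipeline, for example by running `he build-index` only when the returned `Index` value is `Not Built`.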

## Semantic search with `he search`

`he search` embeds the query, runs similarity search against the FAISS index, and prints ranked structured results as JSON.

```bash
he search ./output/ "Tesla's inventions"
he search ./output/ "electrical engineering" -n 10
```

<ParamField body="ka_path" type="string" required>
Path to the KA directory.
</ParamField>

<ParamField body="query" type="string" required>
Natural-language or keyword search string.
</ParamField>

<ParamField body="--top-k" type="integer" default="3">
Number of results. Short form: `-n`.
</ParamField>

### Retrieval pipeline

```mermaid
sequenceDiagram
    participant CLI as he search
    participant KA as AutoType instance
    participant IDX as FAISS index
    participant EMB as Embedder

    CLI->>KA: Template.create + load(ka_path)
    CLI->>KA: search(query, top_k)
    KA->>EMB: embed query
    KA->>IDX: similarity_search
    IDX-->>KA: ranked documents
    KA-->>CLI: structured items
    CLI-->>CLI: print JSON results
```

For graph and hypergraph AutoTypes, `search()` returns a tuple `(nodes, edges)`. The CLI enumerates that tuple, so output typically shows a node group and an edge group rather than flat numbered entities.

For list, set, and model AutoTypes, `search()` returns a flat list of Pydantic items—one JSON object per result.
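
Because the return shape depends on the AutoType, calling code usually has to branch on it. The sketch below normalizes both shapes into labeled groups. `group_results` is a hypothetical helper, the sample data is a stand-in, and the use of `model_dump()` assumes the items are Pydantic v2 models (plain dicts pass through so the example runs without Hyper-Extract installed).

```python
import json


def group_results(results):
    """Normalize a search() return value into {group_name: [dict, ...]}.

    Hypothetical helper. A 2-tuple is treated as (nodes, edges), as described
    for graph and hypergraph AutoTypes; anything else is treated as a flat
    list of items.
    """
    def to_dict(item):
        # Assumption: Pydantic v2 items expose model_dump(); dicts pass through.
        return item.model_dump() if hasattr(item, "model_dump") else item

    if isinstance(results, tuple) and len(results) == 2:
        nodes, edges = results
        return {
            "nodes": [to_dict(n) for n in nodes],
            "edges": [to_dict(e) for e in edges],
        }
    return {"items": [to_dict(i) for i in results]}


# Stand-in data shaped like a graph search result (field names are illustrative).
graph_hit = ([{"name": "Nikola Tesla"}], [{"source": "Nikola Tesla", "target": "AC motor"}])
print(json.dumps(group_results(graph_hit), indent=2))
```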

<Tip>
Use natural-language queries (`"What were the major achievements?"`) rather than bare keywords. Increase `-n` when results feel too narrow.
</Tip>

## Chat with `he talk`

`he talk` retrieves context with the same vector index, then calls the configured LLM to synthesize an answer. Single-query and interactive modes are supported.

<Tabs>
<Tab title="Single query">

```bash
he talk ./output/ -q "What were Tesla's major achievements?"
he talk ./output/ -q "Explain the War of Currents" -n 10
```

Prints the answer to stdout. When the response carries retrieved items in its metadata, the CLI also prints truncated `Retrieved context` lines taken from `response.additional_kwargs["retrieved_items"]`.

</Tab>
<Tab title="Interactive mode">

```bash
he talk ./output/ -i
```

Starts a REPL. Type questions at the `>` prompt. Exit with `exit`, `quit`, or `q`. `Ctrl+C` also exits cleanly.

</Tab>
</Tabs>

<ParamField body="--query" type="string">
Question for single-query mode. Short form: `-q`. Required unless `--interactive` is set.
</ParamField>

<ParamField body="--interactive" type="boolean" default="false">
Enter interactive chat loop. Short form: `-i`.
</ParamField>

<ParamField body="--top-k" type="integer" default="3">
Number of context items retrieved before LLM generation. Short form: `-n`.
</ParamField>

### Chat pipeline

`BaseAutoType.chat()` performs retrieval → context formatting → LLM invocation:

1. `search(query, top_k)` fetches relevant items.
2. Items are serialized to JSON (or plain text for string results) and joined into a context block.
3. A QA prompt asks the LLM to answer from that context.
4. The returned `AIMessage` includes `content` and `additional_kwargs["retrieved_items"]`.

Graph and hypergraph types override `chat()` to retrieve nodes and edges separately, format them under `=== Relevant Nodes ===` and `=== Relevant Edges ===` headers, and attach `retrieved_nodes` / `retrieved_edges` in metadata.
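
The context-formatting step can be illustrated without an LLM. The sketch below approximates steps 2 and 3 and is not the library's code: the function names and the prompt wording are assumptions, and only the two section headers are taken from the description above.

```python
import json


def format_items(items):
    """Serialize items to JSON (strings stay as plain text) and join them."""
    parts = []
    for item in items:
        if isinstance(item, str):
            parts.append(item)
        else:
            # Assumption: Pydantic v2 models expose model_dump(); dicts pass through.
            payload = item.model_dump() if hasattr(item, "model_dump") else item
            parts.append(json.dumps(payload, ensure_ascii=False))
    return "\n".join(parts)


def format_graph_context(nodes, edges):
    """Place nodes and edges under the headers described for graph-type chat()."""
    return (
        "=== Relevant Nodes ===\n" + format_items(nodes)
        + "\n\n=== Relevant Edges ===\n" + format_items(edges)
    )


def build_qa_prompt(context, question):
    # Hypothetical prompt wording; the library's actual prompt may differ.
    return f"Answer the question using only this context.\n\n{context}\n\nQuestion: {question}"
```

Keeping nodes and edges in separate blocks lets the prompt distinguish entities from the relationships between them.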

<Warning>
`he talk` requires either `-q` or `-i`. Running `he talk ./output/` without either exits with an error.
</Warning>
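
The interactive loop's exit behavior can be sketched as follows. This is a simplified stand-in rather than the CLI's code: `chat_loop` and its `ask`, `read_line`, and `write` parameters are hypothetical, with `ask` standing in for a call such as `ka.chat(question).content`.

```python
EXIT_WORDS = {"exit", "quit", "q"}


def chat_loop(ask, read_line=input, write=print):
    """Minimal REPL: read a question, print the answer, stop on an exit word.

    `ask` is any callable that maps a question string to an answer string.
    Returns the number of questions answered.
    """
    answered = 0
    while True:
        try:
            question = read_line("> ").strip()
        except (KeyboardInterrupt, EOFError):
            # Ctrl+C (or end of input) leaves the loop without a traceback.
            break
        if question in EXIT_WORDS:
            break
        if not question:
            continue  # Assumption: blank lines are ignored.
        write(ask(question))
        answered += 1
    return answered
```

Injecting `read_line` and `write` keeps the loop testable without a terminal.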

## Visualize with `he show`

`he show` loads the KA, resolves the template from `metadata.json`, and opens an OntoSight viewer in the default browser.

```bash
he show ./output/
```

Visualization works for all eight AutoTypes. Graph-based types (`AutoGraph`, `AutoHypergraph`, `AutoTemporalGraph`, `AutoSpatialGraph`, `AutoSpatioTemporalGraph`) render nodes and edges. `AutoList`, `AutoSet`, and `AutoModel` use list, set, and structured views respectively.

When both node and edge indices exist (graph types) or a single FAISS index exists (list/set/model), OntoSight wires **search** and **chat** callbacks into the viewer so you can query from the UI. Without indices, visualization is read-only.

| AutoType | OntoSight viewer | In-viewer search/chat |
|----------|------------------|----------------------|
| `AutoGraph` | `view_graph` | When `node_index` and `edge_index` exist |
| `AutoHypergraph` | `view_hypergraph` | When both indices exist |
| `AutoList` / `AutoSet` | List/set view | When FAISS index exists |
| `AutoModel` | Structured view | When index exists |

If the browser does not open automatically, check the terminal for the localhost URL and open it manually.

## Python API equivalents

Each CLI command creates the AutoType with `Template.create(template, lang)`, loads the KA with `load(ka_path)`, and then calls the corresponding AutoType method.

<CodeGroup>

```python Python — search and chat
from hyperextract import Template

ka = Template.create("general/biography_graph", language="en")
ka.load("./output/")
ka.build_index()  # skip if index already on disk

# Graph types return (nodes, edges)
nodes, edges = ka.search("AC power system", top_k=3)

response = ka.chat("Who was Nikola Tesla?", top_k=3)
print(response.content)
print(response.additional_kwargs.get("retrieved_nodes", []))
```

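```python Python — visualize
# `ka` is the loaded instance from the search-and-chat example.
# The `name` and `type` attributes are assumed to exist on this template's schema.
ka.show(
    node_label_extractor=lambda n: n.name,
    edge_label_extractor=lambda e: e.type,
)
```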

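```python Python — in-memory workflow
from hyperextract.types import AutoGraph

# Assumes `graph` is an already-constructed AutoGraph instance (constructor
# arguments are omitted here), `text` is the source document as a string,
# and `questions` is a list of question strings.
graph.feed_text(text)  # extract knowledge into memory
graph.build_index()    # required before search() or chat()

for q in questions:
    print(graph.chat(q).content)

graph.show()
```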

</CodeGroup>

`AutoType.show()` accepts optional label extractors. On graph types it also accepts `top_k_*_for_search` and `top_k_*_for_chat` keyword arguments, which control how many items the OntoSight search and chat callbacks retrieve.

## Search vs talk

| | `he search` | `he talk` |
|---|-------------|-----------|
| Output | Raw structured items (JSON) | Natural-language answer |
| LLM call | No (embedder only) | Yes |
| Speed | Faster | Slower |
| Best for | Locating specific entities/relations | Explanations, summaries, follow-up Q&A |
| Index required | Yes | Yes |

A typical workflow: `he search` to locate relevant nodes or edges, then `he talk -q` for a synthesized explanation, then `he show` to inspect structure visually.
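
The same workflow can be scripted. The sketch below only assembles the command lines documented on this page and runs them if the `he` executable is on `PATH`. The helper names are hypothetical.

```python
import shutil
import subprocess


def search_cmd(ka_path, query, top_k=3):
    """Build the argv for `he search` with an explicit result count."""
    return ["he", "search", ka_path, query, "-n", str(top_k)]


def talk_cmd(ka_path, question, top_k=3):
    """Build the argv for a single-query `he talk`."""
    return ["he", "talk", ka_path, "-q", question, "-n", str(top_k)]


def run_workflow(ka_path, query, question):
    """Run search, then talk, then show. Returns False if `he` is not installed."""
    if shutil.which("he") is None:
        return False
    commands = (
        search_cmd(ka_path, query),
        talk_cmd(ka_path, question),
        ["he", "show", ka_path],
    )
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
    return True
```

Passing arguments as a list avoids shell quoting problems with queries such as `"Tesla's inventions"`.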

## Index layout

Graph and hypergraph KAs store separate FAISS indices under `index/`:

:::files
output/
├── data.json
├── metadata.json
└── index/
    ├── node_index/
    └── edge_index/
:::

List, set, and model types use a single FAISS directory at `index/`. After `he feed`, indexes may be stale; run `he build-index ./output/ --force` before searching or chatting on updated data.
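
A layout check along these lines can tell you whether an index is present before you search or chat. It is a sketch based only on the directory layout described above: `has_index` is a hypothetical helper, treating an empty subdirectory as "missing" is an assumption, and the check does not detect a stale index.

```python
from pathlib import Path


def has_index(ka_path, graph_type):
    """Return True if the expected index directories exist and are non-empty.

    graph_type=True expects index/node_index and index/edge_index;
    otherwise a single non-empty index/ directory is expected.
    """
    def non_empty(directory):
        return directory.is_dir() and any(directory.iterdir())

    index_dir = Path(ka_path) / "index"
    if graph_type:
        return non_empty(index_dir / "node_index") and non_empty(index_dir / "edge_index")
    return non_empty(index_dir)
```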

## Troubleshooting

<AccordionGroup>
<Accordion title="Index not found">

```text
Error: Index not found. Please run 'he build-index <ka_path>' first.
```

Run `he build-index ./output/`. If an index already exists but data changed, add `--force`.

</Accordion>

<Accordion title="No search results">

- Broaden the query or increase `-n`.
- Confirm data exists: `he info ./output/` should show `Nodes > 0`.
- Rebuild the index after feeding new documents.

</Accordion>

<Accordion title="Empty or missing visualization">

`he show` requires `data.json`. If `he info` reports zero nodes or edges, extraction may have failed or produced an empty graph.

</Accordion>

<Accordion title="Configuration errors">

Search, talk, and show call `validate_config()`. Run `he config init` and configure LLM and embedder providers. See [Troubleshooting](/troubleshooting) for API key and vLLM `base_url` issues.

</Accordion>

<Accordion title="Template resolution failures">

`he search`, `he talk`, and `he show` read `template` and `lang` from `metadata.json`. Missing or unknown templates raise load errors. Custom templates must have a `{template}.yaml` file in the KA directory.

</Accordion>
</AccordionGroup>

## Related pages

<CardGroup>
<Card title="Quickstart" href="/quickstart">
First extraction with `he parse`, then `he search` and `he show` on the Tesla biography example.
</Card>
<Card title="Knowledge Abstracts" href="/knowledge-abstracts">
On-disk KA layout (`data.json`, `metadata.json`, `index/`) and lifecycle methods.
</Card>
<Card title="CLI reference" href="/cli-reference">
Full flag reference and exit behavior for `he search`, `he talk`, `he show`, and `he info`.
</Card>
<Card title="Python API reference" href="/python-api-reference">
`BaseAutoType.search`, `chat`, `show`, `load`, and `build_index` signatures.
</Card>
<Card title="Tesla biography recipe" href="/tesla-biography-recipe">
End-to-end parse → visualize → search → Q&A workflow.
</Card>
<Card title="Troubleshooting" href="/troubleshooting">
Missing index, empty results, and provider configuration failures.
</Card>
</CardGroup>
