# Search, chat, and visualize

> Query Knowledge Abstracts with `he search` and `he talk` (single query or `-i` interactive mode), inspect stats via `he info`, and render graphs through OntoSight with `he show` or `AutoType.show()`.

- Repository: yifanfeng97/Hyper-Extract
- GitHub: https://github.com/yifanfeng97/Hyper-Extract
- Human docs: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf
- Complete Markdown: https://grok-wiki.com/public/docs/yifanfeng97-hyper-extract-7891c7254cdf/llms-full.txt

## Source Files

- `hyperextract/cli/cli.py`
- `hyperextract/types/base.py`
- `hyperextract/types/graph.py`
- `hyperextract/cli/utils.py`
- `hyperextract/types/hypergraph.py`

---

Hyper-Extract exposes four exploration commands on a Knowledge Abstract (KA) directory: `he info` for metadata and counts, `he search` and `he talk` for semantic retrieval and Q&A over the vector index, and `he show` for OntoSight visualization. The Python SDK mirrors the same surface through `BaseAutoType.search()`, `chat()`, and `show()` on template-backed AutoType instances.

`he search` and `he talk` require a populated `index/` directory. `he show` and `he info` only require `data.json`. LLM and embedder configuration is validated for search, talk, and show, but not for `he info`.

## Prerequisites

Before querying or chatting, ensure the KA is ready:

Run `he parse` (or `he feed` to append) so the output directory contains `data.json` and `metadata.json`. See [Extract and evolve](/extract-and-evolve).

`he parse` builds the index by default. If you used `--no-index`, or appended data with `he feed`, rebuild:

```bash
he build-index ./output/
```

If an index already exists but the data changed, add `--force`.

Search and talk use the embedder for retrieval and the LLM for chat answers.
Initialize configuration with `he config init` or set environment variables. See [Configure providers](/configure-providers).

```bash
he info ./output/
```

Confirm `Nodes` / `Edges` are non-zero and `Index` shows `Built`.

## Inspect with `he info`

`he info` prints KA metadata and statistics without loading the full AutoType or calling an LLM.

```bash
he info ./output/
```

```text
Knowledge Abstract Info

Path      ./output/
Template  general/biography_graph
Language  en
Created   2024-01-15 10:30:00
Updated   2024-01-15 10:35:22
Nodes     25
Edges     32
Index     Built
```

| Field | Meaning |
|-------|---------|
| `Template` | Preset or custom template ID from `metadata.json` |
| `Language` | Processing language (`en` or `zh`) |
| `Nodes` | Entity/item count from `data.json` (`nodes`, `entities`, or list length) |
| `Edges` | Relationship count (`edges`, `relations`, or `0` for non-graph types) |
| `Index` | `Built` when `index/` exists and is non-empty; otherwise `Not Built` |

Use `he info` to confirm extraction succeeded, monitor growth after `he feed`, and check whether `he build-index` is needed before search or talk.

## Semantic search with `he search`

`he search` embeds the query, runs similarity search against the FAISS index, and prints ranked structured results as JSON.

```bash
he search ./output/ "Tesla's inventions"
he search ./output/ "electrical engineering" -n 10
```

- First argument: path to the KA directory.
- Second argument: natural-language or keyword search string.
- `-n`: number of results.

### Retrieval pipeline

```mermaid
sequenceDiagram
    participant CLI as he search
    participant KA as AutoType instance
    participant IDX as FAISS index
    participant EMB as Embedder
    CLI->>KA: Template.create + load(ka_path)
    CLI->>KA: search(query, top_k)
    KA->>EMB: embed query
    KA->>IDX: similarity_search
    IDX-->>KA: ranked documents
    KA-->>CLI: structured items
    CLI-->>CLI: print JSON results
```

For graph and hypergraph AutoTypes, `search()` returns a tuple `(nodes, edges)`.
The CLI enumerates that tuple, so output typically shows a node group and an edge group rather than flat numbered entities. For list, set, and model AutoTypes, `search()` returns a flat list of Pydantic items, one JSON object per result.

Use natural-language queries (`"What were the major achievements?"`) rather than bare keywords. Increase `-n` when results feel too narrow.

## Chat with `he talk`

`he talk` retrieves context with the same vector index, then calls the configured LLM to synthesize an answer. Single-query and interactive modes are supported.

```bash
he talk ./output/ -q "What were Tesla's major achievements?"
he talk ./output/ -q "Explain the War of Currents" -n 10
```

Single-query mode prints the answer to stdout. When the LLM response includes retrieved context, the CLI shows truncated `Retrieved context` lines from `response.additional_kwargs["retrieved_items"]`.

```bash
he talk ./output/ -i
```

Interactive mode starts a REPL. Type questions at the `>` prompt. Exit with `exit`, `quit`, or `q`. `Ctrl+C` also exits cleanly.

- `-q`: question for single-query mode. Required unless `--interactive` is set.
- `--interactive` (short form `-i`): enter the interactive chat loop.
- `-n`: number of context items retrieved before LLM generation.

### Chat pipeline

`BaseAutoType.chat()` performs retrieval → context formatting → LLM invocation:

1. `search(query, top_k)` fetches relevant items.
2. Items are serialized to JSON (or plain text for string results) and joined into a context block.
3. A QA prompt asks the LLM to answer from that context.
4. The returned `AIMessage` includes `content` and `additional_kwargs["retrieved_items"]`.

Graph and hypergraph types override `chat()` to retrieve nodes and edges separately, format them under `=== Relevant Nodes ===` and `=== Relevant Edges ===` headers, and attach `retrieved_nodes` / `retrieved_edges` in metadata.

`he talk` requires either `-q` or `-i`. Running `he talk ./output/` without either exits with an error.
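The retrieval, context formatting, and LLM invocation steps listed under Chat pipeline can be sketched in plain Python. This is a minimal illustration, not the `BaseAutoType.chat()` implementation: `format_context`, `build_prompt`, and `chat_sketch` are hypothetical names, the prompt wording is invented, dicts stand in for Pydantic items, and a plain dict stands in for the returned `AIMessage`.

```python
import json
from typing import Callable, Sequence, Union

Item = Union[str, dict]  # assumption: dicts stand in for Pydantic items


def format_context(items: Sequence[Item]) -> str:
    """Join retrieved items into one context block.

    Strings pass through unchanged; everything else is serialized to JSON.
    """
    parts = [
        item if isinstance(item, str) else json.dumps(item, ensure_ascii=False)
        for item in items
    ]
    return "\n".join(parts)


def build_prompt(query: str, context: str) -> str:
    """Build a QA prompt (hypothetical wording) that restricts the answer to the context."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def chat_sketch(
    query: str,
    search: Callable[[str, int], Sequence[Item]],
    llm: Callable[[str], str],
    top_k: int = 3,
) -> dict:
    """Retrieve items, format them as context, call the LLM, and attach the items."""
    items = list(search(query, top_k))                    # 1. retrieval
    prompt = build_prompt(query, format_context(items))   # 2-3. context + QA prompt
    answer = llm(prompt)                                  # LLM invocation
    # 4. mimic the documented response shape with a plain dict
    return {"content": answer, "additional_kwargs": {"retrieved_items": items}}
```

Any two callables can be passed as `search` and `llm`, which makes it possible to exercise the flow with stubs before wiring in a real embedder or model.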
## Visualize with `he show`

`he show` loads the KA, resolves the template from `metadata.json`, and opens an OntoSight viewer in the default browser.

```bash
he show ./output/
```

Visualization works for all eight AutoTypes. Graph-based types (`AutoGraph`, `AutoHypergraph`, `AutoTemporalGraph`, `AutoSpatialGraph`, `AutoSpatioTemporalGraph`) render nodes and edges. `AutoList`, `AutoSet`, and `AutoModel` use list, set, and structured views respectively.

When both node and edge indices exist (graph types) or a single FAISS index exists (list/set/model), OntoSight wires **search** and **chat** callbacks into the viewer so you can query from the UI. Without indices, visualization is read-only.

| AutoType | OntoSight viewer | In-viewer search/chat |
|----------|------------------|----------------------|
| `AutoGraph` | `view_graph` | When `node_index` and `edge_index` exist |
| `AutoHypergraph` | `view_hypergraph` | When both indices exist |
| `AutoList` / `AutoSet` | List/set view | When FAISS index exists |
| `AutoModel` | Structured view | When index exists |

If the browser does not open automatically, check the terminal for the localhost URL and open it manually.

## Python API equivalents

The CLI commands delegate to `Template.create(template, lang)` → `load(ka_path)` → AutoType methods.
```python
# Search and chat
from hyperextract import Template

ka = Template.create("general/biography_graph", language="en")
ka.load("./output/")
ka.build_index()  # skip if index already on disk

# Graph types return (nodes, edges)
nodes, edges = ka.search("AC power system", top_k=3)

response = ka.chat("Who was Nikola Tesla?", top_k=3)
print(response.content)
print(response.additional_kwargs.get("retrieved_nodes", []))
```

```python
# Visualize (reuses `ka` from the search and chat example)
ka.show(
    node_label_extractor=lambda n: n.name,
    edge_label_extractor=lambda e: e.type,
)
```

```python
# In-memory workflow
from hyperextract.types import AutoGraph

# Assumes `graph` is an already-constructed AutoGraph instance, `text` is the
# source document string, and `questions` is a list of question strings.
graph.feed_text(text)
graph.build_index()

for q in questions:
    print(graph.chat(q).content)

graph.show()
```

`AutoType.show()` accepts optional label extractors and `top_k_*_for_search` / `top_k_*_for_chat` kwargs on graph types to control OntoSight callback retrieval depth.

## Search vs talk

| | `he search` | `he talk` |
|---|-------------|-----------|
| Output | Raw structured items (JSON) | Natural-language answer |
| LLM call | No (embedder only) | Yes |
| Speed | Faster | Slower |
| Best for | Locating specific entities/relations | Explanations, summaries, follow-up Q&A |
| Index required | Yes | Yes |

A typical workflow: `he search` to locate relevant nodes or edges, then `he talk -q` for a synthesized explanation, then `he show` to inspect structure visually.

## Index layout

Graph and hypergraph KAs store separate FAISS indices under `index/`:

:::files
output/
├── data.json
├── metadata.json
└── index/
    ├── node_index/
    └── edge_index/
:::

List, set, and model types use a single FAISS directory at `index/`.

After `he feed`, indexes may be stale; run `he build-index ./output/ --force` before searching or chatting on updated data.

## Troubleshooting

```
Error: Index not found. Please run 'he build-index ' first.
```

Run `he build-index ./output/`. If an index already exists but data changed, add `--force`.
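A small pre-flight check can surface a missing index or empty data before running search or chat. The sketch below is not the `he info` implementation: `ka_status` is a hypothetical helper that applies the counting rules from the `he info` field table, and it assumes that a top-level JSON list in `data.json` is what "list length" refers to.

```python
import json
from pathlib import Path


def ka_status(ka_dir: str) -> dict:
    """Summarize a KA directory using the rules described for `he info`."""
    root = Path(ka_dir)
    data_file = root / "data.json"
    data = json.loads(data_file.read_text(encoding="utf-8")) if data_file.is_file() else {}

    if isinstance(data, list):
        # Assumption: a top-level list is a non-graph type, so count its items.
        nodes, edges = len(data), 0
    elif isinstance(data, dict):
        nodes = len(data.get("nodes") or data.get("entities") or [])
        edges = len(data.get("edges") or data.get("relations") or [])
    else:
        nodes, edges = 0, 0

    # "Built" only when index/ exists and contains at least one entry.
    index_dir = root / "index"
    built = index_dir.is_dir() and any(index_dir.iterdir())

    return {"nodes": nodes, "edges": edges, "index": "Built" if built else "Not Built"}
```

If `ka_status("./output/")` reports `Not Built`, run `he build-index ./output/`; if it reports zero nodes, re-check the extraction step before rebuilding the index.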
If search or talk returns few or no relevant results:

- Broaden the query or increase `-n`.
- Confirm data exists: `he info ./output/` should show `Nodes > 0`.
- Rebuild the index after feeding new documents.

`he show` requires `data.json`. Check `he info` for zero nodes/edges: extraction may have failed or produced an empty graph.

Search, talk, and show call `validate_config()`. Run `he config init` and configure LLM and embedder providers. See [Troubleshooting](/troubleshooting) for API key and vLLM `base_url` issues.

`he search`, `he talk`, and `he show` read `template` and `lang` from `metadata.json`. Missing or unknown templates raise load errors. Custom templates must have a `{template}.yaml` file in the KA directory.

## Related pages

- First extraction with `he parse`, then `he search` and `he show` on the Tesla biography example.
- On-disk KA layout (`data.json`, `metadata.json`, `index/`) and lifecycle methods.
- Full `he search`, `he talk`, `he show`, and `he info` flag and exit contracts.
- `BaseAutoType.search`, `chat`, `show`, `load`, and `build_index` signatures.
- End-to-end parse → visualize → search → Q&A workflow.
- Missing index, empty results, and provider configuration failures.