# LongMemEval-V2 Plain-Language Wiki

> LongMemEval-V2 is a benchmark that tests whether an AI agent's memory system can turn long histories of web-browsing actions into the kind of practical knowledge a seasoned colleague would have. The repo ships the dataset pipeline, a pluggable memory framework, an evaluation harness, and leaderboard packaging utilities.

This is a Grok-Wiki source-grounded repository wiki. Use the complete Markdown link when an agent needs the full repo context.

## Context Links

- [Complete Markdown wiki](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2/llms-full.txt)
- [Complete Markdown alias](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2.md)
- [Human interactive wiki](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2)
- [GitHub repository](https://github.com/xiaowu0162/LongMemEval-V2)

## Repository

- Repository: xiaowu0162/LongMemEval-V2
- Generated: 2026-05-22T06:16:36.043Z
- Updated: 2026-05-22T06:49:07.550Z
- Runtime: Claude Code
- Format: Explain Like I'm 5
- Pages: 6

## Pages

- [Explain It Simply: What This Repo Does](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2/pages/01-explain-it-simply-what-this-repo-does.md): What LongMemEval-V2 is in plain language, the one analogy to keep, and the three ideas every reader should hold onto before going deeper.
- [Five Things a Good Memory Must Know](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2/pages/02-five-things-a-good-memory-must-know.md): The five memory abilities the benchmark tests (static recall, dynamic tracking, workflow knowledge, gotchas, and premise awareness), explained with real question categories from the harness source code.
- [Downloading & Preparing the Haystack](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2/pages/03-downloading-preparing-the-haystack.md): How trajectory data moves from Hugging Face through download, screenshot extraction, and symlink preparation into the form the harness expects. Covers the three data scripts and the validate step.
- [The Six Memory Backends: How Each One Works](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2/pages/04-the-six-memory-backends-how-each-one-works.md): A plain-English tour of the six pluggable memory backends (`no_retrieval`, RAG variants, AgentRunbook-R, Codex, and AgentRunbook-C). Explains what each one stores and retrieves, plus the insert/query contract every custom backend must satisfy.
- [The Evaluation Harness: From Question to Score](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2/pages/05-the-evaluation-harness-from-question-to-score.md): How `harness.py` feeds each question to a memory backend, collects context items, calls the reader model, and scores the answer. Also covers the LLM-judge paths for abstention and gotcha questions, and how the shell scripts wire it all together.
- [Scoring, LAFS, & What to Remember](https://grok-wiki.com/public/wiki/xiaowu0162-longmemeval-v2-0193366cbab2/pages/06-scoring-lafs-what-to-remember.md): How web and enterprise run results are merged, how LAFS turns accuracy and latency into a single leaderboard score, and how the two-step submission packaging works. Ends with a plain-English recap of the core ideas to carry away from this repo.
## Source Files

- `data/download_data.py`
- `data/prepare_data.py`
- `data/public_data.py`
- `data/validate_data.py`
- `environment.yml`
- `evaluation/harness.py`
- `evaluation/memory_configs/no_retrieval.json`
- `evaluation/memory_configs/rag_query_to_slice.json`
- `evaluation/qa_eval_metrics.py`
- `evaluation/run_eval.py`
- `evaluation/scripts/run_no_retrieval.sh`
- `evaluation/scripts/run_rag_query_to_slice.sh`
- `leaderboard/build_submission_step_1_single_operating_point.py`
- `leaderboard/build_submission_step_2_build_package.py`
- `leaderboard/combine_aggregated_metrics.py`
- `leaderboard/compute_lafs.py`
- `leaderboard/README.md`
- `leaderboard/submission_utils.py`
- `memory_modules/agentrunbook_c.py`
- `memory_modules/agentrunbook_r.py`
- `memory_modules/codex.py`
- `memory_modules/memory.py`
- `memory_modules/no_retrieval.py`
- `memory_modules/rag.py`
- `memory_modules/support.py`
- `memory_modules/trajectory_store.py`
- `pyproject.toml`
- `README.md`