88% first-result precision - validated, measured, and production-ready. Every number on this page is evidence-based and cites published research or a reproducible benchmark. Zero hallucinated metrics.
Recallium's results plotted against the cited industry-standard range for each information-retrieval metric. Bars show measured performance; shaded zones mark the published baseline.
Measured across 30 queries over 76 memories in 5 interconnected projects. Baselines cited in Industry Standards & Citations.
Precision@5 broken down across the five query categories in the evaluation set.
First-result precision (P@1) of competitive tiers, derived from aggregated enterprise-search studies. Recallium sits above the top-tier platform band.
Tier boundaries aggregated from enterprise-search studies [1], commercial systems, and top-tier platforms [2]. Full citations below.
Evaluation follows standard information-retrieval paradigms with an LLM judge producing ~300 relevance judgments.
76 memories across 5 interconnected projects simulating real-world technical documentation.
30 diverse queries spanning exact-match, semantic, cross-project, hybrid, and ambiguous categories.
Standard IR metrics: Precision@1/5/10, Recall@5/10, MRR, NDCG, Coverage, and Latency.
Claude Sonnet 4.5 serves as the LLM judge, producing ~300 relevance judgments.
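For readers less familiar with the metrics listed above, here is a minimal sketch of how Precision@k, Recall@k, reciprocal rank (averaged over queries to get MRR), and NDCG@k are commonly computed from binary relevance judgments. It is not the Recallium eval harness: the function names, the memory IDs, and the binary-relevance simplification are all assumptions made for illustration.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@k with binary gains (1 if relevant, else 0)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical single query: IDs are made up, not taken from the test dataset.
ranked = ["m3", "m7", "m1", "m9", "m4"]   # system output, best first
relevant = {"m7", "m1", "m5"}             # items the judge marked relevant

p_at_1 = precision_at_k(ranked, relevant, 1)
p_at_5 = precision_at_k(ranked, relevant, 5)
r_at_5 = recall_at_k(ranked, relevant, 5)
rr = reciprocal_rank(ranked, relevant)
ndcg_5 = ndcg_at_k(ranked, relevant, 5)
```

Per-query values like these are averaged across the query set to give the headline figures; MRR is the mean of `reciprocal_rank` over all queries.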
Vector-only systems (e.g. mem0, Supermemory) rely on semantic similarity alone, missing exact matches and technical terminology. Recallium fuses semantic + keyword + file-based retrieval.
Vector-only baseline is the midpoint (75%) of the cited 70-80% range [1]; RAG baseline is 57.6% [5].
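To illustrate what fusing several retrievers can look like, the sketch below merges three ranked lists with reciprocal rank fusion (RRF). This page does not say how Recallium combines its semantic, keyword, and file-based signals, so RRF is a stand-in technique chosen for illustration, and the function name, the constant `k=60`, and the memory IDs are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each list contributes 1 / (k + rank) per item.

    Illustrative only; not Recallium's documented fusion method.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical per-retriever results for one query (IDs are made up).
semantic = ["m2", "m1", "m3"]    # embedding similarity
keyword = ["m1", "m4", "m2"]     # exact-term match
file_based = ["m1", "m3"]        # file-path association

fused = reciprocal_rank_fusion([semantic, keyword, file_based])
```

In this toy example `m1` is ranked first only by the keyword and file-based lists, yet it tops the fused ranking because all three retrievers return it. That is the kind of exact-match hit a vector-only ranking could place lower.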
All performance comparisons reference published research or commercial benchmarks.
Reproduce the benchmark on your own corpus - the eval harness ships with the open-source repo.
Recallium Search Benchmark Report · November 2025 · Test Dataset v1.0 · All comparisons evidence-based