Benchmarks, architecture writeups, and the occasional opinionated rant. Written by the people who built the software, for people who want to understand how it actually works.
A retrieval engine is only useful if it's where you work. Mnemosyne now ships as three PyPI packages: the engine, an MCP server for Claude Code and Cursor, and a zero-config Ollama bridge for fully local code search. Same engine, three delivery vectors, zero cloud required.
Most LLM code retrieval tools rely on one or two signal types. Mnemosyne fuses six — BM25, TF-IDF, symbol matching, usage frequency, predictive prefetch, and optional embeddings — through Reciprocal Rank Fusion. The architecture, the publication timeline, and a feature-by-feature comparison.
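To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion, the technique named above. This is not Mnemosyne's actual code: the signal names in the example and the k=60 constant (the conventional RRF default) are illustrative assumptions.

```python
# Minimal RRF sketch: each signal contributes 1 / (k + rank) per document,
# and scores are summed across signals. Signal names and k=60 are assumptions.
from collections import defaultdict

def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Fuse per-signal ranked lists of doc ids into one ranking.

    `rankings` maps a signal name (e.g. "bm25", "tfidf") to its ranked
    list of doc ids, best first. Lower rank means a larger contribution.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: three of the six signals, each ranking a few files.
fused = rrf_fuse({
    "bm25":   ["auth.py", "db.py", "cli.py"],
    "tfidf":  ["db.py", "auth.py", "utils.py"],
    "symbol": ["auth.py", "utils.py", "db.py"],
})
print(fused)  # auth.py and db.py rise to the top because the signals agree
```

The appeal of RRF here is that it needs no score calibration: BM25 scores, cosine similarities, and usage counts live on incompatible scales, but ranks are always comparable, so signals can be added or dropped without retuning weights.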
Same question, same codebase, same LLM — two search strategies. Mnemosyne's semantic retrieval was 2.4x faster, used 5.6x fewer tokens, and cost 4.2x less, with equivalent answer quality. A head-to-head benchmark with full cost analysis.
We built Mnemosyne to stop burning tokens on irrelevant code. A real benchmark against an 844-file production codebase shows 74% token reduction, 99% faster queries, and equivalent answer quality. Zero dependencies. Apache-2.0.
Regulations define what companies must do. Privacy policies describe what companies say they'll do. This case study describes how we built RuleBrief and PrivacyPeep to address both sides of that gap — and the shared design principles behind explainability, local-first processing, and published methodology.