Nunchi AI

96.6% Recall, 0% Portability

MemPalace hit 41k stars in a week. That's not hype — that's pain. Developers are desperate to stop losing their conversations with AI. Every debugging session, every architecture decision, every "we tried X and it failed because Y" — gone when the session ends.

MemPalace promises to fix this: store everything locally, search it with ChromaDB, get 96.6% recall on LongMemEval. No API key, no cloud, no cost. And the score is real — independently reproduced.

I genuinely respect the project. The team was honest enough to publicly retract overclaimed numbers within 48 hours of launch. That's rare in this space.

But after reading the code and the benchmarks, I think MemPalace — along with Mem0, Zep, and every other memory system out there — is solving the wrong layer of the problem.

What 96.6% Actually Means

MemPalace's headline score comes from raw verbatim mode: dump entire conversations into ChromaDB, run semantic similarity search, return the top 5 results. No summarization, no extraction, no LLM in the loop.

This is a brute-force approach, and it works beautifully for retrieval benchmarks. If you stored the exact conversation where you discussed GraphQL migration, and someone asks "why did we switch to GraphQL," ChromaDB will find it. The original words are there. Similarity search picks them up.

The palace metaphor — wings, halls, rooms, closets, drawers — is elegant marketing for what is technically hierarchical metadata filtering on ChromaDB. Their own benchmark shows the breakdown: unfiltered search hits 60.9%, adding wing filtering gets 73.1%, wing + room gets 94.8%. This is useful. It's also a standard feature of any vector database with metadata fields. The "+34% palace boost" the team originally claimed was later corrected to be exactly this.
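The mechanism behind those numbers fits in a few lines. Here is a toy stand-in for a vector store, not MemPalace's code and not ChromaDB itself: filtering on wing/room metadata shrinks the candidate pool before similarity ranking, which is all any vector database's metadata filters do.

```python
from math import sqrt

# Toy stand-in for a vector store with metadata filtering.
# Real systems (ChromaDB included) do the same thing at scale:
# filter candidates by metadata, then rank survivors by similarity.
docs = [
    {"text": "we chose Auth0 for login", "wing": "acme", "room": "auth",
     "vec": [1.0, 0.1, 0.0]},
    {"text": "migrated auth to Clerk", "wing": "acme", "room": "auth",
     "vec": [0.9, 0.2, 0.1]},
    {"text": "switched the API to GraphQL", "wing": "acme", "room": "api",
     "vec": [0.0, 1.0, 0.2]},
    {"text": "hiking trip planning", "wing": "personal", "room": "travel",
     "vec": [0.0, 0.1, 1.0]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def query(vec, top_k=5, **filters):
    # Metadata filtering first: shrink the candidate pool...
    pool = [d for d in docs if all(d.get(k) == v for k, v in filters.items())]
    # ...then rank the survivors by embedding similarity.
    return sorted(pool, key=lambda d: cosine(vec, d["vec"]), reverse=True)[:top_k]

q = [1.0, 0.15, 0.05]                          # embedding of "what's our auth setup?"
unfiltered = query(q)                          # searches every wing
filtered = query(q, wing="acme", room="auth")  # the "palace boost"
```

Same similarity math in both calls; the accuracy difference comes entirely from how small the candidate pool is before ranking.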

None of this is criticism. Raw verbatim storage with structured metadata is a legitimate design choice that optimizes for one thing: finding past conversations. And it does that well.

The question is whether finding past conversations is enough.

Retrieval Is Not Memory

Here's a thought experiment. You've been using MemPalace for six months. You have 20,000 conversation fragments stored across 15 project wings. Now:

Your colleague asks: "What's our current auth strategy?" You search. MemPalace returns five fragments — one from January where you chose Auth0, one from March where you discussed migrating to Clerk, one from April where you actually migrated, and two from May about edge cases in the new Clerk setup. Which one represents your current state? MemPalace doesn't know. It returns all five ranked by similarity, not by temporal validity or supersession. You — or your AI — have to read all five and figure out the timeline.

Now try: "Has our position on database choice changed since we started?" This requires understanding that an atom of knowledge ("we use Postgres") was created, then potentially updated or contradicted, and tracing that evolution. Raw verbatim search returns fragments containing the word "database." Reconstructing the narrative is left as an exercise for the reader.

This is the difference between retrieval and memory. Retrieval finds text. Memory understands state — what's current, what's superseded, what contradicts what, how knowledge evolves over time.
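The gap is easy to make concrete. A minimal sketch of supersession tracking, with an atom shape and field names that are mine, not any system's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical atom shape: each knowledge unit records what replaced it.
@dataclass
class Atom:
    id: str
    text: str
    created: str                        # ISO date, so the timeline is explicit
    superseded_by: Optional[str] = None

atoms = {
    "a1": Atom("a1", "Auth strategy: Auth0", "2026-01-10", superseded_by="a2"),
    "a2": Atom("a2", "Auth strategy: migrated to Clerk", "2026-04-02"),
}

def current(atom_id: str) -> Atom:
    """Follow the supersession chain to the atom representing current state."""
    atom = atoms[atom_id]
    while atom.superseded_by:
        atom = atoms[atom.superseded_by]
    return atom

# Raw retrieval returns both fragments ranked by similarity;
# a memory layer resolves the chain and answers with current state.
```

Asked about auth strategy, raw verbatim search surfaces both January and April fragments; `current("a1")` resolves directly to the Clerk atom.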

MemPalace acknowledges this gap. They built a contradiction detection utility (fact_checker.py) and a temporal knowledge graph. But as of their April 2026 correction, the contradiction detector isn't wired into the knowledge graph operations. It exists as a separate script. This isn't a bug to be fixed — it's a symptom of an architecture that treats structured understanding as an add-on to raw storage, rather than the foundation.

The Portability Problem Nobody Talks About

Here's the problem that concerns me more than retrieval quality.

MemPalace stores everything in ChromaDB + SQLite on your local machine. Their own format. No export standard. No migration path.

Today you use MemPalace with Claude Code. Six months of conversations, carefully mined and organized into wings and rooms. Now:

You switch to Cursor. Can you bring your memory? Only if Cursor has a MemPalace MCP integration, and the MCP server is running, and it points to the same ChromaDB instance. Your memory isn't portable — it's accessible from a specific tool through a specific runtime.

You want to use memory on your phone. MemPalace is a local Python package with ChromaDB. There's no hosted option, no sync, no mobile path.

Your team wants shared memory. MemPalace is single-user, single-machine. There's no multi-user story.

You want to try Mem0 or Zep instead. Your 20,000 fragments are in MemPalace's ChromaDB schema. There's no standard format to export them in that another system could import.

This isn't a MemPalace-specific problem. It's an industry-wide pattern. Every memory system invents its own storage format, its own schema, its own retrieval logic. Switch systems, lose your memory. This is the exact same lock-in that Harrison Chase warned about with model providers — except now it's memory providers.

| System         | Storage             | Portable? | Standard format? |
| -------------- | ------------------- | --------- | ---------------- |
| MemPalace      | ChromaDB + SQLite   | No        | No               |
| Mem0           | Proprietary cloud   | No        | No               |
| Zep            | Neo4j cloud         | No        | No               |
| Letta          | Embedded in harness | No        | No               |
| Claude Memory  | Behind API          | No        | No               |
| ChatGPT Memory | Behind API          | No        | No               |

Every row is a silo. Your memories are hostage to your tool choice.

What Memory Actually Needs

If we step back and ask what a real memory system requires, retrieval is just one capability on a longer list.

Atomization: breaking conversations into discrete knowledge units — facts, decisions, preferences, relationships — that can be individually tracked, updated, and superseded. Raw verbatim storage skips this entirely.

Temporal awareness: knowing that "we use Auth0" from January is superseded by "we migrated to Clerk" from April, without requiring the user or AI to read both fragments and infer the timeline.

Portability: memory that moves with you across tools, devices, and platforms. Not locked to one vector database on one machine.

Scope management: distinguishing between project-level knowledge, personal preferences, and session-specific context. Not everything belongs in every query.

Decay and relevance: some knowledge loses value over time. Implementation details from a refactored codebase shouldn't compete with current architecture decisions in retrieval ranking.

Protocol-level interoperability: a standard contract that any harness, any tool, any backend can speak — so memory becomes infrastructure, not a feature of one product.

MemPalace nails the first requirement of any memory system: don't lose the data. But the six capabilities above are what turn stored data into working memory.
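One of these capabilities is simple enough to sketch. Decay and relevance can be as little as weighting similarity by recency, so stale fragments stop outranking current decisions. The exponential form and the 90-day half-life below are illustrative choices, not any shipping system's formula:

```python
import math

# Toy decay-weighted ranking: score = similarity * 2^(-age / half_life).
# Half-life and formula are illustrative assumptions, not a real system's.
def decayed_score(similarity: float, age_days: float,
                  half_life_days: float = 90.0) -> float:
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return similarity * decay

# An old fragment with slightly better raw similarity loses to a recent one.
old = decayed_score(0.95, age_days=300)  # detail from a refactored codebase
new = decayed_score(0.90, age_days=10)   # current architecture decision
```

With pure similarity ranking the stale fragment wins (0.95 vs 0.90); with decay applied, the ordering flips.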

AMCP: Memory as a Protocol

This is why I've been building AMCP — Agent Memory Continuity Protocol. It's an open standard, Apache 2.0, positioned as the third layer of agent infrastructure alongside MCP (tools) and A2A (agent collaboration).

The core idea: memory should be a contract, not a product. Any harness should be able to store and recall memory through a standard interface. Any backend should be able to implement that interface. Users own their memory regardless of which tools they use.

AMCP defines:

  • A standard atom shape for knowledge units
  • Scope and visibility controls (project, user, session)
  • Recall with similarity search and scope filtering
  • Export/import in a portable format
  • Backend-agnostic transport (local SQLite, hosted cloud, third-party)
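As a hypothetical illustration only, with names and shapes that are mine rather than the spec's, a contract like that might look like this in Python: any backend implementing the interface is swappable behind a harness.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol, runtime_checkable

# Hypothetical shapes for illustration; the AMCP spec defines the real contract.
@dataclass
class MemoryAtom:
    id: str
    content: str
    scope: str           # e.g. "project" | "user" | "session"
    created_at: str      # ISO timestamp

@runtime_checkable
class MemoryBackend(Protocol):
    """Any backend that speaks this contract is swappable behind a harness."""
    def store(self, atom: MemoryAtom) -> None: ...
    def recall(self, query: str, scope: Optional[str] = None,
               top_k: int = 5) -> list: ...
    def export_atoms(self) -> Iterable[MemoryAtom]: ...

class InMemoryBackend:
    """Trivial reference backend: substring match stands in for similarity."""
    def __init__(self) -> None:
        self._atoms: list[MemoryAtom] = []

    def store(self, atom: MemoryAtom) -> None:
        self._atoms.append(atom)

    def recall(self, query: str, scope: Optional[str] = None,
               top_k: int = 5) -> list:
        hits = [a for a in self._atoms
                if query.lower() in a.content.lower()
                and (scope is None or a.scope == scope)]
        return hits[:top_k]

    def export_atoms(self) -> Iterable[MemoryAtom]:
        return list(self._atoms)
```

The point of the `Protocol` is exactly the lock-in argument above: the harness types against the interface, and `local-amcp`, a hosted backend, or anything else can sit behind it.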

The spec is at github.com/goldberg-aria/amcp.

3122: What Harness Integration Actually Looks Like

A protocol is just a spec without a working implementation. 3122 is an AMCP-native coding harness that proves the architecture works.

3122 makes an explicit split between two types of state:

Continuity Runtime State stays in the harness. Session transcripts, trajectory compression (goal → attempt → failure → verification → next-step), handoff snapshots when you switch models, file-level working memory. These are operational concerns that belong to the harness.

Portable Memory goes through AMCP. One item shape. Backends are swappable: local-amcp for SQLite on your machine, nexus-cloud for a hosted backend, third-party-amcp for anything else that speaks the protocol.

The depth of integration is where it gets interesting. 3122 doesn't just call AMCP as a search API. It uses trajectory-aware recall — prioritizing atoms related to the files you're actively working on and the errors you've hit. It manages context budgets — degrading conversation recall first, then older memory, then recent history, as the session grows. It emits handoff snapshots when you switch models, giving the new model a context boost on its first turn.

All of this is harness logic. The harness decides what to include. AMCP provides what to choose from. Clean separation, deep integration.
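The degradation order described above (conversation recall first, then older memory, then recent history) can be sketched as a simple priority drop. This is hypothetical logic written to illustrate the idea, not 3122's actual implementation:

```python
# Hypothetical sketch of context-budget degradation: when the assembled
# prompt exceeds the token budget, drop from the most degradable tier first.
# Tier order follows the post: conversation recall goes before older
# memory, which goes before recent history.
DEGRADE_ORDER = ["conversation_recall", "older_memory", "recent_history"]

def fit_to_budget(tiers: dict, budget: int) -> dict:
    """tiers maps tier name -> list of (token_count, text) chunks."""
    kept = {name: list(chunks) for name, chunks in tiers.items()}
    total = sum(t for chunks in kept.values() for t, _ in chunks)
    for tier in DEGRADE_ORDER:
        while total > budget and kept.get(tier):
            tokens, _ = kept[tier].pop()   # shed from the degradable tier
            total -= tokens
    return kept
```

The harness-side decision is encoded in `DEGRADE_ORDER`; the memory layer never needs to know a budget exists.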

And the portability story:

# Export everything
3122 memory export --format amcp-jsonl --output my-knowledge.jsonl

# Switch backends
3122 memory migrate --from local-amcp --to nexus-cloud

# Delete the workspace, rebuild, reconnect
# Your memory is intact.

The repo is at github.com/goldberg-aria/3122-harness.

Raw Storage Has a Place

I want to be clear: I'm not arguing that MemPalace's approach is wrong. Raw verbatim storage with semantic search is a valid tool. For a solo developer who wants to search past Claude conversations on their laptop, it's excellent.

But the problem space is bigger than local search. Teams need shared memory. Workflows need structured knowledge. Users need memory that survives tool changes. The industry needs a standard so we stop building silos.

MemPalace's 96.6% recall proves that the retrieval problem is solvable. What it doesn't address — and what no current system properly addresses — is the portability and structure problem.

That's what AMCP and 3122 are for.

The Landscape

We're in the early innings of AI memory. Every project in this space — MemPalace, Mem0, Zep, Letta, and yes, Nexus and AMCP — is experimenting with different tradeoffs. The fact that MemPalace got 41k stars in a week tells us the demand is real and massive.

My bet is that the winning architecture separates three concerns:

1. The harness owns context management — what goes in the prompt, how to budget tokens, when to compact
2. The memory layer owns persistence and recall — storing atoms, managing temporal validity, handling scope
3. The protocol connects them — so neither the harness nor the backend creates lock-in

MCP did this for tools. AMCP aims to do it for memory.

Your conversations shouldn't die when the session ends. But they also shouldn't be trapped in one tool forever.

---

AMCP spec: [github.com/goldberg-aria/amcp](https://github.com/goldberg-aria/amcp)
3122 harness: [github.com/goldberg-aria/3122-harness](https://github.com/goldberg-aria/3122-harness)
Post 1 in this series: [Your Harness Will Change. Your Memory Shouldn't.](#)