ACCESSING MEMORY NODE 0x411DBE72...

RAG is Dead. Long Live PAR.

TIMESTAMP: 2026-01-21 :: 23:08:00 UTC
AUTHOR_NODE: Scott Waddell
EST_ACCESS_TIME: 6 min
═════════════════════════════════════════════════════════════════
MEMORY ACCESS STARTED
═════════════════════════════════════════════════════════════════

RAG is a search engine asked to be a memory system. That is a category error.

Retrieval-Augmented Generation assumes memory is a search problem: query comes in, find similar chunks, stuff them into context, generate. Search finds things. Memory understands how things relate, what supersedes what, and why you believe something in the first place.

RAG does not remember. It retrieves. And retrieval is not memory.

>The Failure Mode

Consider a financial advisor AI. A user discusses mortgage rates for their London flat. Twenty messages later, they pivot to refinancing a holiday home in Portugal. Ten messages after that: "What's my estimated monthly payment?"

RAG retrieves chunks about mortgages. Both mortgages. It blends context from two incompatible financial scenarios and produces a number that applies to neither.

Provenance-Aware Routing (PAR) handles this differently. When the user pivots to Portugal, drift detection fires and creates a second branch. The London flat discussion lives in one branch. The Portugal refinance lives in another. When the user asks "What's my estimated monthly payment?", PAR routes the question to the branch with the most recent activity (Portugal) and assembles context from that branch only. The answer uses Portuguese property values, Portuguese rates, Portuguese terms. No blending. No confusion.

This is not a finance edge case. A medical AI blends a patient's current symptoms with a resolved condition from six months ago. A product manager's assistant merges outdated requirements with the current spec. A legal review tool conflates contract terms from different negotiation rounds.

WARNING

This is not a retrieval failure. RAG retrieved exactly what it was designed to retrieve. The failure is architectural.

Search cannot distinguish between two mortgage discussions. Memory can, because memory tracks structure, not just similarity.

>The Three Questions

Every memory system must answer three questions. RAG cannot answer them natively. You can bolt on external state, layered schemas, or a controller, but then the retrieval layer is no longer your memory system. The bolted-on state is. RAG becomes a primitive inside something else.

Where does this go? When a new message arrives, where does it belong? Is it continuing the current thread? Starting a new one? Returning to something discussed earlier? RAG treats every query as a cold start. It has no concept of conversational position.

What facts does it depend on? A message about monthly payments depends on property price, interest rate, loan term, and down payment. But which property? Which rate discussion? Facts exist in branches. RAG flattens everything into one searchable pool with no structure.

Which messages justify that? When your system makes a claim, can you trace it back to the specific messages that established it? RAG returns chunks. It does not return lineage. You cannot audit a decision you cannot trace.
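The three questions map onto three explicit links in the data model. A hypothetical sketch of those shapes (illustrative only, not the DriftOS schema):

lineage.ts

```typescript
// Hypothetical shapes: one link per question. (Illustrative, not the
// DriftOS schema.)

interface Message {
  id: string;
  branchId: string;           // Q1: where does this go?
  text: string;
}

interface Fact {
  id: string;
  branchId: string;           // facts live in branches, not one flat pool
  statement: string;
  sourceMessageIds: string[]; // Q3: which messages justify this?
}

interface Claim {
  text: string;
  factIds: string[];          // Q2: what facts does it depend on?
}

// Trace a claim back to the messages that established it.
function lineage(claim: Claim, facts: Fact[], messages: Message[]): Message[] {
  const wanted = new Set(claim.factIds);
  const msgIds = new Set(
    facts.filter(f => wanted.has(f.id)).flatMap(f => f.sourceMessageIds)
  );
  return messages.filter(m => msgIds.has(m.id));
}
```

The point is structural: a chunk store can return text, but only a graph like this can return the path from claim to fact to message.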

"RAG retrieves. PAR remembers."

>Provenance-Aware Routing

PAR inverts the architecture. Instead of retrieve-then-generate, the pipeline is route-then-assemble.

When a message arrives, PAR first detects semantic drift: is this the same topic, or has the conversation shifted? If drift exceeds a threshold, PAR creates a new branch. If the message returns to a previous topic, PAR routes it back to that branch. Routing is a structural decision, not a retrieval score. Routing decisions are first-class state transitions, not prompt-time inference.
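A minimal, synchronous sketch of that drift check, assuming message embeddings are available. The function name, the centroid representation, and the 0.8 threshold are all assumptions for illustration, not the DriftOS detector:

drift-detect.ts

```typescript
// Hypothetical drift detector: cosine similarity between the new message's
// embedding and each branch's centroid, against a fixed threshold.

type Vec = number[];

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface BranchState { id: string; centroid: Vec; }

type Routing =
  | { action: 'CONTINUE' }
  | { action: 'ROUTE'; targetBranch: string }
  | { action: 'BRANCH' };

function detectDrift(
  msgEmbedding: Vec,
  current: BranchState,
  others: BranchState[],
  threshold = 0.8  // assumed value; tune per deployment
): Routing {
  // Same topic: stay on the current branch
  if (cosine(msgEmbedding, current.centroid) >= threshold) {
    return { action: 'CONTINUE' };
  }
  // Drifted: does the message return to an earlier branch?
  for (const b of others) {
    if (cosine(msgEmbedding, b.centroid) >= threshold) {
      return { action: 'ROUTE', targetBranch: b.id };
    }
  }
  // No match anywhere: new topic, new branch
  return { action: 'BRANCH' };
}
```

The output is a structural decision (continue, route, or branch), not a ranked list of chunks, which is the whole difference from a retrieval score.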

Only after routing does PAR assemble context. It pulls from the relevant branch only, with full provenance. Every fact traces to the message that established it. Every inference traces to the facts that support it.

par-flow.ts
// PAR: Route first, assemble second
const routing = await detectDrift(message, currentBranch);

// Default: the message continues the current branch
let branch = currentBranch;

if (routing.action === 'BRANCH') {
  // New topic detected: create a branch
  branch = await createBranch(routing.topic);
} else if (routing.action === 'ROUTE') {
  // Returning to a previous topic
  branch = await routeToBranch(routing.targetBranch);
}

// Assemble context from the correct branch only
const context = await assembleContext(branch);

The result: instead of dumping 1,000 messages into context and hoping the model figures it out, PAR assembles a focused 20-message context with explicit fact dependencies and full lineage.
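The assembly step itself can be very small once routing has done its job. A sketch, with hypothetical names and a simple recency cap:

assemble-context.ts

```typescript
interface Msg {
  id: string;
  branchId: string;
  text: string;
}

// Assemble a focused window: only messages from the routed branch,
// capped at the most recent `limit`. No cross-branch blending.
function assembleContext(branchId: string, history: Msg[], limit = 20): Msg[] {
  return history
    .filter(m => m.branchId === branchId)  // structural filter, not similarity
    .slice(-limit);                        // keep only the newest messages
}
```

Note that the filter is on branch membership, a structural property, rather than on similarity to the query.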

>Memory as Ledger

RAG systems handle contradictions by hoping the model resolves them. PAR handles contradictions through supersession.

When a user says "Actually, that deadline moved to March," PAR does not just detect drift. It identifies which deadline is being modified, marks that fact as superseded, and routes the message into the branch where that fact lives. The old deadline is not deleted. It is marked as superseded with a pointer to its replacement.

TECHNICAL

Supersession links enable rollback. If the user says "Wait, I was wrong, it's still February," you restore the original fact and supersede the supersession. Try doing that with a vector database.

This is the difference between memory as a ledger and memory as a lossy cache. A ledger preserves history. A cache overwrites it. Vector databases are caches. Conversation graphs with provenance are ledgers.

Every fact has a source. Every modification has a reason. Every decision is auditable.
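A minimal ledger sketch of supersession and rollback. The names are hypothetical and the real mechanism will carry more metadata (who, when, why), but the ledger property is the point: nothing is overwritten, only linked:

fact-ledger.ts

```typescript
interface FactRecord {
  id: string;
  statement: string;
  supersededBy: string | null;  // ledger link; facts are never deleted
}

class FactLedger {
  private facts = new Map<string, FactRecord>();

  assert(id: string, statement: string): void {
    this.facts.set(id, { id, statement, supersededBy: null });
  }

  // "Actually, that deadline moved to March": the old fact stays,
  // marked as superseded with a pointer to its replacement.
  supersede(oldId: string, newId: string, statement: string): void {
    this.assert(newId, statement);
    const old = this.facts.get(oldId);
    if (old) old.supersededBy = newId;
  }

  // "Wait, I was wrong": restore the original by superseding the supersession.
  rollback(oldId: string): void {
    const old = this.facts.get(oldId);
    if (old?.supersededBy) {
      const replacement = this.facts.get(old.supersededBy);
      if (replacement) replacement.supersededBy = oldId;
      old.supersededBy = null;
    }
  }

  // Follow supersession links to the currently-believed version of a fact.
  current(id: string): FactRecord | undefined {
    let f = this.facts.get(id);
    const seen = new Set<string>();
    while (f?.supersededBy && !seen.has(f.id)) {
      seen.add(f.id);
      f = this.facts.get(f.supersededBy);
    }
    return f;
  }
}
```

Because every version survives with its links intact, "what did the system believe at message N?" stays answerable forever.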

>Hallucinated Continuity

Without provenance, memory systems hallucinate continuity.

They blend facts from incompatible branches. They lose track of what superseded what. They cannot explain why they believe something. They present a coherent-seeming narrative stitched together from fragments that were never meant to coexist.

RAG-based memory is a fuzzy knowledge blob. You query it, you get chunks, you hope for the best. When it fails, you cannot diagnose why. When it contradicts itself, you cannot trace the conflict. When an enterprise customer asks "Why did your system recommend X?", you have no answer.

PAR-based memory is a system you can trust under scrutiny. Branches are explicit. Facts have lineage. Conflicts are visible. You can answer the "why" question because every decision traces back through the graph.

>What This Unlocks

Provenance-aware routing enables capabilities that retrieval cannot provide.

Reversible memory. Supersession links mean you can roll back any fact extraction. Bad parse? Undo it. User corrected themselves? Restore the original.

Auditable decisions. Every recommendation traces through facts to source messages. Compliance teams can verify. Users can challenge. The system can explain itself.

Safe branch merging. When conversation threads reconverge, PAR knows what conflicts exist because conflicts are explicit. You merge with awareness, not hope.

Focused context windows. Instead of stuffing everything vaguely relevant into context, you assemble exactly what the current branch requires. Smaller windows, lower latency, lower cost, higher accuracy.
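Safe merging, for instance, can start from something as simple as comparing explicit fact keys across two branches. A hypothetical sketch (real merging would also consult supersession order):

merge-check.ts

```typescript
// Conflicts are explicit because each branch carries structured facts,
// not text chunks. (Hypothetical shape, not the DriftOS schema.)
interface BranchFact {
  key: string;    // what the fact is about, e.g. 'interest_rate'
  value: string;
}

// Return the keys on which two branches disagree.
function findConflicts(a: BranchFact[], b: BranchFact[]): string[] {
  const index = new Map<string, string>();
  for (const f of a) index.set(f.key, f.value);
  return b
    .filter(f => index.has(f.key) && index.get(f.key) !== f.value)
    .map(f => f.key);
}
```

A vector store cannot produce this list, because it has no notion of two chunks being about the same key.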

INSIGHT

DriftOS is our evolving implementation of PAR. Routing, branching, fact extraction, supersession. Production-ready TypeScript, open source. github.com/DriftOS/driftos

RAG was the right architecture for static documents and simple queries. The problem then was retrieval. The problem now is memory.

Memory requires structure. Memory requires provenance. Memory requires routing.

If your system cannot answer "why do you believe this?" with message-level evidence, it does not have memory. It has search with a longer context window. And full-history replay (stuffing everything into context and hoping the model figures it out) quietly destroys cost and correctness at scale.

RAG retrieves. PAR remembers.

═════════════════════════════════════════════════════════════════
MEMORY ACCESS COMPLETE
═════════════════════════════════════════════════════════════════
AUTHOR_NODE

Scott Waddell

Founder of DriftOS. Building conversational memory systems beyond retrieval. Former Antler product lead, ex-IBM. London. Kiwi.