The memory firewall¶
Most memory systems describe a stored fact with one collapsed status: it's "active" or it isn't. That single axis can't express the states that actually matter for trust. A belief can be true but secret. It can be confidently held but contradicted by newer evidence. It can be clean but stale. Collapse those into one enum and you lose the very distinctions you need to govern memory safely.
Lodestar's memory firewall describes every belief with four orthogonal lifecycle axes, governs the transitions between their states, and gates what is allowed to reach the model.
The four axes¶
| Axis | States | Question it answers |
|---|---|---|
truth_status |
unverified · supported · contradicted · superseded |
Is it true? |
retrieval_status |
hidden · restricted · normal · privileged_only · blocked |
May it be retrieved? |
security_status |
clean · suspicious · quarantined · malicious |
Is it safe? |
freshness_status |
fresh · stale · expired |
Is it current? |
They are deliberately independent. A poisoned belief can be set
security_status: quarantined without touching its truth_status —
quarantine blocks it from retrieval while preserving the full audit trail of why
the system once thought it was safe. A belief can go truth_status: contradicted
while staying retrieval_status: normal, so the contradiction surfaces in context
rather than vanishing. Separate axes mean separate, recorded decisions.
Each axis has its own transition table; a belief never jumps states silently. And
a fifth attribute, sensitivity (public / internal / confidential /
secret), travels alongside the axes — describing the content of a belief, not
its state. A belief can be true, fresh, clean, and supported, and still be
secret.
No self-promotion¶
The load-bearing rule: an agent's own success does not promote a memory.
This is what defeats MemoryGraft-style attacks, where an adversary plants a
"successful experience" hoping the agent will later imitate it. Lodestar's firewall
does not move a candidate belief to supported from a single apparent success.
Promotion requires user confirmation, independent corroboration, probe
verification, or a narrow auto-promotion policy with logged evidence — never the
agent vouching for itself. It pairs directly with the
auto-observation gate: content read from the outside
can't promote itself, and neither can the agent that read it.
The guard-import-no-self-promote probe pins this invariant.
The context policy gates retrieval¶
Storing a belief safely is only half the job; the other half is governing what
reaches the model. The cognitive core consults a ContextPolicy every time it
assembles context — for claim extraction, planning, decisions, or explanations.
Its conservative v0 default:
- only
truth_status: supportedbeliefs load, - only
retrieval_status: normalbeliefs load, - only
security_status: cleanbeliefs load, - a freshness ceiling (
P30D) and asensitivity_ceiling(internal) apply, - contradictions and uncertainties are surfaced, not silently dropped,
- decisions are required to cite their belief dependencies.
A quarantined or suspicious belief therefore cannot influence the planner —
regardless of how high its stated confidence was when it was created. In LLM
systems, what enters context often matters more than what is stored; the context
policy makes "the planner used a stale belief" or "the explanation leaked a secret"
into testable invariants rather than invisible bugs. The
quarantined-not-retrievable and sensitivity-ceiling probes hold the line.
Contradiction routing¶
When new evidence contradicts an existing belief, the contradiction must not just
disappear behind the retrieval filter. The firewall routes related contradicted
beliefs through a dedicated channel (retrieveContradictions) so the planner can
see the conflict rather than silently losing the losing side. Across a session,
a contradicted belief also flags the decisions that depended on it — the
contradicted-belief-flags-dependent-decisions probe checks that the cascade
reaches the right places.
Plugs in front of your memory layer¶
The firewall is a horizontal component: it governs which beliefs are adopted, retrieved, quarantined, or blocked — it does not replace your memory store. Adapters exist for mem0, Letta, and Zep, so a team can keep its existing memory layer and put a firewall in front of it. State persists through in-memory and Postgres backends behind one interface, so belief/claim/evidence state can be shared across sessions.
Related¶
- The epistemic chain — where beliefs come from.
- Reading isn't believing — the auto-observation gate that pairs with no-self-promotion.
- Threat model: memory poisoning — the attacks the firewall is designed against.