Wrap your own agent: guard.wrap(), the MCP proxy, policies, and probes
Part 1 showed the demos. Part 2 made the case to your security team. This part is the wiring guide: putting Lodestar around your agent, today, with snippets that run as written.
Everything below works from a clone with Bun:
git clone https://github.com/qmilab/lodestar
cd lodestar
bun install
(The packages are also on npm as @qmilab/lodestar-* at v0.2.0. The clone
path is used here because it gives you the examples and probes to crib from,
and every bun run lodestar … command below resolves through the clone
root's lodestar script. In your own project,
bun add @qmilab/lodestar-guard @qmilab/lodestar-harness @qmilab/lodestar-adapter-filesystem
covers the library-path imports, and
bun add @qmilab/lodestar-cli gives you the same CLI as bunx lodestar ….)
TL;DR: Two ways in. Own the loop? guard.wrap() is a function call around your agent code. Don't own the agent (Claude Code, Cursor, Aider)? lodestar guard mcp-proxy sits between it and its MCP tools: no agent changes, but you must deny the agent's built-in tools so the governed path is the only path. Either way you grade your tools on the trust ladder, set an auto-approve ceiling (L4 cannot be auto-approved; the floor is not negotiable), resolve held actions with lodestar approve from a second terminal, and lock the invariant you care about with a probe that fails CI if it ever regresses.
Two ways in
The decision is one question: do you own the agent's loop?
- You wrote the loop (a homegrown agent, a script that calls an LLM with
  tools): use guard.wrap(). It's a library call; your loop runs inside a
  governed context and every tool call flows through the kernel.
- You don't own the agent (Claude Code, Cursor, Aider, or anything else that
  speaks MCP): use the MCP proxy. The agent's MCP config points at the
  proxy; the proxy owns the real downstream servers and governs every
  tools/call in between. No changes to the agent.
The two paths produce the same thing: an append-only event log you can render
with lodestar report <session-id>, with the same policy gate and the same
memory-firewall semantics from part 2.
Path 1: the greenfield loop with guard.wrap()
Here is the minimal real shape, as used by the
coding-agent-greenfield
example:
import {
wrap,
autoApprovePolicy,
alwaysHoldsChecker,
type GuardContext,
} from "@qmilab/lodestar-guard"
import { registerFsReadTool } from "@qmilab/lodestar-adapter-filesystem"
// Tools are registered with the Action Kernel up front — the adapters ship
// graded contracts (fs.read arrives as an L0, read-only contract).
registerFsReadTool(process.cwd())
// Your agent loop, unchanged except that tool calls go through ctx.
const agentLoop = async (ctx: GuardContext) => {
// Every tool call routes through the kernel: propose → gate → execute.
await ctx.callTool("fs.read", { path: "README.md" }, { intent: "read the README" })
// ctx.ingestObservation(...) — feed an observation through the Cognitive Core
// ctx.recordDecision(...) — declare a decision + the beliefs it depends on
// ctx.emit(...) — debug events (recorded, but never trusted for gating)
return { done: true }
}
const run = wrap(agentLoop)
const { result, session_id, log_root } = await run({
project_id: "my-agent",
actor_id: "agent:mine",
default_scope: { level: "project", identifier: "my-agent" },
default_sensitivity: "internal",
policy_gate: autoApprovePolicy({
// L0–L2 sail through; L3 falls to the structural deny default;
// L4 always holds, L5 always denies.
auto_approve_up_to: 2,
approver_id: "policy-decider",
}),
precondition_checker: alwaysHoldsChecker,
})
console.log(`report: bun run lodestar report ${session_id}`)
Things to notice, because they're deliberate:
- There are no silent defaults for the security-relevant settings. Scope,
  sensitivity, the policy gate, the precondition checker: all explicit. If
  your policy can hold an action, you must also supply an
  approval_resolver; a hold with no resolver is a hard error at callTool
  time, not a silent auto-deny. (The snippet above registers only fs.read,
  an L0 tool, so nothing in it can hold; the moment you register an L4 tool,
  supply a resolver. The approvals section below shows how.)
- ctx.emit() is honest about its own trust level. Raw agent emits are
  recorded for debugging but are not trusted to drive sentinel arbitration
  (the runtime monitors from part 2 that can flag a suspicious belief and
  hold actions depending on it), so an agent cannot talk its way past a
  belief-scoped alert by emitting noise. The trusted way to declare "this
  plan rests on these beliefs" is ctx.recordDecision().
- The result hands you the session. session_id is exactly what
  lodestar report takes; log_root is where the NDJSON event log landed
  (default .lodestar/events).
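The "hold with no resolver is a hard error" rule from the first bullet can be modelled in a few lines. This is only an illustrative sketch: GateOutcome, ApprovalResolver, and settle are hypothetical names invented here, not part of the guard package's API.

```typescript
// Hypothetical model, NOT the Lodestar API: how a gated action gets settled.
type GateOutcome = "allow" | "deny" | "hold"
type ApprovalResolver = (tool: string) => "granted" | "denied"

function settle(
  tool: string,
  outcome: GateOutcome,
  resolver?: ApprovalResolver,
): "execute" | "reject" {
  if (outcome === "allow") return "execute"
  if (outcome === "deny") return "reject"
  // A hold with nobody to resolve it is a configuration bug. Fail loudly
  // instead of quietly turning the hold into a deny.
  if (!resolver) {
    throw new Error(`action on ${tool} was held but no approval resolver is configured`)
  }
  return resolver(tool) === "granted" ? "execute" : "reject"
}
```

The design point is the throw: a silent auto-deny would look like working governance while hiding the missing wiring.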
The extension seam: teaching the guard about your evidence
The documentation-agent
example (part 1's warm-up demo) adds exactly one thing to the config above — a
custom evidence linker, through the cognitive seam:
import { DocAwareEvidenceLinker } from "@qmilab/lodestar-cognitive-core"
const { result, session_id } = await run({
// ... same required fields as above ...
policy_gate: autoApprovePolicy({ auto_approve_up_to: 2, approver_id: "doc-agent-policy" }),
precondition_checker: alwaysHoldsChecker,
cognitive: {
evidenceLinkerFactory: ({ evidence, beliefs }) =>
new DocAwareEvidenceLinker(evidence, beliefs),
},
})
That one factory is what makes file content land as external_document
evidence (stamped with its source file, kept unverified) instead of being
quietly trusted — and it's the exact seam the
documentation-evidence-provenance probe from part 2 pins in CI. The same
seam takes an MCP-aware linker, or your own: anything that decides what
quality of evidence a claim's source amounts to.
If you want durable, cross-session state, the sibling stores seam injects
Postgres-backed claim/belief/evidence stores — that's the seam the
tool-poisoning-cross-session
probe rides, and its source doubles as the wiring example.
Path 2: the agent you don't own, via the MCP proxy
The proxy is a config file plus one command. Here's the shape of the real
config that drove the live Claude Code run in part 1
(real-claude-code/proxy.config.json,
trimmed):
{
"project_id": "telenotes-governed-dev-claude-code",
"actor_id": "agent:claude-code",
"session_id": "auto",
"log_root": ".lodestar/events",
"default_scope": { "level": "project", "identifier": "telenotes-governed-dev-claude-code" },
"default_sensitivity": "internal",
"auto_approve_ceiling": 3,
"downstream_servers": [
{
"name": "fs",
"command": "bunx",
"args": ["@modelcontextprotocol/server-filesystem", "/absolute/path/to/workspace"]
},
{
"name": "devtools",
"command": "bun",
"args": ["run", "/path/to/dev-tools-mcp/bin.ts", "/absolute/path/to/workspace"]
}
],
"tool_defaults": {
"mcp.fs.read_text_file": {
"reversibility": "reversible",
"permissions": ["fs.read"],
"sandbox": "read",
"required_trust_level": 0,
"blast_radius": "self"
},
"mcp.fs.write_file": {
"reversibility": "compensable",
"permissions": ["fs.write"],
"sandbox": "write-local",
"required_trust_level": 3,
"blast_radius": "project"
},
"mcp.devtools.git_push": {
"reversibility": "irreversible",
"permissions": ["network.egress"],
"sandbox": "controlled-shell",
"required_trust_level": 4,
"blast_radius": "external"
}
}
}
(The placeholder paths: in the clone, the dev-tools server lives at
examples/telenotes-governed-dev/dev-tools-mcp/bin.ts; the fs server needs
no clone at all — bunx fetches it.)
Read it top to bottom and you've read the governance story:
- downstream_servers: the proxy spawns these as child processes and re-exposes their tools upstream under namespaced names (mcp.<server>.<tool>). The agent sees one MCP server; the proxy sees everything.
- tool_defaults: you grade each tool by trust level, blast radius, reversibility, and sandbox profile. The proxy deliberately ignores MCP annotations as a trust source (per the MCP spec they're untrusted unless the server is), so the grading is operator-authored, not taken from the wire. Any tool you didn't enumerate falls to a conservative default (L3, irreversible, controlled-shell), which you'll see in your report as a nudge to grade it explicitly.
- auto_approve_ceiling: 3 is the gate from part 1: L0 to L3 auto-approve, and the L4 git_push is rejected (or held, once you add approvals; see the next section).
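The namespacing and the conservative fallback described above can be pictured as a lookup. This is a minimal sketch under stated assumptions: the ToolGrade shape, the graded_by field, and the gradeFor helper are hypothetical and only mirror the config keys shown earlier; they are not the proxy's internals.

```typescript
// Hypothetical sketch: operator-authored grades with a conservative fallback.
interface OperatorGrade {
  required_trust_level: number
  reversibility: "reversible" | "compensable" | "irreversible"
  sandbox: string
}
interface ToolGrade extends OperatorGrade {
  graded_by: "operator" | "default"
}

// Grades come from the operator's config, never from annotations on the wire.
const toolDefaults = new Map<string, OperatorGrade>([
  ["mcp.fs.read_text_file", { required_trust_level: 0, reversibility: "reversible", sandbox: "read" }],
  ["mcp.devtools.git_push", { required_trust_level: 4, reversibility: "irreversible", sandbox: "controlled-shell" }],
])

// Downstream tools are re-exposed upstream as mcp.<server>.<tool>.
function namespaced(server: string, tool: string): string {
  return `mcp.${server}.${tool}`
}

function gradeFor(server: string, tool: string): ToolGrade {
  const graded = toolDefaults.get(namespaced(server, tool))
  if (graded) return { ...graded, graded_by: "operator" }
  // Anything the operator did not enumerate gets the conservative default.
  return {
    required_trust_level: 3,
    reversibility: "irreversible",
    sandbox: "controlled-shell",
    graded_by: "default",
  }
}
```

Under this model an ungraded tool such as a hypothetical fs tool named move_file would come back at L3 with graded_by set to "default", which is the kind of visible nudge the report gives you.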
Start it, point your agent at it:
bun run lodestar guard mcp-proxy --config ./proxy.config.json
For Claude Code, the project's .mcp.json declares the proxy as the one MCP
server (the committed example):
{
"mcpServers": {
"lodestar": {
"command": "bun",
"args": [
"run", "/path/to/lodestar/packages/cli/src/index.ts",
"guard", "mcp-proxy",
"--config", "/path/to/proxy.config.json"
]
}
}
}
The caveat that matters: deny the built-ins
A real coding agent ships its own file and shell tools, and those never touch MCP — if they stay enabled, the agent edits files directly and your trust report comes back empty of write actions. The proxy can only govern what flows through it. So launch the agent with built-ins denied and the proxy allowed, exactly as part 1's live run did:
claude -p "<your task>" \
--mcp-config .mcp.json --strict-mcp-config \
--disallowedTools Edit Write MultiEdit NotebookEdit Bash Read Glob Grep LS WebFetch WebSearch \
--allowedTools "mcp__lodestar__*" \
--output-format text
This is stated as a caveat in the repo's own recipe, and in part 2's honest-limits list, because it's the kind of thing a wrapper should say out loud: the governed path only governs if it's the only path.
Writing a policy with real teeth
Both paths grade actions on the same six-rung trust ladder:
L0 observe only — read state; never write or execute
L1 suggest only — produce proposals; nothing reaches the world
L2 isolated artifact — generate in tempfs; no effect on project state
L3 local reversible — modify project state, with notification
L4 external / shared — network, credentials, deploy, push — needs approval
L5 prohibited — cannot run in this context, ever
The simple preset you've seen — autoApprovePolicy /
auto_approve_ceiling — is genuinely a one-rule policy document under the
hood ("allow at or below N" over a structural deny default). Two properties
are worth knowing before you write a bigger one:
- The ceiling caps at L3. Auto-approving L4 is not expressible — the trust-ladder floor always holds L4 for approval and always denies L5, regardless of any rule you write. A config asking for a ceiling of 4 fails at parse time. This is the floor part 2 leaned on: the block on the poisoned push never depended on a well-written rule.
- Unmatched actions deny. The structural default is deny, and a probe
  (unmatched-action-defaults-to-deny) pins it.
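Those two properties, plus the floor, fit in one small decision function. The sketch below is a hypothetical model of the ceiling preset, not the policy kernel's code: parseCeiling and decide are invented names, and the lower bound of 0 on the ceiling is an assumption.

```typescript
// Hypothetical model of the ceiling preset; not the policy kernel's code.
type Decision = "allow" | "hold" | "deny"

// Parse-time validation: a ceiling above L3 is rejected outright.
// ASSUMPTION: 0 is the lowest accepted ceiling.
function parseCeiling(value: number): number {
  if (!Number.isInteger(value) || value < 0 || value > 3) {
    throw new Error(`auto-approve ceiling must be an integer from 0 to 3, got ${value}`)
  }
  return value
}

function decide(requiredLevel: number, ceiling: number): Decision {
  if (requiredLevel >= 5) return "deny" // floor: L5 never runs
  if (requiredLevel === 4) return "hold" // floor: L4 always waits for approval
  if (requiredLevel <= ceiling) return "allow" // the single "allow at or below N" rule
  return "deny" // structural default for anything unmatched
}
```

With a ceiling of 2, decide(3, 2) falls through to the deny default, while decide(4, 2) holds no matter what the ceiling is. A hold only becomes an execution once approvals are configured and someone grants it.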
A fuller policy is a JSON document with ordered rules — first decisive match wins, over the deny default:
{
"id": "my-team-policy",
"version": "v1",
"rules": [
{
"match": { "required_level_lte": 2 },
"effect": "allow",
"reason": "Reads, suggestions, isolated artifacts: free"
},
{
"match": { "tool": "mcp.fs.write_file" },
"effect": "allow",
"reason": "Workspace writes are compensable here"
},
{
"match": { "tool": "mcp.devtools.git_push" },
"effect": "require_approval",
"approval": {
"required_authority": { "min_trust_baseline": 0.8 }
},
"reason": "Pushes need a human with sufficient standing"
}
]
}
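To make "first match wins over a deny default" concrete, here is a minimal evaluator for a document of that shape. It is a sketch under assumptions, not the kernel's implementation: the Rule and Action types are hypothetical, every rule is treated as decisive, and the way the L4/L5 floor combines with rules (L4 holds even when a rule says allow or nothing matches) is one reading of the text above.

```typescript
// Hypothetical evaluator; types and floor ordering are assumptions.
type Effect = "allow" | "deny" | "require_approval"
interface Rule {
  match: { tool?: string; required_level_lte?: number }
  effect: Effect
  reason: string
}
interface Action { tool: string; required_level: number }
interface Verdict { effect: Effect; reason: string }

const teamRules: Rule[] = [
  { match: { required_level_lte: 2 }, effect: "allow", reason: "Reads, suggestions, isolated artifacts: free" },
  { match: { tool: "mcp.fs.write_file" }, effect: "allow", reason: "Workspace writes are compensable here" },
  { match: { tool: "mcp.devtools.git_push" }, effect: "require_approval", reason: "Pushes need a human with sufficient standing" },
]

// Every condition present in the match block must hold.
function matches(rule: Rule, action: Action): boolean {
  const { tool, required_level_lte } = rule.match
  if (tool !== undefined && tool !== action.tool) return false
  if (required_level_lte !== undefined && action.required_level > required_level_lte) return false
  return true
}

function evaluate(action: Action, rules: Rule[]): Verdict {
  if (action.required_level >= 5) return { effect: "deny", reason: "floor: L5 is prohibited" }
  const hit = rules.find((rule) => matches(rule, action)) // first match wins
  if (action.required_level === 4) {
    // Floor: L4 can be denied or held, but never allowed outright.
    const effect: Effect = hit?.effect === "deny" ? "deny" : "require_approval"
    return { effect, reason: hit?.reason ?? "floor: L4 always holds for approval" }
  }
  return hit ? { effect: hit.effect, reason: hit.reason } : { effect: "deny", reason: "no rule matched" }
}
```

Rule order matters in this model: moving a broad rule above a narrow one changes which reason and effect an action receives.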
Wire it into the proxy by replacing the top-level auto_approve_ceiling key
in proxy.config.json with a policy block:
"policy": { "file": "./my-team-policy.json", "allow_unsigned": true }
allow_unsigned: true is the development mode. For production the policy
document is signed (Ed25519 over the canonical hash of
{id, version, rules}), the proxy verifies it at load, and a
require_approval rule's required_authority travels with the held action —
so the eventual approver must actually clear the bar the rule set. (Honest
gap: there is no lodestar policy sign command yet — the signature is
produced programmatically; the verification side is what's pinned in CI, by
policy-version-signature-required.)
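Since the signature is produced programmatically, it may help to see the general shape of "Ed25519 over a canonical hash" using Node's built-in crypto module. Treat this strictly as an illustration: the canonicalization (recursively sorted keys) and the SHA-256 digest are assumptions made here, so the output is not guaranteed to match the format the proxy verifies. Check the policy kernel's source for the real encoding.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto"

// ASSUMPTION: canonical form is JSON with object keys sorted recursively.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`)
    return `{${entries.join(",")}}`
  }
  return JSON.stringify(value)
}

// ASSUMPTION: the digest is SHA-256 over the canonical {id, version, rules}.
function policyDigest(doc: { id: string; version: string; rules: unknown[] }) {
  const { id, version, rules } = doc
  return createHash("sha256").update(canonicalize({ id, version, rules })).digest()
}

const policyDoc = {
  id: "my-team-policy",
  version: "v1",
  rules: [{ match: { required_level_lte: 2 }, effect: "allow" }],
}

// Ed25519 takes no separate hash algorithm, hence the null first argument.
const { publicKey, privateKey } = generateKeyPairSync("ed25519")
const signature = sign(null, policyDigest(policyDoc), privateKey)

const intact = verify(null, policyDigest(policyDoc), publicKey, signature)
const tampered = verify(null, policyDigest({ ...policyDoc, version: "v2" }), publicKey, signature)
console.log({ intact, tampered })
```

The canonical form is what makes the signature independent of key order in the JSON file, and any change to id, version, or rules invalidates it.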
For the library path, the same document works without the proxy: compile it
with compile(policy) from @qmilab/lodestar-policy-kernel and pass the
result as policy_gate — wrap() accepts a CompiledPolicy directly.
Resolving a held action from a second terminal
With approvals configured, an L4 action doesn't just bounce — it parks at
pending_approval while the proxy polls for an out-of-band resolution. Add
these two top-level keys to the same proxy.config.json:
"approval_timeout_ms": 120000,
"approvals": {
"authorized_keys": [
{ "actor_id": "alice", "public_key": "-----BEGIN PUBLIC KEY----- …" }
]
}
Mint your approver key once, then resolve holds from any other terminal:
# one-time: mint an Ed25519 keypair — writes alice.key (private, 0600) and
# alice.pub, and prints the authorized_keys pin to paste above
bun run lodestar approve keygen --approver alice --out ~/.lodestar/alice
# see what's parked
bun run lodestar approve list --project my-project
# let it through (or: approve deny <request-id> ... --reason "not today")
bun run lodestar approve grant <request-id> --approver alice \
--key ~/.lodestar/alice.key --project my-project
The mechanics are worth one sentence each, because they're load-bearing: the
approve CLI runs in a separate process and never writes the event log —
it drops a signed resolution into a side-channel the proxy polls, and the
proxy (the log's single writer) promotes it to the canonical
approval.granted@1 event. The signature is verified against the
operator-pinned keys before anything un-parks — a forged, unsigned, or
tampered grant is rejected, and the forged-approval-cannot-execute probe
holds that boundary in CI. And a granted approval still revalidates the
action's preconditions before execution; approval is not a skip-the-checks
pass.
If you time out instead (approval_timeout_ms elapses with no resolution),
the held action fails closed with a synthetic approval_timeout — the agent
sees a normal tool error, not a hung session.
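The poll-then-fail-closed behaviour can be simulated with a virtual clock. This is a hypothetical model, not the proxy's loop: waitForResolution, the Resolution shape, and the polling interval are all assumptions made for illustration, and signature checking is left out.

```typescript
// Hypothetical simulation with a virtual clock; not the proxy's real loop.
type Resolution = { kind: "granted" | "denied"; approver: string }
type HoldOutcome = Resolution | { kind: "approval_timeout" }

// poll(elapsedMs) returns whatever resolution is visible at that moment, if any.
function waitForResolution(
  poll: (elapsedMs: number) => Resolution | undefined,
  timeoutMs: number,
  intervalMs: number,
): HoldOutcome {
  if (intervalMs <= 0) throw new Error("intervalMs must be positive")
  for (let elapsed = 0; elapsed < timeoutMs; elapsed += intervalMs) {
    const resolution = poll(elapsed)
    if (resolution) return resolution
  }
  // Nothing arrived in time: fail closed instead of waiting forever.
  return { kind: "approval_timeout" }
}

// alice grants three virtual seconds in; nobody ever answers the second hold.
const granted = waitForResolution(
  (t) => (t >= 3000 ? { kind: "granted", approver: "alice" } : undefined),
  120000,
  1000,
)
const timedOut = waitForResolution(() => undefined, 120000, 1000)
console.log(granted.kind, timedOut.kind)
```

The important branch is the last return: the absence of an answer is treated as a refusal, never as consent.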
Locking your invariant with a probe
Everything above gives you governance at runtime. The last step is making your guarantee survive your own refactors: write a probe — an adversarial script that fails loudly if the invariant ever regresses — and run it in CI. In Lodestar's own development, probes are treated as spec: 48 of them gate every change, and the rule is you fix the code, never the probe.
A probe is a small class with a name, a description, and a run() that
returns pass/fail plus human-readable detail lines:
import { Probe, runProbeAsScript, type ProbeResult } from "@qmilab/lodestar-harness"
class MyInvariantProbe extends Probe {
readonly name = "my-agent-cannot-exfiltrate-secrets"
readonly description =
"A belief sourced from .env content must never reach truth_status: supported"
async run(): Promise<ProbeResult> {
// Arrange: run your wrapped agent over a fixture with a planted marker.
// Act: query the belief store / event log it produced.
// Assert: no supported belief carries the marker.
const breached = false // your real check here
return breached
? { passed: false, details: ["MARKER found in a supported belief"] }
: { passed: true, details: ["marker stayed unverified everywhere"] }
}
}
runProbeAsScript(new MyInvariantProbe())
runProbeAsScript prints the banner and exits 0 on pass, non-zero on fail —
which is all CI needs. Group probes into a pack with a manifest:
{
"name": "my-agent-safety",
"version": "0.1.0",
"spec_version": "1",
"source_type": "local",
"description": "Invariants my agent's governance must never lose",
"coverage_areas": ["memory_firewall"],
"invariants": ["secrets_never_supported"],
"probes": [
{ "name": "my-agent-cannot-exfiltrate-secrets", "file": "probes/no-exfil.ts" }
]
}
(coverage_areas and invariants are free-form tags — name what you cover;
nothing validates them against a fixed list.) Then run the pack — the
--pack flag takes a first-party pack name, a pack directory, or a manifest
path:
bun run lodestar harness run --pack ./my-agent-safety
For a template with real assertions against a real store, the four probes in
packs/coding-agent-safety/
are the reference — each one is a standalone, readable script whose header
comment states the scenario, the assertions, and the non-claims. Start from
the one closest to your invariant and swap in your fixture.
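As a starting point for the "your real check here" placeholder, the assertion in a probe like the one above often reduces to scanning what the run produced. The sketch below assumes a made-up event shape (a belief object with text and truth_status fields); the real NDJSON schema may differ, so read the reference probes before relying on any field name used here.

```typescript
// ASSUMPTION: this event shape is invented for illustration only.
interface BeliefEvent {
  type: string
  belief?: { text: string; truth_status: string }
}

// Returns the text of every supported belief that contains the planted marker.
function findSupportedLeaks(ndjson: string, marker: string): string[] {
  const leaks: string[] = []
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue // skip blank lines between records
    const event = JSON.parse(line) as BeliefEvent
    const belief = event.belief
    if (belief && belief.truth_status === "supported" && belief.text.includes(marker)) {
      leaks.push(belief.text)
    }
  }
  return leaks
}

// Fixture: the marker shows up, but only in an unverified belief.
const sampleLog = [
  JSON.stringify({ type: "belief.recorded", belief: { text: "API_KEY=MARKER-123", truth_status: "unverified" } }),
  JSON.stringify({ type: "belief.recorded", belief: { text: "README exists", truth_status: "supported" } }),
  JSON.stringify({ type: "action.executed" }),
].join("\n")

const breachedBeliefs = findSupportedLeaks(sampleLog, "MARKER-123")
console.log(breachedBeliefs.length === 0 ? "pass" : "fail")
```

Inside run(), a non-empty result would map to passed: false with the leaked belief texts as the detail lines.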
Probe runs are themselves recorded as synthetic-trust observations in the
event log, so even your safety checks show up in the audit trail — honestly
labelled, like everything else.
The whole loop, end to end
Putting it together, the integration checklist:
- Pick your path: guard.wrap() if you own the loop, the MCP proxy if you don't.
- Grade your tools: tool_defaults (or action contracts in code) with trust level, blast radius, and reversibility. Be honest; ungraded tools fall to a conservative default, visibly.
- Set the policy: start with the ceiling preset; graduate to a rule document when you need per-tool nuance. L4 holds no matter what you write.
- Wire approvals: keygen once, pin the public key, resolve holds with approve grant/deny from a second terminal.
- Read your first report: bun run lodestar report <session-id>. Check the Beliefs section for the supported/unverified split doing its job on your own tools' output.
- Lock it: write the probe for the one invariant you'd be embarrassed to lose, and put it in CI.
That's the series: part 1 showed it working, part 2 showed where the line holds and where it honestly doesn't, and this part handed you the wiring. If you wrap something real with it, open an issue or discussion — the probe packs especially are designed to grow beyond first-party.