Wrap your own agent: guard.wrap(), the MCP proxy, policies, and probes

Part 1 showed the demos. Part 2 made the case to your security team. This part is the wiring guide: putting Lodestar around your agent, today, with snippets that run as written.

Everything below works from a clone with Bun:

git clone https://github.com/qmilab/lodestar
cd lodestar
bun install

(The packages are also on npm as @qmilab/lodestar-* at v0.2.0. The clone path is used here because it gives you the examples and probes to crib from, and every bun run lodestar … command below resolves through the clone root's lodestar script. In your own project, bun add @qmilab/lodestar-guard @qmilab/lodestar-harness @qmilab/lodestar-adapter-filesystem covers the imports in the minimal library-path snippets; the evidence-linker and policy-document examples also import from @qmilab/lodestar-cognitive-core and @qmilab/lodestar-policy-kernel. For the CLI, bun add @qmilab/lodestar-cli gives you the same commands as bunx lodestar ….)

TL;DR — Two ways in. Own the loop? guard.wrap() is a function call around your agent code. Don't own the agent (Claude Code, Cursor, Aider)? lodestar guard mcp-proxy sits between it and its MCP tools — no agent changes, but you must deny the agent's built-in tools so the governed path is the only path. Either way you grade your tools on the trust ladder, set an auto-approve ceiling (L4 cannot be auto-approved — the floor is not negotiable), resolve held actions with lodestar approve from a second terminal, and lock the invariant you care about with a probe that fails CI if it ever regresses.

Two ways in

The decision is one question: do you own the agent's loop?

You wrote the loop (a homegrown agent, a script that calls an LLM with tools): use guard.wrap(). It's a library call; your loop runs inside a governed context and every tool call flows through the kernel.
You don't own the agent (Claude Code, Cursor, Aider — anything that speaks MCP): use the MCP proxy. The agent's MCP config points at the proxy; the proxy owns the real downstream servers and governs every tools/call in between. No changes to the agent.

Both paths apply the same policy gate and the same memory-firewall semantics from part 2, and both produce the same artifact: an append-only event log you can render with lodestar report <session-id>.

Path 1: the greenfield loop with guard.wrap()

Here is the minimal real shape, as used by the coding-agent-greenfield example:

import {
  wrap,
  autoApprovePolicy,
  alwaysHoldsChecker,
  type GuardContext,
} from "@qmilab/lodestar-guard"
import { registerFsReadTool } from "@qmilab/lodestar-adapter-filesystem"

// Tools are registered with the Action Kernel up front — the adapters ship
// graded contracts (fs.read arrives as an L0, read-only contract).
registerFsReadTool(process.cwd())

// Your agent loop, unchanged except that tool calls go through ctx.
const agentLoop = async (ctx: GuardContext) => {
  // Every tool call routes through the kernel: propose → gate → execute.
  await ctx.callTool("fs.read", { path: "README.md" }, { intent: "read the README" })
  // ctx.ingestObservation(...) — feed an observation through the Cognitive Core
  // ctx.recordDecision(...)   — declare a decision + the beliefs it depends on
  // ctx.emit(...)             — debug events (recorded, but never trusted for gating)
  return { done: true }
}

const run = wrap(agentLoop)

const { result, session_id, log_root } = await run({
  project_id: "my-agent",
  actor_id: "agent:mine",
  default_scope: { level: "project", identifier: "my-agent" },
  default_sensitivity: "internal",
  policy_gate: autoApprovePolicy({
    // L0–L2 sail through; L3 falls to the structural deny default;
    // L4 always holds, L5 always denies.
    auto_approve_up_to: 2,
    approver_id: "policy-decider",
  }),
  precondition_checker: alwaysHoldsChecker,
})

console.log(`report: bun run lodestar report ${session_id}`)

Things to notice, because they're deliberate:

There are no silent defaults for the security-relevant settings. Scope, sensitivity, the policy gate, the precondition checker — all explicit. If your policy can hold an action, you must also supply an approval_resolver; a hold with no resolver is a hard error at callTool time, not a silent auto-deny. (The snippet above registers only fs.read — an L0 tool — so nothing in it can hold; the moment you register an L4 tool, supply a resolver. The approvals section below shows how.)
ctx.emit() is honest about its own trust level. Raw agent emits are recorded for debugging but are not trusted to drive sentinel arbitration (the runtime monitors from part 2 that can flag a suspicious belief and hold actions depending on it) — an agent cannot talk its way past a belief-scoped alert by emitting noise. The trusted way to declare "this plan rests on these beliefs" is ctx.recordDecision().
The result hands you the session. session_id is exactly what lodestar report takes; log_root is where the NDJSON event log landed (default .lodestar/events).
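To make the propose, gate, execute path and the hold-without-resolver rule concrete, here is a toy model of what a governed callTool does. It is a sketch under loud assumptions: makeContext, ToolContract, the stand-in gate, and the log entry shapes are all hypothetical, it is synchronous for brevity (the real ctx.callTool is awaited), and it is not Lodestar's implementation, which also checks preconditions and writes an NDJSON event log.

```typescript
// Toy model of a governed tool call: propose → gate → execute.
// Every name here is hypothetical; this is not Lodestar's API.
type Level = 0 | 1 | 2 | 3 | 4 | 5
type Verdict = "allow" | "hold" | "deny"
type Args = Record<string, unknown>

interface ToolContract {
  level: Level // the trust-ladder rung this tool requires
  run: (args: Args) => unknown
}

interface LogEntry {
  step: "proposed" | "gated" | "executed"
  tool: string
  verdict?: Verdict
}

type Gate = (level: Level) => Verdict
// Returns true to grant a held action, false to refuse it.
type Resolver = (tool: string, args: Args) => boolean

function makeContext(
  registry: Map<string, ToolContract>,
  gate: Gate,
  log: LogEntry[],
  resolver?: Resolver,
) {
  return {
    callTool(tool: string, args: Args): unknown {
      const contract = registry.get(tool)
      if (!contract) throw new Error(`unregistered tool: ${tool}`)
      log.push({ step: "proposed", tool })
      const verdict = gate(contract.level)
      log.push({ step: "gated", tool, verdict }) // the verdict is recorded before anything runs
      if (verdict === "deny") throw new Error(`denied: ${tool}`)
      if (verdict === "hold") {
        // A hold nobody can resolve is a configuration error, raised at
        // call time rather than quietly turned into a deny.
        if (!resolver) throw new Error(`hold with no approval resolver: ${tool}`)
        if (!resolver(tool, args)) throw new Error(`approval refused: ${tool}`)
      }
      const result = contract.run(args)
      log.push({ step: "executed", tool })
      return result
    },
  }
}

// Stand-in gate for this demo: L0–L2 allow, L4 holds, everything else denies.
const gate: Gate = (level) => (level <= 2 ? "allow" : level === 4 ? "hold" : "deny")

const registry = new Map<string, ToolContract>([
  ["fs.read", { level: 0, run: (args) => `contents of ${String(args["path"])}` }],
  ["git.push", { level: 4, run: () => "pushed" }], // hypothetical L4 tool
])
const log: LogEntry[] = []
const ctx = makeContext(registry, gate, log)
console.log(ctx.callTool("fs.read", { path: "README.md" }))
```

Registering git.push without a resolver is legal here, just as in the real guard; the error only surfaces when something actually tries to call it.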

The extension seam: teaching the guard about your evidence

The documentation-agent example (part 1's warm-up demo) adds exactly one thing to the config above — a custom evidence linker, through the cognitive seam:

import { DocAwareEvidenceLinker } from "@qmilab/lodestar-cognitive-core"

const { result, session_id } = await run({
  // ... same required fields as above ...
  policy_gate: autoApprovePolicy({ auto_approve_up_to: 2, approver_id: "doc-agent-policy" }),
  precondition_checker: alwaysHoldsChecker,
  cognitive: {
    evidenceLinkerFactory: ({ evidence, beliefs }) =>
      new DocAwareEvidenceLinker(evidence, beliefs),
  },
})

That one factory is what makes file content land as external_document evidence (stamped with its source file, kept unverified) instead of being quietly trusted — and it's the exact seam the documentation-evidence-provenance probe from part 2 pins in CI. The same seam takes an MCP-aware linker, or your own: anything that decides what quality of evidence a claim's source amounts to.

If you want durable, cross-session state, the sibling stores seam injects Postgres-backed claim/belief/evidence stores — that's the seam the tool-poisoning-cross-session probe rides, and its source doubles as the wiring example.

Path 2: the agent you don't own, via the MCP proxy

The proxy is a config file plus one command. Here's the shape of the real config that drove the live Claude Code run in part 1 (real-claude-code/proxy.config.json, trimmed):

{
  "project_id": "telenotes-governed-dev-claude-code",
  "actor_id": "agent:claude-code",
  "session_id": "auto",
  "log_root": ".lodestar/events",
  "default_scope": { "level": "project", "identifier": "telenotes-governed-dev-claude-code" },
  "default_sensitivity": "internal",
  "auto_approve_ceiling": 3,
  "downstream_servers": [
    {
      "name": "fs",
      "command": "bunx",
      "args": ["@modelcontextprotocol/server-filesystem", "/absolute/path/to/workspace"]
    },
    {
      "name": "devtools",
      "command": "bun",
      "args": ["run", "/path/to/dev-tools-mcp/bin.ts", "/absolute/path/to/workspace"]
    }
  ],
  "tool_defaults": {
    "mcp.fs.read_text_file": {
      "reversibility": "reversible",
      "permissions": ["fs.read"],
      "sandbox": "read",
      "required_trust_level": 0,
      "blast_radius": "self"
    },
    "mcp.fs.write_file": {
      "reversibility": "compensable",
      "permissions": ["fs.write"],
      "sandbox": "write-local",
      "required_trust_level": 3,
      "blast_radius": "project"
    },
    "mcp.devtools.git_push": {
      "reversibility": "irreversible",
      "permissions": ["network.egress"],
      "sandbox": "controlled-shell",
      "required_trust_level": 4,
      "blast_radius": "external"
    }
  }
}

(The placeholder paths: in the clone, the dev-tools server lives at examples/telenotes-governed-dev/dev-tools-mcp/bin.ts; the fs server needs no clone at all — bunx fetches it.)

Read it top to bottom and you've read the governance story:

downstream_servers — the proxy spawns these as child processes and re-exposes their tools upstream under namespaced names (mcp.<server>.<tool>). The agent sees one MCP server; the proxy sees everything.
tool_defaults — you grade each tool: trust level, blast radius, reversibility, sandbox profile. The proxy deliberately ignores MCP annotations as a trust source (per the MCP spec, annotations are untrusted unless they come from a trusted server), so the grading is operator-authored, not taken from the wire. Any tool you didn't enumerate falls to a conservative default (L3, irreversible, controlled-shell), which you'll see in your report as a nudge to grade it explicitly.
auto_approve_ceiling: 3 — the gate from part 1: L0–L3 auto-approve, and the L4 git_push is rejected (or held, once you configure approvals; see "Resolving a held action from a second terminal" below).
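The naming and fallback rules in that list fit in a few lines. A minimal sketch, assuming the grading is a plain lookup keyed by the namespaced name: upstreamName, gradeFor, and CONSERVATIVE_DEFAULT are hypothetical helpers, not the proxy's code, and the fallback carries only the three fields the text names (L3, irreversible, controlled-shell), with permissions and blast_radius left optional.

```typescript
// Hypothetical model of how a proxy might resolve a tool's grade.
interface ToolGrade {
  required_trust_level: number
  reversibility: "reversible" | "compensable" | "irreversible"
  sandbox: string
  permissions?: string[]
  blast_radius?: string
}

// Fallback for ungraded tools, per the text: L3, irreversible, controlled-shell.
const CONSERVATIVE_DEFAULT: ToolGrade = {
  required_trust_level: 3,
  reversibility: "irreversible",
  sandbox: "controlled-shell",
}

// Downstream tools are re-exposed upstream as mcp.<server>.<tool>.
function upstreamName(server: string, tool: string): string {
  return `mcp.${server}.${tool}`
}

// Operator-authored grade if one exists, else the fallback.
// MCP annotations arriving over the wire are never consulted.
function gradeFor(
  name: string,
  toolDefaults: Record<string, ToolGrade>,
): { grade: ToolGrade; explicit: boolean } {
  const own = Object.prototype.hasOwnProperty.call(toolDefaults, name)
  return own
    ? { grade: toolDefaults[name] as ToolGrade, explicit: true }
    : { grade: CONSERVATIVE_DEFAULT, explicit: false }
}

const toolDefaults: Record<string, ToolGrade> = {
  "mcp.fs.read_text_file": { required_trust_level: 0, reversibility: "reversible", sandbox: "read" },
  "mcp.fs.write_file": { required_trust_level: 3, reversibility: "compensable", sandbox: "write-local" },
  "mcp.devtools.git_push": { required_trust_level: 4, reversibility: "irreversible", sandbox: "controlled-shell" },
}

// run_tests is a made-up tool name standing in for anything you forgot to grade.
const ungraded = gradeFor(upstreamName("devtools", "run_tests"), toolDefaults)
console.log(ungraded.explicit, ungraded.grade.required_trust_level) // false 3
```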

Start it, point your agent at it:

bun run lodestar guard mcp-proxy --config ./proxy.config.json

For Claude Code, the project's .mcp.json declares the proxy as the one MCP server (the committed example):

{
  "mcpServers": {
    "lodestar": {
      "command": "bun",
      "args": [
        "run", "/path/to/lodestar/packages/cli/src/index.ts",
        "guard", "mcp-proxy",
        "--config", "/path/to/proxy.config.json"
      ]
    }
  }
}

The caveat that matters: deny the built-ins

A real coding agent ships its own file and shell tools, and those never touch MCP — if they stay enabled, the agent edits files directly and your trust report comes back empty of write actions. The proxy can only govern what flows through it. So launch the agent with built-ins denied and the proxy allowed, exactly as part 1's live run did:

claude -p "<your task>" \
  --mcp-config .mcp.json --strict-mcp-config \
  --disallowedTools Edit Write MultiEdit NotebookEdit Bash Read Glob Grep LS WebFetch WebSearch \
  --allowedTools "mcp__lodestar__*" \
  --output-format text

This is stated as a caveat in the repo's own recipe, and in part 2's honest-limits list, because it's the kind of thing a wrapper should say out loud: the governed path only governs if it's the only path.

Writing a policy with real teeth

Both paths grade actions on the same six-rung trust ladder:

L0  observe only        — read state; never write or execute
L1  suggest only        — produce proposals; nothing reaches the world
L2  isolated artifact   — generate in tmpfs; no effect on project state
L3  local reversible    — modify project state, with notification
L4  external / shared   — network, credentials, deploy, push — needs approval
L5  prohibited          — cannot run in this context, ever

The simple preset you've seen — autoApprovePolicy / auto_approve_ceiling — is genuinely a one-rule policy document under the hood ("allow at or below N" over a structural deny default). Two properties are worth knowing before you write a bigger one:

The ceiling caps at L3. Auto-approving L4 is not expressible — the trust-ladder floor always holds L4 for approval and always denies L5, regardless of any rule you write. A config asking for a ceiling of 4 fails at parse time. This is the floor part 2 leaned on: the block on the poisoned push never depended on a well-written rule.
Unmatched actions deny. The structural default is deny, and a probe (unmatched-action-defaults-to-deny) pins it.
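Both properties can be modeled as a floor checked before the preset's one rule, plus a parse-time check on the ceiling. A sketch under assumptions: parseCeiling and decide are hypothetical names, not Lodestar functions, and whether a require_approval outcome then parks or is rejected depends on whether approvals are configured, which this model leaves out.

```typescript
// Trust-ladder rungs L0..L5 as plain numbers.
type Level = 0 | 1 | 2 | 3 | 4 | 5
type Verdict = "allow" | "require_approval" | "deny"

// Parse-time check: a ceiling above L3 is rejected outright, never clamped.
function parseCeiling(raw: unknown): Level {
  if (typeof raw !== "number" || !Number.isInteger(raw) || raw < 0 || raw > 3) {
    throw new Error(`auto_approve_ceiling must be an integer from 0 to 3, got ${String(raw)}`)
  }
  return raw as Level
}

function decide(level: Level, ceiling: Level): Verdict {
  // Floor first, so no ceiling can change these two outcomes.
  if (level === 5) return "deny"
  if (level === 4) return "require_approval"
  // The preset's one rule: allow at or below the ceiling.
  if (level <= ceiling) return "allow"
  // Structural default: anything the rule did not match is denied.
  return "deny"
}

// Same shape as the wrap() example: auto_approve_up_to 2.
const ceiling = parseCeiling(2)
for (const level of [0, 2, 3, 4, 5] as Level[]) {
  console.log(`L${level}: ${decide(level, ceiling)}`)
}
```

With a ceiling of 2 this prints allow for L0 and L2, deny for L3, require_approval for L4, and deny for L5, which matches the comment in the wrap() config.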

A fuller policy is a JSON document with ordered rules — first decisive match wins, over the deny default:

{
  "id": "my-team-policy",
  "version": "v1",
  "rules": [
    {
      "match": { "required_level_lte": 2 },
      "effect": "allow",
      "reason": "Reads, suggestions, isolated artifacts: free"
    },
    {
      "match": { "tool": "mcp.fs.write_file" },
      "effect": "allow",
      "reason": "Workspace writes are compensable here"
    },
    {
      "match": { "tool": "mcp.devtools.git_push" },
      "effect": "require_approval",
      "approval": {
        "required_authority": { "min_trust_baseline": 0.8 }
      },
      "reason": "Pushes need a human with sufficient standing"
    }
  ]
}
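Evaluating that document can be sketched as an ordered scan with the floor applied on top. This is one reading of the text, not Lodestar's evaluator: evaluate, matches, and the Decision shape are hypothetical; every matching rule is treated as decisive; match fields within one rule are ANDed; and the floor is taken literally, so L5 always denies and L4 always comes out as require_approval, carrying whatever approval requirement the matching rule set.

```typescript
type Effect = "allow" | "require_approval" | "deny"

interface Rule {
  match: { tool?: string; required_level_lte?: number }
  effect: Effect
  reason: string
  approval?: { required_authority?: { min_trust_baseline?: number } }
}
interface PolicyDoc { id: string; version: string; rules: Rule[] }
interface Action { tool: string; required_level: number }
interface Decision {
  effect: Effect
  reason: string
  rule_index: number | null // null when the default or the floor decided
  approval?: Rule["approval"]
}

// All present match fields must hold.
function matches(rule: Rule, action: Action): boolean {
  const m = rule.match
  if (m.tool !== undefined && m.tool !== action.tool) return false
  if (m.required_level_lte !== undefined && action.required_level > m.required_level_lte) return false
  return true
}

function evaluate(policy: PolicyDoc, action: Action): Decision {
  if (action.required_level >= 5) {
    return { effect: "deny", reason: "L5 floor: prohibited", rule_index: null }
  }
  // First matching rule wins; no match falls through to the deny default.
  const index = policy.rules.findIndex((rule) => matches(rule, action))
  const rule = index >= 0 ? policy.rules[index] : undefined
  const decision: Decision = rule
    ? { effect: rule.effect, reason: rule.reason, rule_index: index, approval: rule.approval }
    : { effect: "deny", reason: "no rule matched: structural deny default", rule_index: null }
  // L4 floor: held for approval whatever the rules said.
  if (action.required_level === 4 && decision.effect !== "require_approval") {
    return { ...decision, effect: "require_approval", reason: `L4 floor overrides: ${decision.reason}` }
  }
  return decision
}

const policy: PolicyDoc = {
  id: "my-team-policy",
  version: "v1",
  rules: [
    { match: { required_level_lte: 2 }, effect: "allow", reason: "Reads, suggestions, isolated artifacts: free" },
    { match: { tool: "mcp.fs.write_file" }, effect: "allow", reason: "Workspace writes are compensable here" },
    {
      match: { tool: "mcp.devtools.git_push" },
      effect: "require_approval",
      approval: { required_authority: { min_trust_baseline: 0.8 } },
      reason: "Pushes need a human with sufficient standing",
    },
  ],
}

console.log(evaluate(policy, { tool: "mcp.devtools.git_push", required_level: 4 }))
```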

Wire it into the proxy by replacing the top-level auto_approve_ceiling key in proxy.config.json with a policy block:

"policy": { "file": "./my-team-policy.json", "allow_unsigned": true }

allow_unsigned: true is the development mode. For production the policy document is signed (Ed25519 over the canonical hash of {id, version, rules}), the proxy verifies it at load, and a require_approval rule's required_authority travels with the held action — so the eventual approver must actually clear the bar the rule set. (Honest gap: there is no lodestar policy sign command yet — the signature is produced programmatically; the verification side is what's pinned in CI, by policy-version-signature-required.)

For the library path, the same document works without the proxy: compile it with compile(policy) from @qmilab/lodestar-policy-kernel and pass the result as policy_gate — wrap() accepts a CompiledPolicy directly.
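Since there is no lodestar policy sign command yet, producing a signature means a few lines of node:crypto, which both Node and Bun provide. A sketch under loud assumptions: the sorted-key canonical JSON, the SHA-256 digest, and the helper names canonicalJson and policyDigest are illustrative choices, not Lodestar's documented canonicalization or signature format, so check the policy-kernel source before relying on this to produce signatures the proxy will accept.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto"

// Assumed canonical form: object keys sorted at every level, no whitespace,
// undefined-valued keys dropped. Lodestar's own canonicalization may differ.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => canonicalJson(item)).join(",")}]`
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>
    const body = Object.keys(obj)
      .filter((key) => obj[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`)
      .join(",")
    return `{${body}}`
  }
  return JSON.stringify(value)
}

interface PolicyDoc { id: string; version: string; rules: unknown[] }

// Only {id, version, rules} is covered, so any other top-level keys
// can change without invalidating the signature.
function policyDigest(policy: PolicyDoc): Buffer {
  const covered = { id: policy.id, version: policy.version, rules: policy.rules }
  return createHash("sha256").update(canonicalJson(covered)).digest()
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519")

const policy: PolicyDoc = {
  id: "my-team-policy",
  version: "v1",
  rules: [{ match: { required_level_lte: 2 }, effect: "allow", reason: "Reads, suggestions, isolated artifacts: free" }],
}

// Ed25519 takes no separate digest algorithm, hence the null first argument.
const signature = sign(null, policyDigest(policy), privateKey)
console.log(signature.toString("base64"))
console.log(verify(null, policyDigest(policy), publicKey, signature)) // true
```

Canonicalizing before hashing is what lets a policy file be reformatted or have its keys reordered without breaking verification, while any change to a rule does break it.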

Resolving a held action from a second terminal

With approvals configured, an L4 action doesn't just bounce — it parks at pending_approval while the proxy polls for an out-of-band resolution. Add these two top-level keys to the same proxy.config.json:

"approval_timeout_ms": 120000,
"approvals": {
  "authorized_keys": [
    { "actor_id": "alice", "public_key": "-----BEGIN PUBLIC KEY----- …" }
  ]
}

Mint your approver key once, then resolve holds from any other terminal:

# one-time: mint an Ed25519 keypair — writes alice.key (private, 0600) and
# alice.pub, and prints the authorized_keys pin to paste above
bun run lodestar approve keygen --approver alice --out ~/.lodestar/alice

# see what's parked
bun run lodestar approve list --project my-project

# let it through (or: approve deny <request-id> ... --reason "not today")
bun run lodestar approve grant <request-id> --approver alice \
  --key ~/.lodestar/alice.key --project my-project

The mechanics are worth one sentence each, because they're load-bearing: the approve CLI runs in a separate process and never writes the event log — it drops a signed resolution into a side-channel the proxy polls, and the proxy (the log's single writer) promotes it to the canonical approval.granted@1 event. The signature is verified against the operator-pinned keys before anything un-parks — a forged, unsigned, or tampered grant is rejected, and the forged-approval-cannot-execute probe holds that boundary in CI. And a granted approval still revalidates the action's preconditions before execution; approval is not a skip-the-checks pass.
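The boundary in that paragraph, where only a resolution signed by an operator-pinned key un-parks anything, can be sketched with node:crypto. Everything here is hypothetical: the Resolution shape, the fields that get signed, and verifyResolution itself. The real approve CLI writes its own format to its own side-channel, and the real proxy also revalidates preconditions after a grant, which this sketch leaves out.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto"

interface Resolution {
  request_id: string
  decision: "grant" | "deny"
  approver: string
  reason?: string
  signature?: string // base64 Ed25519 signature over signedBytes()
}
// Mirrors an entry in approvals.authorized_keys.
interface AuthorizedKey { actor_id: string; public_key: string }

// Hypothetical signed payload: every field except the signature, in fixed order.
function signedBytes(r: Resolution): Buffer {
  return Buffer.from(JSON.stringify([r.request_id, r.decision, r.approver, r.reason ?? null]))
}

function verifyResolution(r: Resolution, pinned: AuthorizedKey[]): { ok: boolean; why: string } {
  if (!r.signature) return { ok: false, why: "unsigned" }
  // The approver's claimed identity selects the key; the key must be pinned.
  const key = pinned.find((k) => k.actor_id === r.approver)
  if (!key) return { ok: false, why: `approver ${r.approver} is not pinned` }
  try {
    const valid = verify(null, signedBytes(r), key.public_key, Buffer.from(r.signature, "base64"))
    return valid ? { ok: true, why: "verified" } : { ok: false, why: "bad signature" }
  } catch {
    return { ok: false, why: "malformed signature or key" }
  }
}

// PEM keygen, the same text format as the public_key pin in proxy.config.json.
const alice = generateKeyPairSync("ed25519", {
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
})
const pinned: AuthorizedKey[] = [{ actor_id: "alice", public_key: alice.publicKey }]

const grant: Resolution = { request_id: "req-1", decision: "grant", approver: "alice" }
grant.signature = sign(null, signedBytes(grant), alice.privateKey).toString("base64")

console.log(verifyResolution(grant, pinned).why) // verified
console.log(verifyResolution({ ...grant, request_id: "req-2" }, pinned).why) // bad signature
```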

If you time out instead (approval_timeout_ms elapses with no resolution), the held action fails closed with a synthetic approval_timeout — the agent sees a normal tool error, not a hung session.
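Fail-closed timing reduces to one pure check called on every poll. A sketch: checkHold, waitForResolution, and the approval_denied code are hypothetical; only approval_timeout and the fail-closed behavior come from the text above, and any resolution passed in is assumed to have already passed signature verification.

```typescript
interface VerifiedResolution { decision: "grant" | "deny"; approver: string }

type HoldOutcome =
  | { status: "pending" }
  | { status: "granted"; approver: string }
  | { status: "error"; code: "approval_denied" | "approval_timeout" }

// Pure decision for one poll: a resolution wins; otherwise the clock decides.
function checkHold(
  heldAtMs: number,
  nowMs: number,
  timeoutMs: number,
  resolution: VerifiedResolution | undefined,
): HoldOutcome {
  if (resolution) {
    return resolution.decision === "grant"
      ? { status: "granted", approver: resolution.approver }
      : { status: "error", code: "approval_denied" }
  }
  // Fail closed: silence past the deadline becomes an error, never a grant.
  if (nowMs - heldAtMs >= timeoutMs) return { status: "error", code: "approval_timeout" }
  return { status: "pending" }
}

// Polling loop around the pure check; poll() stands in for reading the side-channel.
async function waitForResolution(
  poll: () => VerifiedResolution | undefined,
  timeoutMs: number,
  intervalMs = 500,
): Promise<HoldOutcome> {
  const heldAt = Date.now()
  for (;;) {
    const outcome = checkHold(heldAt, Date.now(), timeoutMs, poll())
    if (outcome.status !== "pending") return outcome
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
}

// With approval_timeout_ms = 120000 and nobody answering:
console.log(JSON.stringify(checkHold(0, 120_000, 120_000, undefined)))
// {"status":"error","code":"approval_timeout"}
```

Keeping the decision pure makes the timeout testable without waiting two minutes; the loop only adds the clock and the sleep.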

Locking your invariant with a probe

Everything above gives you governance at runtime. The last step is making your guarantee survive your own refactors: write a probe — an adversarial script that fails loudly if the invariant ever regresses — and run it in CI. In Lodestar's own development, probes are treated as spec: 48 of them gate every change, and the rule is you fix the code, never the probe.

A probe is a small class with a name, a description, and a run() that returns pass/fail plus human-readable detail lines:

import { Probe, runProbeAsScript, type ProbeResult } from "@qmilab/lodestar-harness"

class MyInvariantProbe extends Probe {
  readonly name = "my-agent-cannot-exfiltrate-secrets"
  readonly description =
    "A belief sourced from .env content must never reach truth_status: supported"

  async run(): Promise<ProbeResult> {
    // Arrange: run your wrapped agent over a fixture with a planted marker.
    // Act:     query the belief store / event log it produced.
    // Assert:  no supported belief carries the marker.
    const breached = false // your real check here
    return breached
      ? { passed: false, details: ["MARKER found in a supported belief"] }
      : { passed: true, details: ["marker stayed unverified everywhere"] }
  }
}

runProbeAsScript(new MyInvariantProbe())

runProbeAsScript prints the banner and exits 0 on pass, non-zero on fail — which is all CI needs. Group probes into a pack with a manifest:

{
  "name": "my-agent-safety",
  "version": "0.1.0",
  "spec_version": "1",
  "source_type": "local",
  "description": "Invariants my agent's governance must never lose",
  "coverage_areas": ["memory_firewall"],
  "invariants": ["secrets_never_supported"],
  "probes": [
    { "name": "my-agent-cannot-exfiltrate-secrets", "file": "probes/no-exfil.ts" }
  ]
}

(coverage_areas and invariants are free-form tags — name what you cover; nothing validates them against a fixed list.) Then run the pack — the --pack flag takes a first-party pack name, a pack directory, or a manifest path:

bun run lodestar harness run --pack ./my-agent-safety

For a template with real assertions against a real store, the four probes in packs/coding-agent-safety/ are the reference — each one is a standalone, readable script whose header comment states the scenario, the assertions, and the non-claims. Start from the one closest to your invariant and swap in your fixture.
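The placeholder line const breached = false in the probe template above is usually replaced by a scan over what the wrapped agent produced. A sketch assuming the beliefs have been exported as NDJSON records with truth_status and content fields: that record shape, the helper names, and the marker string are all hypothetical, so adapt them to whatever store or event schema the reference probes in packs/coding-agent-safety/ actually query.

```typescript
// Hypothetical belief record; Lodestar's real schema may name things differently.
interface BeliefRecord { truth_status: string; content: string }

function isBeliefRecord(value: unknown): value is BeliefRecord {
  if (typeof value !== "object" || value === null) return false
  const v = value as Record<string, unknown>
  return typeof v["truth_status"] === "string" && typeof v["content"] === "string"
}

// One JSON object per non-empty line; records that are not beliefs are skipped.
function parseBeliefs(ndjson: string): BeliefRecord[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line): unknown => JSON.parse(line))
    .filter(isBeliefRecord)
}

// The invariant: nothing carrying the planted marker may be "supported".
function breachesFor(beliefs: BeliefRecord[], marker: string): BeliefRecord[] {
  return beliefs.filter((b) => b.truth_status === "supported" && b.content.includes(marker))
}

const MARKER = "LODESTAR_TEST_MARKER_7f3a" // hypothetical planted value
const exported = [
  JSON.stringify({ truth_status: "unverified", content: `API_KEY=${MARKER}` }),
  JSON.stringify({ truth_status: "supported", content: "README describes the build" }),
].join("\n")

const breaches = breachesFor(parseBeliefs(exported), MARKER)
const breached = breaches.length > 0
console.log(breached) // false
```

Returning the offending records rather than a bare boolean gives the probe something concrete to put in its failure details lines.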

Probe runs are themselves recorded as synthetic-trust observations in the event log, so even your safety checks show up in the audit trail — honestly labelled, like everything else.

The whole loop, end to end

Putting it together, the integration checklist:

Pick your path — guard.wrap() if you own the loop, the MCP proxy if you don't.
Grade your tools — tool_defaults (or action contracts in code): trust level, blast radius, reversibility. Be honest; ungraded tools fall to a conservative default, visibly.
Set the policy — start with the ceiling preset; graduate to a rule document when you need per-tool nuance. L4 holds no matter what you write.
Wire approvals — keygen once, pin the public key, resolve holds with approve grant/deny from a second terminal.
Read your first report — bun run lodestar report <session-id>. Check the Beliefs section for the supported/unverified split doing its job on your own tools' output.
Lock it — write the probe for the one invariant you'd be embarrassed to lose, and put it in CI.

That's the series: part 1 showed it working, part 2 showed where the line holds and where it honestly doesn't, and this part handed you the wiring. If you wrap something real with it, open an issue or discussion — the probe packs especially are designed to grow beyond first-party.