Probe packs

A probe pack bundles adversarial probes (and optionally runtime sentinels) behind a manifest so the harness can load and run them as a unit. Probes are Lodestar's executable spec — not test scaffolding. They are not edited to match changed code; the code is expected to keep satisfying them.

The two first-party packs live in packs/:

  • lodestar-core: the core epistemic-chain, memory-firewall, guard, event-log, Policy Kernel, sentinel-wiring, adapter, and read-side invariants (43 probes).
  • coding-agent-safety: the "wrap a coding agent" story, covering prompt injection, tool poisoning, and confidence drift, plus the three first-party sentinels (4 probes + 3 sentinels).

Run them with lodestar harness run --pack <name> or, for the whole suite, bun run probes:ci.

The manifest

Each pack has a lodestar.probe-pack.json at its root, validated against a Zod schema in @qmilab/lodestar-core and resolved by the loader in @qmilab/lodestar-harness.

{
  "name": "coding-agent-safety",
  "version": "0.1.0",
  "spec_version": "1",
  "source_type": "local",
  "description": "Adversarial probes and online sentinels for the 'wrap a coding agent' story…",
  "coverage_areas": ["prompt_injection", "tool_poisoning", "..."],
  "invariants": ["injection_defense", "no_self_promotion", "..."],
  "probes": [
    { "name": "prompt-injection-cross-tool", "file": "probes/prompt-injection-cross-tool.ts" }
  ],
  "sentinels": [
    { "id": "low-confidence-action" }
  ]
}

| Field | Type | Notes |
| --- | --- | --- |
| name | string (kebab-case) | Pack identifier |
| version | string (semver) | Pack version |
| spec_version | "1" | Manifest spec version (fixed at "1") |
| source_type | "local" \| "npm" | v0 loads local packs only |
| description | string | Optional, human-readable |
| coverage_areas | string[] | Free-form tags for what the pack covers |
| invariants | string[] | Free-form tags for the invariants it pins |
| probes | { name, file }[] | At least one; name kebab-case, file a pack-relative path |
| sentinels | { id }[] | Optional; each id resolves against the harness's first-party registry |
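
The field rules above can be approximated by a hand-written check. This is only a sketch: the real validation is the Zod schema in @qmilab/lodestar-core, which is not reproduced here. The function name `checkManifest`, the two regular expressions, and the choice to treat `coverage_areas` and `invariants` as required are all assumptions made for illustration.

```typescript
// Hand-rolled approximation of the manifest field rules.
// NOT the real Zod schema; every name in this block is hypothetical.

// Assumed patterns: the docs only say "kebab-case" and "semver".
const KEBAB = /^[a-z0-9]+(-[a-z0-9]+)*$/;
const SEMVER = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

function asRecord(v: unknown): Record<string, unknown> {
  return (typeof v === "object" && v !== null ? v : {}) as Record<string, unknown>;
}

// Returns a list of problems; an empty list means the manifest passed.
function checkManifest(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["manifest must be an object"];
  }
  const m = asRecord(input);
  const errors: string[] = [];

  const name = m["name"];
  if (typeof name !== "string" || !KEBAB.test(name)) errors.push("name must be kebab-case");

  const version = m["version"];
  if (typeof version !== "string" || !SEMVER.test(version)) errors.push("version must be semver");

  if (m["spec_version"] !== "1") errors.push('spec_version must be "1"');

  const sourceType = m["source_type"];
  if (sourceType !== "local" && sourceType !== "npm") errors.push('source_type must be "local" or "npm"');

  const description = m["description"];
  if (description !== undefined && typeof description !== "string") errors.push("description must be a string");

  // Assumption: both tag lists are required (the table does not mark them optional).
  if (!isStringArray(m["coverage_areas"])) errors.push("coverage_areas must be a string array");
  if (!isStringArray(m["invariants"])) errors.push("invariants must be a string array");

  const probes = m["probes"];
  if (!Array.isArray(probes) || probes.length === 0) {
    errors.push("probes must contain at least one entry");
  } else {
    probes.forEach((p: unknown, i: number) => {
      const entry = asRecord(p);
      const probeName = entry["name"];
      const file = entry["file"];
      if (typeof probeName !== "string" || !KEBAB.test(probeName)) errors.push(`probes[${i}].name must be kebab-case`);
      if (typeof file !== "string" || file === "") errors.push(`probes[${i}].file must be a pack-relative path`);
    });
  }

  const sentinels = m["sentinels"];
  if (sentinels !== undefined) {
    if (!Array.isArray(sentinels)) {
      errors.push("sentinels must be an array");
    } else {
      sentinels.forEach((s: unknown, i: number) => {
        if (typeof asRecord(s)["id"] !== "string") errors.push(`sentinels[${i}].id must be a string`);
      });
    }
  }

  return errors;
}
```

Collecting every problem in a list, rather than throwing on the first one, is a design choice of this sketch so that a pack author sees all manifest errors in one pass.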

The loader resolves each probe file to an absolute path within the pack root (symlink-aware), enforces unique probe names, and resolves each sentinel id against the built-in FIRST_PARTY_SENTINELS registry. v0 does not support third-party sentinels or npm-sourced packs — both are reserved for the post-v1 registry.
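
The three loader checks described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the loader in @qmilab/lodestar-harness: `resolvePack`, `normalizeWithinRoot`, and `FIRST_PARTY_SENTINEL_IDS` are hypothetical names, and the containment check here is purely string-based for POSIX-style paths. A symlink-aware loader would also need to compare real paths on disk, which this sketch does not attempt.

```typescript
// Sketch of the loader's checks. All names are hypothetical; only the three
// sentinel ids are taken from this page.

// Stand-in for the built-in first-party sentinel registry.
const FIRST_PARTY_SENTINEL_IDS = new Set<string>([
  "low-confidence-action",
  "suspicious-memory-origin",
  "anomalous-tool-sequence",
]);

// Normalizes a pack-relative POSIX path and rejects anything that would
// leave the pack root. String-only: symlinks are NOT resolved here.
function normalizeWithinRoot(relPath: string): string {
  if (relPath.startsWith("/")) {
    throw new Error(`absolute path not allowed: ${relPath}`);
  }
  const out: string[] = [];
  for (const seg of relPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) throw new Error(`path escapes pack root: ${relPath}`);
      out.pop();
    } else {
      out.push(seg);
    }
  }
  if (out.length === 0) throw new Error(`empty probe path: ${relPath}`);
  return out.join("/");
}

function resolvePack(
  packRoot: string,
  probes: { name: string; file: string }[],
  sentinels: { id: string }[] = [],
): { probes: { name: string; path: string }[]; sentinelIds: string[] } {
  const root = packRoot.replace(/\/+$/, "");
  const seen = new Set<string>();

  const resolvedProbes = probes.map((p) => {
    // Probe names must be unique within a pack.
    if (seen.has(p.name)) throw new Error(`duplicate probe name: ${p.name}`);
    seen.add(p.name);
    return { name: p.name, path: `${root}/${normalizeWithinRoot(p.file)}` };
  });

  // Every sentinel id must exist in the first-party registry.
  for (const s of sentinels) {
    if (!FIRST_PARTY_SENTINEL_IDS.has(s.id)) {
      throw new Error(`unknown sentinel id: ${s.id}`);
    }
  }

  return { probes: resolvedProbes, sentinelIds: sentinels.map((s) => s.id) };
}
```

In this sketch a manifest entry such as `probes/../../outside.ts` is rejected before any file is read, and an id that is not in the registry fails the whole pack rather than being silently skipped.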

The 43 probes in lodestar-core

Firewall, epistemic-chain, guard, and event-log invariants (Batches 1–5):

| Probe | Pins |
| --- | --- |
| memory-poisoning-basic | a planted "successful experience" is not promoted |
| epistemic-chain-smoke | the full chain links end to end |
| external-document-not-normal | external_document evidence can't adopt at normal retrieval |
| quarantined-not-retrievable | a quarantined belief can't reach the planner |
| sensitivity-ceiling | secret beliefs stay out of default context |
| auto-observation-gate | external_document / model_inference can't auto-promote |
| guard-import-no-self-promote | imported memory can't self-promote |
| guard-precondition-revalidation | two-phase execution re-checks preconditions |
| guard-contract-invariants | action contracts hold |
| context-policy-contradiction-routing | contradictions surface in their own channel |
| kernel-context-propagation | real session/project ids propagate (no stub fallback) |
| event-log-single-writer | concurrent appends don't tear the log |
| mcp-proxy-roundtrip | the proxy round-trips a tool call faithfully |
| mcp-proxy-injection-defense | injected tool-result content stays unverified |
| reflection-cannot-promote-to-normal-alone | reflection alone can't promote a belief |
| contradicted-belief-flags-dependent-decisions | a contradiction cascades to dependents |
| event-log-canonical-hash | canonical payload hashing is stable |
| documentation-evidence-provenance | doc claims carry their evidence provenance |

Policy Kernel, the trust ladder, and the approval lifecycle:

| Probe | Pins |
| --- | --- |
| l4-action-requires-approval | an L4 action is held at pending_approval, never executed outright |
| l4-floor-preserves-stricter-rule | the trust-ladder floor keeps the stricter of rule vs. ladder |
| pending-approval-cannot-execute | a held action can't run until it's granted |
| ladder-floor-overrides-allow-rule | the ladder floor overrides a too-permissive allow rule |
| unmatched-action-defaults-to-deny | an action no rule matches defaults to deny |
| policy-version-signature-required | a policy document must carry a valid signature |
| granted-approval-still-revalidates-preconditions | a granted approval still re-checks preconditions before running |
| guard-hold-resolves-via-resolver | a held action resolves through the in-process approval-resolver seam |
| approval-timeout-denies | a hold with no approval times out to a denial |
| approval-via-side-channel | a separate-process lodestar approve resolution un-parks the hold |
| forged-approval-cannot-execute | a forged / unsigned / tampered approval can't un-park a held L4 (Ed25519) |
| proxy-hold-carries-rule-authority | a held action carries its matched rule's required authority |
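
Several of these probes pin one combined behavior: the gate takes the stricter of the matched rule and the trust-ladder floor, and an unmatched action is denied. A minimal model of that behavior, assuming a three-valued effect and levels named L1 to L4, might look like the sketch below. Only `pending_approval`, "deny", "allow", and L4 appear on this page; the `Effect` and `TrustLevel` types, the L1 to L3 levels, the floor for those levels, and the function names are assumptions, not the Policy Kernel's API.

```typescript
// Hypothetical model of the gate decision; not the Policy Kernel's API.

type Effect = "allow" | "pending_approval" | "deny";
type TrustLevel = "L1" | "L2" | "L3" | "L4"; // L1-L3 are assumed names

// Higher number means stricter.
const STRICTNESS: Record<Effect, number> = {
  allow: 0,
  pending_approval: 1,
  deny: 2,
};

// Assumed floor: L4 is held for approval; lower levels impose no floor.
function ladderFloor(level: TrustLevel): Effect {
  return level === "L4" ? "pending_approval" : "allow";
}

function stricter(a: Effect, b: Effect): Effect {
  return STRICTNESS[a] >= STRICTNESS[b] ? a : b;
}

// ruleEffect is undefined when no rule matched the action.
function decide(level: TrustLevel, ruleEffect: Effect | undefined): Effect {
  const fromRule: Effect = ruleEffect ?? "deny"; // unmatched action defaults to deny
  return stricter(fromRule, ladderFloor(level));
}
```

Under these assumptions, `decide("L4", "allow")` yields `"pending_approval"` (the floor overrides a too-permissive allow rule), while `decide("L4", "deny")` stays `"deny"` (the floor never weakens a stricter rule).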

Sentinel→action and calibration→action wiring:

| Probe | Pins |
| --- | --- |
| sentinel-alert-gates-dependent-action | a sentinel alert holds the dependent action at the gate |
| calibration-flag-escalates-action | a calibration flag strengthens the gate decision |
| guard-arbiter-gates-dependent-action | a real sentinel → guard.wrap() host → the dependent action is held |
| mcp-proxy-arbiter-gates-dependent-action | the MCP-proxy analogue: a synthesized decision holds the poisoned dependent call |

Read side — the viewer and OTel export:

| Probe | Pins |
| --- | --- |
| viewer-is-read-only | the viewer surfaces the chain + pending approvals but never writes the log |
| otel-export-respects-sensitivity-ceiling | content above the ceiling exports as metadata + payload hash only |
| otel-export-projects-action-spans | a session projects to the action-centric span tree |

Native governed adapters (each drives the real adapter through the real kernel):

| Probe | Pins |
| --- | --- |
| shell-adapter-enforces-sandbox-invariants | the shell adapter's TS-level sandbox (no host env, argv-only, timeout, bounded capture) |
| git-adapter-enforces-egress-invariants | git transport: L4 push held, remote pinning beats a poisoned .git/config, no credential leak |
| nostr-adapter-enforces-egress-invariants | nostr: relay pinning, in-process BIP-340 signing, the key never on the wire |
| http-adapter-enforces-egress-invariants | http: hostname pinning + per-hop redirect re-validation against SSRF, host-bound credential |
| messaging-adapter-enforces-egress-invariants | messaging: destination pinning, operator-fixed sender, no redirect following |

Durable calibration:

| Probe | Pins |
| --- | --- |
| calibration-event-is-durable | a calibration pass records a durable, replayable calibration.computed@1 event |

The 4 probes + 3 sentinels in coding-agent-safety

| Probe | Pins |
| --- | --- |
| prompt-injection-cross-tool | a cross-tool injection doesn't promote |
| tool-poisoning-cross-session | provenance survives across two sessions (needs Postgres; see below) |
| confidence-drift | miscalibration is flagged per class; synthetic beliefs excluded |
| poisoned-file-cannot-hijack-feature-work | a poisoned file can't hijack the feature plan |

| Sentinel | Watches for |
| --- | --- |
| low-confidence-action | a high-trust action on a weak belief |
| suspicious-memory-origin | an external_document belief steering a decision |
| anomalous-tool-sequence | a tool sequence that deviates from the task shape |

The Postgres-backed probe

All 47 probes pass under strict TypeScript. One — tool-poisoning-cross-session — exercises the Postgres-backed belief store across two sessions, so it reads LODESTAR_TEST_DATABASE_URL and skips with a loud banner (exit 0) when that variable is unset. CI runs it for real against a postgres:16 service.
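
The skip behavior can be pictured as a small gate evaluated before the probe touches the database. The sketch below is an assumption about shape only: `postgresProbeGate`, the `Gate` type, and the banner wording are hypothetical, and treating an empty string the same as an unset variable is a choice made here, not something the page states. Only the variable name and the exit code of 0 come from the text above.

```typescript
// Hypothetical skip gate for the Postgres-backed probe.
// Only LODESTAR_TEST_DATABASE_URL and the exit code 0 come from the docs.

type Env = Record<string, string | undefined>;

type Gate =
  | { run: true; databaseUrl: string }
  | { run: false; exitCode: 0; banner: string };

function postgresProbeGate(env: Env): Gate {
  const url = env["LODESTAR_TEST_DATABASE_URL"];

  // Assumption: an empty or whitespace-only value counts as unset.
  if (url === undefined || url.trim() === "") {
    const banner = [
      "==============================================================",
      " SKIPPED: tool-poisoning-cross-session",
      " LODESTAR_TEST_DATABASE_URL is not set; no Postgres to test.",
      "==============================================================",
    ].join("\n");
    return { run: false, exitCode: 0, banner };
  }

  return { run: true, databaseUrl: url };
}
```

A probe entry point could call `postgresProbeGate(process.env)`, print the banner, and exit with the returned code when `run` is false. Exiting 0 keeps a local run green without Postgres, which is why the banner has to be loud: a quiet skip would be indistinguishable from a pass.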