Probe packs
A probe pack bundles adversarial probes (and optionally runtime sentinels) behind a manifest so the harness can load and run them as a unit. Probes are Lodestar's executable spec — not test scaffolding. They are not edited to match changed code; the code is expected to keep satisfying them.
The two first-party packs live in packs/:
- `lodestar-core`: the core epistemic-chain, memory-firewall, guard, event-log, Policy Kernel, sentinel-wiring, adapter, and read-side invariants (43 probes).
- `coding-agent-safety`: the "wrap a coding agent" story, covering prompt injection, tool poisoning, and confidence drift, plus the three first-party sentinels (4 probes + 3 sentinels).

Run them with `lodestar harness run --pack <name>` or, for the whole suite, `bun run probes:ci`.
The manifest
Each pack has a lodestar.probe-pack.json at its root, validated against a Zod
schema in @qmilab/lodestar-core and resolved by the loader in
@qmilab/lodestar-harness.
{
  "name": "coding-agent-safety",
  "version": "0.1.0",
  "spec_version": "1",
  "source_type": "local",
  "description": "Adversarial probes and online sentinels for the 'wrap a coding agent' story…",
  "coverage_areas": ["prompt_injection", "tool_poisoning", "..."],
  "invariants": ["injection_defense", "no_self_promotion", "..."],
  "probes": [
    { "name": "prompt-injection-cross-tool", "file": "probes/prompt-injection-cross-tool.ts" }
  ],
  "sentinels": [
    { "id": "low-confidence-action" }
  ]
}
| Field | Type | Notes |
|---|---|---|
| `name` | string (kebab-case) | Pack identifier |
| `version` | string (semver) | Pack version |
| `spec_version` | `"1"` | Manifest spec version (fixed at `"1"`) |
| `source_type` | `"local" \| "npm"` | v0 loads local packs only |
| `description` | string | Optional, human-readable |
| `coverage_areas` | string[] | Free-form tags for what the pack covers |
| `invariants` | string[] | Free-form tags for the invariants it pins |
| `probes` | `{ name, file }[]` | At least one; `name` kebab-case, `file` a pack-relative path |
| `sentinels` | `{ id }[]` | Optional; each `id` resolves against the harness's first-party registry |
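
As a rough illustration of the field rules above, the following dependency-free sketch checks a manifest by hand. It is not the Zod schema from @qmilab/lodestar-core; the names `parseManifest`, `KEBAB`, and `SEMVER` are hypothetical, the regexes are assumed readings of "kebab-case" and "semver", and treating `coverage_areas` and `invariants` as optional is an assumption, since the table does not say whether they are required.

```typescript
// Hand-rolled sketch of the manifest rules; NOT the real Zod schema.
interface ProbePackManifest {
  name: string;
  version: string;
  spec_version: "1";
  source_type: "local" | "npm";
  description?: string;
  coverage_areas?: string[]; // assumption: requiredness is not stated in the docs
  invariants?: string[]; // assumption: requiredness is not stated in the docs
  probes: { name: string; file: string }[];
  sentinels?: { id: string }[];
}

type ParseResult =
  | { ok: true; manifest: ProbePackManifest }
  | { ok: false; errors: string[] };

// Assumed patterns: the docs say "kebab-case" and "semver" without giving regexes.
const KEBAB = /^[a-z0-9]+(-[a-z0-9]+)*$/;
const SEMVER = /^\d+\.\d+\.\d+([-+][0-9A-Za-z.+-]+)?$/;

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

function parseManifest(input: unknown): ParseResult {
  if (!isRecord(input)) return { ok: false, errors: ["manifest must be a JSON object"] };
  const { name, version, spec_version, source_type, description, probes, sentinels } = input;
  const errors: string[] = [];

  if (typeof name !== "string" || !KEBAB.test(name)) errors.push("name: expected a kebab-case string");
  if (typeof version !== "string" || !SEMVER.test(version)) errors.push("version: expected a semver string");
  if (spec_version !== "1") errors.push('spec_version: expected "1"');
  if (source_type !== "local" && source_type !== "npm") errors.push('source_type: expected "local" or "npm"');
  if (description !== undefined && typeof description !== "string") errors.push("description: expected a string");
  for (const key of ["coverage_areas", "invariants"]) {
    const value = input[key];
    if (value !== undefined && !isStringArray(value)) errors.push(`${key}: expected string[]`);
  }

  // "At least one" probe; each needs a kebab-case name and a non-empty file path.
  if (!Array.isArray(probes) || probes.length === 0) {
    errors.push("probes: expected at least one entry");
  } else {
    probes.forEach((p: unknown, i: number) => {
      const entry: Record<string, unknown> = isRecord(p) ? p : {};
      const { name: probeName, file } = entry;
      if (typeof probeName !== "string" || !KEBAB.test(probeName)) {
        errors.push(`probes[${i}].name: expected a kebab-case string`);
      }
      if (typeof file !== "string" || file.length === 0) {
        errors.push(`probes[${i}].file: expected a pack-relative path`);
      }
    });
  }

  // Sentinels are optional; when present, each entry needs a string id.
  if (sentinels !== undefined) {
    if (!Array.isArray(sentinels)) {
      errors.push("sentinels: expected an array");
    } else {
      sentinels.forEach((s: unknown, i: number) => {
        const entry: Record<string, unknown> = isRecord(s) ? s : {};
        if (typeof entry.id !== "string") errors.push(`sentinels[${i}].id: expected a string`);
      });
    }
  }

  return errors.length === 0
    ? { ok: true, manifest: input as unknown as ProbePackManifest }
    : { ok: false, errors };
}
```

Collecting every error instead of stopping at the first one is a design choice of this sketch, so that a pack author sees all problems in a single pass.
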
The loader resolves each probe file to an absolute path within the pack root
(symlink-aware), enforces unique probe names, and resolves each sentinel id
against the built-in FIRST_PARTY_SENTINELS registry. v0 does not support
third-party sentinels or npm-sourced packs — both are reserved for the post-v1
registry.
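
The three loader checks described above can be sketched roughly as follows. This is not the loader from @qmilab/lodestar-harness: `resolvePack`, `resolveProbeFile`, and `normalize` are hypothetical names, modelling `FIRST_PARTY_SENTINELS` as a `Set` of ids is an assumption about its shape, and the containment check here is purely lexical on POSIX-style paths. The real loader is symlink-aware, which this sketch omits to stay dependency-free.

```typescript
// Assumption: the registry is modelled as a Set of the three first-party sentinel ids.
const FIRST_PARTY_SENTINELS = new Set<string>([
  "low-confidence-action",
  "suspicious-memory-origin",
  "anomalous-tool-sequence",
]);

interface Manifest {
  probes: { name: string; file: string }[];
  sentinels?: { id: string }[];
}

// Lexically normalise a POSIX-style path: drop "." and empty segments, apply "..".
function normalize(p: string): string {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Resolve a pack-relative probe file to an absolute path inside the pack root.
// Lexical only: symlinks are NOT resolved here, unlike the real loader.
function resolveProbeFile(packRoot: string, file: string): string {
  if (file.startsWith("/")) throw new Error(`probe file must be pack-relative: ${file}`);
  const root = normalize(packRoot);
  const prefix = root.endsWith("/") ? root : root + "/";
  const abs = normalize(`${root}/${file}`);
  if (!abs.startsWith(prefix)) throw new Error(`probe file escapes the pack root: ${file}`);
  return abs;
}

function resolvePack(packRoot: string, manifest: Manifest) {
  // Enforce unique probe names while resolving each file.
  const seen = new Set<string>();
  const probes = manifest.probes.map(({ name, file }) => {
    if (seen.has(name)) throw new Error(`duplicate probe name: ${name}`);
    seen.add(name);
    return { name, path: resolveProbeFile(packRoot, file) };
  });
  // Every sentinel id must exist in the built-in registry.
  const sentinels = (manifest.sentinels ?? []).map(({ id }) => {
    if (!FIRST_PARTY_SENTINELS.has(id)) throw new Error(`unknown sentinel id: ${id}`);
    return id;
  });
  return { probes, sentinels };
}
```

In this sketch a probe file such as `../lodestar-core/probes/x.ts` is rejected because its normalised path falls outside the pack root, and an id that is not in the registry fails the load rather than being silently skipped.
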
The 43 probes in lodestar-core
Firewall, epistemic-chain, guard, and event-log invariants (Batches 1–5):
| Probe | Pins |
|---|---|
| `memory-poisoning-basic` | a planted "successful experience" is not promoted |
| `epistemic-chain-smoke` | the full chain links end to end |
| `external-document-not-normal` | `external_document` evidence can't adopt at normal retrieval |
| `quarantined-not-retrievable` | a quarantined belief can't reach the planner |
| `sensitivity-ceiling` | secret beliefs stay out of default context |
| `auto-observation-gate` | `external_document` / `model_inference` can't auto-promote |
| `guard-import-no-self-promote` | imported memory can't self-promote |
| `guard-precondition-revalidation` | two-phase execution re-checks preconditions |
| `guard-contract-invariants` | action contracts hold |
| `context-policy-contradiction-routing` | contradictions surface in their own channel |
| `kernel-context-propagation` | real session/project ids propagate (no stub fallback) |
| `event-log-single-writer` | concurrent appends don't tear the log |
| `mcp-proxy-roundtrip` | the proxy round-trips a tool call faithfully |
| `mcp-proxy-injection-defense` | injected tool-result content stays unverified |
| `reflection-cannot-promote-to-normal-alone` | reflection alone can't promote a belief |
| `contradicted-belief-flags-dependent-decisions` | a contradiction cascades to dependents |
| `event-log-canonical-hash` | canonical payload hashing is stable |
| `documentation-evidence-provenance` | doc claims carry their evidence provenance |
Policy Kernel, the trust ladder, and the approval lifecycle:
| Probe | Pins |
|---|---|
| `l4-action-requires-approval` | an L4 action is held at `pending_approval`, never executed outright |
| `l4-floor-preserves-stricter-rule` | the trust-ladder floor keeps the stricter of rule vs. ladder |
| `pending-approval-cannot-execute` | a held action can't run until it's granted |
| `ladder-floor-overrides-allow-rule` | the ladder floor overrides a too-permissive allow rule |
| `unmatched-action-defaults-to-deny` | an action no rule matches defaults to deny |
| `policy-version-signature-required` | a policy document must carry a valid signature |
| `granted-approval-still-revalidates-preconditions` | a granted approval still re-checks preconditions before running |
| `guard-hold-resolves-via-resolver` | a held action resolves through the in-process approval-resolver seam |
| `approval-timeout-denies` | a hold with no approval times out to a denial |
| `approval-via-side-channel` | a separate-process `lodestar approve` resolution un-parks the hold |
| `forged-approval-cannot-execute` | a forged / unsigned / tampered approval can't un-park a held L4 (Ed25519) |
| `proxy-hold-carries-rule-authority` | a held action carries its matched rule's required authority |
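
To make the interplay of the first few invariants concrete, here is a minimal sketch of how a rule decision and a trust-ladder floor might combine. It is not the Policy Kernel's code. The `Decision` ordering (allow, then `pending_approval`, then deny), the numeric trust level, the `Rule` shape, and the assumption that levels below L4 have a floor of allow are all hypothetical; only the behaviours named in the table (L4 is held, the stricter outcome wins, no match means deny) come from this page.

```typescript
// Hypothetical decision vocabulary; "pending_approval" is the only state named verbatim above.
type Decision = "allow" | "pending_approval" | "deny";

// Assumed strictness ordering: a higher number is stricter.
const STRICTNESS: Record<Decision, number> = { allow: 0, pending_approval: 1, deny: 2 };

// Hypothetical rule shape: an exact-match action name and the decision it yields.
interface Rule {
  action: string;
  decision: Decision;
}

// Assumption: only L4 and above carry a floor; lower levels default to "allow".
function ladderFloor(level: number): Decision {
  return level >= 4 ? "pending_approval" : "allow";
}

// Keep whichever of the two decisions is stricter.
function stricter(a: Decision, b: Decision): Decision {
  return STRICTNESS[a] >= STRICTNESS[b] ? a : b;
}

function decide(action: string, level: number, rules: Rule[]): Decision {
  const rule = rules.find((r) => r.action === action);
  // An action that no rule matches defaults to deny.
  const ruleDecision: Decision = rule ? rule.decision : "deny";
  // The floor can tighten a permissive rule but never loosens a stricter one.
  return stricter(ruleDecision, ladderFloor(level));
}
```

Under these assumptions, an allow rule on an L4 action yields `pending_approval`, a deny rule on the same action stays deny, and an action with no matching rule is denied at any level.
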
Sentinel→action and calibration→action wiring:
| Probe | Pins |
|---|---|
| `sentinel-alert-gates-dependent-action` | a sentinel alert holds the dependent action at the gate |
| `calibration-flag-escalates-action` | a calibration flag strengthens the gate decision |
| `guard-arbiter-gates-dependent-action` | a real sentinel → `guard.wrap()` host → the dependent action is held |
| `mcp-proxy-arbiter-gates-dependent-action` | the MCP-proxy analogue: a synthesized decision holds the poisoned dependent call |
Read side — the viewer and OTel export:
| Probe | Pins |
|---|---|
| `viewer-is-read-only` | the viewer surfaces the chain + pending approvals but never writes the log |
| `otel-export-respects-sensitivity-ceiling` | content above the ceiling exports as metadata + payload hash only |
| `otel-export-projects-action-spans` | a session projects to the action-centric span tree |
Native governed adapters (each drives the real adapter through the real kernel):
| Probe | Pins |
|---|---|
| `shell-adapter-enforces-sandbox-invariants` | the shell adapter's TS-level sandbox (no host env, argv-only, timeout, bounded capture) |
| `git-adapter-enforces-egress-invariants` | git transport: L4 push held, remote pinning beats a poisoned `.git/config`, no credential leak |
| `nostr-adapter-enforces-egress-invariants` | nostr: relay pinning, in-process BIP-340 signing, the key never on the wire |
| `http-adapter-enforces-egress-invariants` | http: hostname pinning + per-hop redirect re-validation against SSRF, host-bound credential |
| `messaging-adapter-enforces-egress-invariants` | messaging: destination pinning, operator-fixed sender, no redirect following |
Durable calibration:
| Probe | Pins |
|---|---|
| `calibration-event-is-durable` | a calibration pass records a durable, replayable `calibration.computed@1` event |
The 4 probes + 3 sentinels in coding-agent-safety
| Probe | Pins |
|---|---|
| `prompt-injection-cross-tool` | a cross-tool injection doesn't promote |
| `tool-poisoning-cross-session` | provenance survives across two sessions (needs Postgres; see below) |
| `confidence-drift` | miscalibration is flagged per class; synthetic beliefs excluded |
| `poisoned-file-cannot-hijack-feature-work` | a poisoned file can't hijack the feature plan |

| Sentinel | Watches for |
|---|---|
| `low-confidence-action` | a high-trust action on a weak belief |
| `suspicious-memory-origin` | an `external_document` belief steering a decision |
| `anomalous-tool-sequence` | a tool sequence that deviates from the task shape |
The Postgres-backed probe
All 47 probes pass under strict TypeScript. One —
tool-poisoning-cross-session — exercises the Postgres-backed belief store
across two sessions, so it reads LODESTAR_TEST_DATABASE_URL and skips with a
loud banner (exit 0) when that variable is unset. CI runs it for real against a
postgres:16 service.
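
The skip-when-unset behaviour can be sketched as a small pure function. This is an illustration only, not the probe's actual code: `planPostgresProbe`, the `ProbePlan` type, and the banner text are hypothetical, and treating an empty string the same as an unset variable is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical result type: either skip (exit 0, with a banner) or run against a database URL.
type ProbePlan =
  | { status: "skipped"; exitCode: 0; banner: string }
  | { status: "run"; databaseUrl: string };

function planPostgresProbe(env: Env): ProbePlan {
  const url = env["LODESTAR_TEST_DATABASE_URL"];
  // Assumption: an empty value is treated like an unset variable.
  if (url === undefined || url === "") {
    const banner = [
      "==============================================================",
      " SKIPPED: tool-poisoning-cross-session",
      " LODESTAR_TEST_DATABASE_URL is not set; no Postgres to test.",
      "==============================================================",
    ].join("\n");
    // Exit code 0 keeps a local run green while the banner makes the skip visible.
    return { status: "skipped", exitCode: 0, banner };
  }
  return { status: "run", databaseUrl: url };
}
```

A real probe would pass `process.env` and print the banner before exiting; taking the environment as a parameter keeps the sketch testable without touching the process.
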
Related
- Sentinels and calibration: what sentinels and the calibrator do at runtime.
- CLI reference: `lodestar probe` and `lodestar harness run`.
- Get started: run the suite.