contract-plan-spec-intake
Guided intake to collect all answers needed to draft CONTRACT.md, PLAN.md (from PLAN.template.md), and spec updates in one pass; use for new features or scope changes when the user wants to answer questions once and then activate a swarm to write tests, code, and verify.
SKILL.md
Gedanken Engine for Regret Minimization
This repository is a repo-native demonstration of a repeatable pipeline for compiling theory into executable checks: “Theory → Plan → Specs → Swarm → Tests → Code,” applied to regret minimization.
The input is theory/theory.pdf, treated as an executable contract: swarms translate invariants into a concrete plan, derive specs that define admissibility and metrics, generate tests that attempt to falsify the invariants, then implement src/<code> only to the extent needed to make the tests pass under deterministic replay. This yields audit-ready counterfactual evaluation where regret is computed against admissible baselines under identical boundary conditions.
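As a rough illustration of "regret computed against admissible baselines," the sketch below compares one policy's total cost with the best admissible alternative evaluated on the same frozen inputs. The `Outcome` record, the `admissible` flag, and the cost-difference definition of regret are assumptions made for this example; the authoritative definitions live in theory/theory.pdf and the derived specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record: total cost of one policy on one frozen trace."""
    policy: str
    total_cost: float
    admissible: bool


def regret(chosen: Outcome, alternatives: list[Outcome]) -> float:
    """Cost of `chosen` minus the cost of the best admissible alternative.

    Assumes every outcome was produced under the same boundary conditions
    (same trace, same seed), so the comparison is counterfactually valid.
    """
    admissible = [o for o in alternatives if o.admissible]
    if not admissible:
        raise ValueError("no admissible baseline to compare against")
    best = min(o.total_cost for o in admissible)
    return chosen.total_cost - best
```

Inadmissible alternatives are ignored even when they are cheaper, which is the point of restricting the comparison to admissible baselines.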
Repository layout as a compilation pipeline
This repository is intentionally structured as a compiler from theory to falsifiable engineering artifacts, rather than as a library or reference implementation. Each directory corresponds to a compilation stage, not a convenience grouping.
theory/
- Input contracts only.
- theory.pdf is the canonical example.
- Treated as immutable. No edits are required or expected.
- All downstream artifacts must trace back to explicit statements or invariants in this document.
plans/
- Manager-authored run plans for a swarm session.
- Defines lanes, merge order, and “definition of done” gates.
- plans/PLAN.md is the canonical spec after user approval.
- The plan is the only place where cross-agent coordination is allowed, and it is treated as a versioned control artifact.
scripts/agent-orchestrator/
- Swarm orchestration role specs used by make swarm (spec.md, agent1.md ... agent5.md).
- Defines manager/worker prompt contracts and per-lane scope for replayable runs.
tests/
- Tests are generated before implementation.
- Focus on invariants, counterfactual validity, and nondeterminism detection.
- Any change in behavior must surface as a test failure, not as a reinterpretation of theory or specs.
src/
- Minimal code required to satisfy the tests.
- No speculative features. No hidden state. No online adaptation.
- Exists to verify the theory, not to embellish it.
runs/
- Immutable records of executed canonical runs.
- Each run captures the exact (plan, specs, seed, trace) tuple used.
- Serves as the primary audit surface for reproducibility and review.
traces/
- Current state: committed fixture traces in traces/fixtures/*.jsonl plus traces/fixtures/manifest.sha256.
- Planned state: broader trace datasets (canonical, golden, evals) as the offline analyzer pipeline is implemented.
- Invariance rule: workload traces are frozen across alternatives unless explicitly modeled inside the system.
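One way to check that the fixture traces are still frozen is to recompute their digests against the manifest. The sketch below assumes the manifest uses the `<sha256>`, two spaces, `<relative_path>` line format and that paths are relative to a root directory supplied by the caller; the function name `verify_manifest` is hypothetical and is not part of the repository.

```python
import hashlib
from pathlib import Path


def verify_manifest(manifest: Path, root: Path) -> list[str]:
    """Return the relative paths whose SHA-256 digest differs from the manifest.

    Assumes each non-empty manifest line is '<sha256>  <relative_path>'
    and that relative paths resolve against `root`.
    """
    mismatches = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, rel_path = line.split("  ", 1)
        actual = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(rel_path)
    return mismatches
```

An empty result means every listed fixture matches its pinned digest.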
How to run
Prerequisites:
- Python 3.11
- uv
- Optional: codex on PATH (required only for make swarm)
- Bootstrap
make init
- Validate environment
make check
- See deterministic replay
make replay RUN_ID=1 TRACE=traces/fixtures/fixture_five_agent.jsonl SEED=7
- See offline analysis
make analyze RUN_ID=1 ANALYZE_IN=runs/1/events.jsonl ANALYZE_OUT=runs/1/report.json
- Validate orchestrator gate
make gate GATE_RUN=runs/1
- Optional smoke and swarm
make smoke
make swarm RUN_ID=1
Feature Tour
- make replay ...
  - Stdout: deterministic JSONL stream (STEP events or deterministic ERROR payload).
  - Artifact: runs/<run_id>/events.jsonl when replay runs with tee enabled.
- make analyze ...
  - Stdout: quiet on success; deterministic ERROR JSON on failure.
  - Artifact: runs/<run_id>/report.json with deterministic-input validity and scalar summary.
- make gate ...
  - Stdout: JSON gate report with PASS or FAIL.
  - Inputs: runs/<run_id>/manager_tasks.yaml, runs/<run_id>/manager_verdict.yaml, runs/<run_id>/agent*/out.yaml.
- make smoke
  - Stdout: deterministic TRAIN_SMOKE JSON payload (or deterministic config error).
- make swarm ...
  - Artifacts: runs/run-<id>-manager.jsonl, runs/run-<id>-agent*.jsonl, runs/run-<id>-swarm.jsonl.
Determinism Contract
- spec: approved plans/PLAN.md
- trace: frozen fixture trace or declared tape input
- seed|tape: exactly one deterministic driver input must be declared
Identical boundary tuples must yield identical ordered event streams and deterministic derived hashes.
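The contract can be illustrated with a toy replay: the same trace and seed should produce the same ordered events and therefore the same stream hash. Everything in this sketch is an assumption made for illustration, including the `replay` stand-in, the `draw` field, and the canonical-JSON hashing scheme; the repository's actual replay and hash derivation may differ.

```python
import hashlib
import json
import random


def replay(trace: list[dict], seed: int) -> list[dict]:
    """Toy stand-in for replay: one STEP event per trace record."""
    rng = random.Random(seed)  # local RNG, so no hidden global state
    return [
        {"kind": "STEP", "seq": seq, "input": record, "draw": rng.random()}
        for seq, record in enumerate(trace)
    ]


def stream_hash(events: list[dict]) -> str:
    """SHA-256 over the ordered events, each serialized as canonical JSON."""
    digest = hashlib.sha256()
    for event in events:
        line = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

Sorting keys and fixing separators keeps the serialization independent of dictionary insertion order, so the hash depends only on event content and event order.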
Artifact Map
| Artifact | Path | Purpose | Required/Expected fields |
|---|---|---|---|
| Fixture manifest | traces/fixtures/manifest.sha256 | Pin fixture integrity | <sha256><space><space><relative_path> entries |
| Fixture traces | traces/fixtures/*.jsonl | Frozen workload trace input | Domain events with deterministic ordering fields expected by replay |
| Replay event stream | runs/<run_id>/events.jsonl | Deterministic replay output | kind; for STEP: seq, t, state_hash_pre, actions, reward, cost |
| Analysis report | runs/<run_id>/report.json | Offline measurement output | report_version, run_id, source_events_path, source_events_sha256, event_count, analysis.* |
| Swarm logs (optional) | runs/run-<id>-manager.jsonl, runs/run-<id>-agent*.jsonl, runs/run-<id>-swarm.jsonl | Manager/worker orchestration audit | Append-only JSONL log events |
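The required fields listed for the replay event stream can be checked line by line. This is a minimal sketch that only tests for field presence, using the STEP field names from the table above; value types and the fields of non-STEP events are not specified here, so the helper (a hypothetical name, `missing_fields`) does not check them.

```python
import json

# Field names taken from the artifact map row for runs/<run_id>/events.jsonl.
STEP_FIELDS = {"seq", "t", "state_hash_pre", "actions", "reward", "cost"}


def missing_fields(line: str) -> set[str]:
    """Return the required field names absent from one JSONL event line."""
    event = json.loads(line)
    if "kind" not in event:
        return {"kind"}
    if event["kind"] == "STEP":
        return STEP_FIELDS - set(event)
    return set()  # other kinds: no further fields assumed
```

Running this over each line of an events file and collecting the non-empty results gives a quick structural check before offline analysis.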
Architecture and verification
- Architecture contract: docs/ARCHITECTURE.md
- Verification commands:
make check
make gate
make smoke
CI and Security
The repository uses GitHub-native CI and security automation to enforce deterministic quality gates and supply-chain controls.
Required workflow checks for protected branches:
- ci / check (runs make check)
- ci / gate (runs make gate)
- ci / smoke (runs make smoke)
- security / codeql
- security / dependency-audit
- security / sbom
- security / attest
Dependency automation policy:
- Dependabot runs weekly for pip and github-actions.
- Dependency pull requests are labeled dependencies and security.
- Security updates are grouped separately from routine updates.
PR security checklist policy:
- PR authors must complete the security/supply-chain checklist in .github/pull_request_template.md.
- Required PR notes include security impact, dependency/lockfile changes, threat-model delta, and rollback plan.
Branch protection settings are configured in the GitHub UI (not versioned in this repo). For main, enforce required status checks and review requirements in line with AGENTS.md guardrails.
References
- docs/ARCHITECTURE.md
- theory/theory.pdf
License
See LICENSE.
Citation
This repository is intended to be cited as a methodology example for operationalizing regret minimization using multi-agent (swarm) workflows under deterministic constraints.
What is being demonstrated:
- A method for compiling theoretical claims into executable, falsifiable checks.
- A controlled use of swarms to decompose research into specs, tests, and minimal code.
- An engineering treatment of regret minimization as a property of a system, not a post-hoc analysis.
What is not being claimed:
- A single optimal regret-minimization algorithm.
- General empirical superiority over other approaches.
- Completeness of the theory input.

If you use or reference this repository, please cite:
@software{mcbride_2026_gedanken-engine-for-regret-minimization,
author = {Michael McBride},
title = {Gedanken Engine for Regret Minimization},
year = {2026},
url = {https://github.com/MichaelsEngineering/gedanken-engine-for-regret-minimization},
version = {0.2}
}