April 19, 2026
A molecular harness for Claude Code: how I run long-form AI projects

Why a harness at all
Claude Code out of the box is fast and careless. On a weekend hack that's a feature — you'll ship something in a week. On a project that has to live for months, touch production, have phases and ADRs, the same speed becomes a bug: the agent writes 400 lines when you asked for 40, declares "should work" instead of verifying, starts Phase 2 while Phase 1 is still smoking.
My harness is an attempt to tame that speed without killing it.
I test it on GroqeSTT, an Electron app for voice transcription with topic-based routing. The codebase is tens of thousands of lines; the stack is TypeScript + Electron + PostgreSQL + Railway + Supabase + Groq/Gemini API. The project has a phased roadmap (Phase 0 → Phase 5: Knowledge Mesh + Second Brain), ADRs, dedicated views, and UI impact maps, so the documentation is solid. And yet a single Claude session without the harness could "normalize" all of it in an hour if allowed to.
The harness doesn't allow it.
Three observations the harness is built on
1. The agent has no memory
Every Claude session starts from zero. Whatever seems to have "stuck in its memory" from the last conversation is an illusion: the new session knows nothing beyond CLAUDE.md and whatever it reads itself.
Without awareness of this, the agent pretends to remember: "I think this file has function X", "in the last iteration we agreed on Y". We didn't. That's confabulation.
2. The agent is an optimist
It'll say "should work" as soon as the code compiles. Won't run tests. Won't build the DMG. Won't check if the Supabase migration actually applied. It'll say it's "done" and only manual verification will reveal that something doesn't work.
Verification before closing has to be a rule, not a suggestion.
3. The agent is fast
In two minutes it'll run the migration, change routing, rewrite 4 components, and commit all of it under feat: improvements. Then you can't trace what went wrong. Molecularity — one concern per commit — is the foundation of the audit trail.
But Claude Code already has Plan Mode — why this?
Fair question. I ask it myself once a month. Plan Mode solves one specific problem: forcing a plan before the first edit of a session. That's important, and this harness uses the same instinct.
But Plan Mode is one gate at the start of a session. It doesn't do many things I need in long-running project work:
| Plan Mode does | Plan Mode doesn't |
|---|---|
| Force a plan before first edit | Force atomic commits (one concern per commit) |
| Block writes until you confirm | Force Read-before-Edit during the session |
| | Carry context between sessions (session briefs) |
| | Enforce phase discipline (DoD N before N+1) |
| | Codify decision protocol (single-option vs A/B/C vs narrow-first) |
| | Require tool_failed → STOP instead of retry/workaround |
| | Require verification before "done" (ban on "should work") |
| | Provide steer vocabulary (plan only, do X and commit, GO, gap: ...) |
| | Leave an audit trail of why-what-when decisions |
Plan Mode is great for the first 10 minutes of a session. The molecular harness is what protects the next 5 months of the project from decaying into entropy.
You can — and should — use both. Plan Mode for initial framing, harness for everything after.
Architecture of the harness
The whole harness is seven files and one taxonomy. All of them in the repo (plus .env.*.local gitignored for creds):
repo/
├── CLAUDE.md                                # auto-loaded manifesto
├── .claude/
│   ├── rules.md                             # project-specific gotchas (dev ≠ prod, clean build)
│   └── settings.local.json                  # hardened permissions allowlist
├── docs/
│   ├── session-briefs/
│   │   ├── next-session-phase2-brief.md     # continuity: pasted at session start
│   │   └── scope-per-phase.md               # phase map
│   └── architecture/
│       ├── 00-CURRENT-STATE.md
│       ├── 01-VISION.md
│       ├── 02-DATA-MODEL.md
│       └── 05-OPERATING-MODES.md
└── .env.test.local / .env.prod.local        # gitignored: creds fallback
Plus an external source of truth (I use a ClickUp "Knowledge Mesh" doc) for: concepts, ADRs, phase specs, status notes, dedicated views. That's not downloadable — it's part of the contract that CLAUDE.md points to something you (not the agent) maintain.
Seven non-negotiable rules
These ended up in CLAUDE.md after successive sessions where something went wrong. Each has a concrete incident behind it.
- Plan before code. 3–5 bullet plan → I wait for GO → I write code. No GO = no edits.
- Verify before closing. "Done" only after build/test/a demonstrably working function, not after "it compiles".
- One concern per commit. Schema / logic / UI → separate commits. Monolithic commit = no audit trail.
- Phase discipline. No Phase N+1 without Phase N's DoD accepted. Even when tempting.
- Read before edit. Every file I edit was read in the current session. Not from memory.
- Tool failed → STOP and diagnose. No retry, no workaround. A tool fail is a signal, not an obstacle.
- Ask only about real decisions. Don't invent fake options when there's one sensible path.
The seventh is the youngest — added today after the conversation I'll describe next.
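Rules 2 and 6 fit together as a single gate. Here is a minimal sketch, assuming each verification step can be modelled as a named function that reports pass or fail. The names `Check`, `Verdict`, and `runChecks`, and the stand-in checks, are hypothetical illustrations, not part of the harness itself:

```typescript
// Hypothetical model of one verification step: a name plus a function
// that reports whether the step passed.
type Check = { name: string; run: () => boolean };

type Verdict =
  | { status: "done"; passed: string[] }
  | { status: "stopped"; passed: string[]; failedAt: string };

// Run checks in order. The first failure stops the run: no retry and no
// skipping ahead. "done" is reported only when every check has passed.
function runChecks(checks: Check[]): Verdict {
  const passed: string[] = [];
  for (const check of checks) {
    if (!check.run()) {
      return { status: "stopped", passed, failedAt: check.name };
    }
    passed.push(check.name);
  }
  return { status: "done", passed };
}

// Stand-in checks; real ones would invoke the build, the test suite, etc.
const verdict = runChecks([
  { name: "build", run: () => true },
  { name: "tests", run: () => false },
  { name: "migration applied", run: () => true },
]);
```

In this example the run stops at "tests", so "migration applied" is never attempted and nothing is reported as done.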
Three decision paths (the most interesting part)
How should the agent ask for a decision? Its first instinct is to multiply options to show thoughtfulness: A/B/C for every step, trade-offs laid out, care for "democratic choice". The effect: the user wastes time because B and C are obviously worse than A, and the agent would have recommended A anyway.
Today's fix: diagnose how many real options exist before asking.
Path 1 — one sensible option
If the pipeline shows an obvious next step (e.g. next migration in a numbered series), the agent presents a proposal:
Proposal: add config.ts with ROUTING_V2_ENABLED: z.string().default('false').transform(v => v==='true'||v==='1') (edge-case safe boolean parsing).
Impact: test env + prod env accept the new env var; default off.
Risk: none (default off, zero touch to v1 routing).
See any gaps? If not, GO.
Three lines, not three paragraphs. The default is NO-GO: no answer ≠ GO. I reply with GO, gap: ..., or different: ....
For most steps in the molecular harness this is the right format. Fake A/B/C where B and C are obviously worse costs me 30 seconds per decision. Over 3 hours of work, at roughly 20 decisions, that's 10 minutes wasted on ritual.
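The parsing rule inside that proposal can be read without zod. This is a minimal sketch in plain TypeScript, assuming the same semantics as the transform quoted above: a missing value falls back to 'false', and only the exact strings 'true' and '1' enable the flag. `parseFlag` and the sample `env` record are hypothetical names used for illustration:

```typescript
// Mirrors z.string().default('false').transform(v => v === 'true' || v === '1'):
// a missing value defaults to 'false', and only the exact strings
// 'true' and '1' turn the flag on.
function parseFlag(raw: string | undefined): boolean {
  const value = raw ?? "false";
  return value === "true" || value === "1";
}

// A plain record standing in for process.env.
const env: Record<string, string | undefined> = { ROUTING_V2_ENABLED: "1" };

const routingV2Enabled = parseFlag(env["ROUTING_V2_ENABLED"]); // true
const unsetFlag = parseFlag(env["NOT_SET"]);                   // false: default applies
const looseValue = parseFlag("TRUE");                          // false: no case folding
```

Anything outside the two accepted strings, including "TRUE" or "yes", leaves the flag off, which is what makes the default-off claim in the proposal hold.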
Path 2 — 2–3 real options
When there are real options with different implications (different blast radius, different reversibility, different cost), the agent uses the classic 3-line format:
A hotfix: change Railway env var and redeploy backend.
  Impact: test env only, prod untouched.
  Trade-off: ~3 min, reversible.
B merge: merge feature/knowledge-mesh to main, prod auto-deploy.
  Impact: PROD live, migration 031_edges runs on prod Supabase.
  Trade-off: rollback via git revert + redeploy; additive migration so safe.
C pause: do nothing.
  Impact: zero.
  Trade-off: come back whenever.
Recommend A: B is reversible but I want a test verify first.
Three lines per option + one TL;DR. The user knows what they're clicking in 10 seconds.
Path 3 — > 3 options, none dominates
The decision is too coarse. Narrow it to a single criterion first ("which matters more: bundle size or API ergonomics?"), then present a full A/B.
What changes in practice
- The agent has decision-making space. Doesn't ask about cosmetics (commit message, badge color, JSON format).
- The agent always asks about irreversible, high-cost, prod-impact, scope expansion — regardless of path.
- I have fewer interruptions. A single-path proposal costs me GO: 2 characters, decision in 3 seconds.
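The three paths and the "always ask" list above reduce to a small piece of logic. The sketch below is an illustration under assumptions: the `Option` shape, its flag names, and both function names are hypothetical, and "real options" means the options left after obviously worse ones have been discarded.

```typescript
// Hypothetical shape for one candidate action the agent could take.
type Option = {
  label: string;
  irreversible?: boolean;
  highCost?: boolean;
  touchesProd?: boolean;
  expandsScope?: boolean;
};

type DecisionPath = "proposal" | "options" | "narrow-first";

// Pick the question format from the number of real options.
function choosePath(realOptions: Option[]): DecisionPath {
  if (realOptions.length === 0) {
    throw new Error("at least one real option is required");
  }
  if (realOptions.length === 1) return "proposal"; // Path 1: single proposal, wait for GO
  if (realOptions.length <= 3) return "options";   // Path 2: A/B/C, three lines each
  return "narrow-first";                            // Path 3: ask for a criterion first
}

// Regardless of path, these properties always require asking the user.
function alwaysAsk(option: Option): boolean {
  return Boolean(
    option.irreversible || option.highCost || option.touchesProd || option.expandsScope,
  );
}

const path = choosePath([
  { label: "hotfix" },
  { label: "merge", touchesProd: true },
  { label: "pause" },
]); // "options"
```

Cosmetic choices such as a commit message or badge color carry none of the flags, so `alwaysAsk` returns false for them and the agent decides on its own.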
Continuity between sessions
The biggest gap in working with Claude Code is handoff. Session A ends, session B starts the next day, and the first 20 minutes are "where did we leave off".
Solution: session brief. A docs/session-briefs/next-session-phase{N}-brief.md file, committed to the repo. The session closing a phase must generate it before ending. The starting session gets it on the first message:
I'm a new Claude session for <project>. Read docs/session-briefs/next-session-phase2-brief.md and act accordingly.
The brief has a fixed structure: reading order (5 min), where-we-are (TL;DR), insights/gotchas, first task, phase rough refs, infra quick-check, harness reminder, sources of truth.
Ten whole pages, but I start in 2 minutes instead of 20.
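Because the structure is fixed, an empty brief can be stamped out mechanically. A minimal sketch, assuming the section titles and path convention described above; the function names, the markdown heading style, and the TODO placeholders are my own assumptions, not the format used in the repo:

```typescript
// Section titles taken from the fixed structure listed above.
const BRIEF_SECTIONS = [
  "Reading order (5 min)",
  "Where we are (TL;DR)",
  "Insights / gotchas",
  "First task",
  "Phase rough refs",
  "Infra quick-check",
  "Harness reminder",
  "Sources of truth",
] as const;

// Path convention: docs/session-briefs/next-session-phase{N}-brief.md
function briefPath(phase: number): string {
  return `docs/session-briefs/next-session-phase${phase}-brief.md`;
}

// Render an empty skeleton with one heading per section.
// Heading levels and the TODO marker are assumptions for illustration.
function renderBriefSkeleton(project: string, phase: number): string {
  const header = `# ${project}: next session brief (Phase ${phase})`;
  const body = BRIEF_SECTIONS.map((title) => `## ${title}\n\nTODO`);
  return [header, ...body].join("\n\n") + "\n";
}

const skeleton = renderBriefSkeleton("GroqeSTT", 2);
const target = briefPath(2); // "docs/session-briefs/next-session-phase2-brief.md"
```

The closing session still has to fill in every section; the skeleton only guarantees that none of them is forgotten.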
Trade-offs I'm not afraid of
I'll say it plainly: the harness trades speed for control. Three things become slower.
1. More replying to prompts
The agent asks more often and decides less on its own. The first 2 days are frustrating; it feels like micromanagement. After day 3 you start to recognize that every question is on point: irreversible, prod-impact, scope expansion, real alternative.
Today's fix (single-option path) further reduces this cost — the agent asks only when there's a real choice.
2. Plans add friction at the start
Instead of "throw the code and see", it's "present a plan, wait for GO". The first plan in a phase takes 2 minutes instead of 0. But every subsequent step goes faster because the plan talks to the agent in the same language (file map + DoD + commit strategy).
3. Molecularity requires more commits
Instead of feat: improvements, I get feat(db): migration 032_pgvector, feat(router): candidates.ts stub, feat(router): rerank.ts stub, feat(api): /suggest-v2 endpoint gated. Four commits instead of one. The audit trail leaves no doubt which commit broke something — that's worth it.
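The commit subjects above follow a `type(scope): summary` shape, which makes a rough first-pass check possible. This is a sketch under assumptions: it inspects only the subject line, not the diff, and the vague-word list and function names are hypothetical.

```typescript
type ParsedSubject = { type: string; scope: string; summary: string };

// Parse "type(scope): summary". Returns null when the shape does not
// match, for example when there is no scope ("feat: improvements").
function parseSubject(subject: string): ParsedSubject | null {
  const match = /^(\w+)\(([^)]+)\): (.+)$/.exec(subject);
  if (!match) return null;
  const [, type, scope, summary] = match;
  return { type, scope, summary };
}

// Hypothetical list of summaries that say nothing about the concern.
const VAGUE_SUMMARIES = new Set(["improvements", "updates", "fixes", "changes", "wip"]);

// A subject passes when it names exactly one scope and a specific summary.
function isMolecularSubject(subject: string): boolean {
  const parsed = parseSubject(subject);
  if (!parsed) return false;
  const singleScope = !/[,\/&+]/.test(parsed.scope);
  const specific = !VAGUE_SUMMARIES.has(parsed.summary.trim().toLowerCase());
  return singleScope && specific;
}

const subjects = [
  "feat: improvements",
  "feat(db): migration 032_pgvector",
  "feat(api): /suggest-v2 endpoint gated",
  "feat(db,router): improvements",
];
const results = subjects.map(isMolecularSubject); // [false, true, true, false]
```

A well-formed subject does not prove the commit touches one concern, so this can only flag the obvious offenders; the diff still needs a human or a stricter check.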
What I get back
What I gain:
- Molecular ADRs. Every decision (Phase 2 routing, Phase 5 reshape Palace → Second Brain, ADR-007 canvas lib, ADR-008 Polish reranker) is a separate document with date, alternatives, consequences. None disappears into commit history.
- Nothing happens without my knowledge. Plan before code + single-path proposal = every edit is a conscious decision. The agent doesn't "improvise" under the guise of "small step, I'll show you soon".
- I have a chance to catch changes along the way. When the agent presents a proposal, I see it before it happens. I can say gap: you didn't account for X and we're back to the plan.
- Session continuity. The next Claude session doesn't repeat my mistakes: the brief shows them as "gotchas".
What will still change
The harness evolves. Things I'm testing / planning:
- Auto-generate briefs from git log --since="last session" + TODO. Today I fill briefs manually. A simple script should pre-generate them, I just verify.
- harness-validator agent. A subagent that checks rule compliance at session end: was there Read before every Edit? Are commits atomic? Did the next-session brief get generated?
- Session telemetry. How often tool_failed → retry vs tool_failed → STOP. How many questions were Path 1 (proposal) vs Path 2 (A/B/C). Is the agent breaking rule 7 (fake options)? Without measurement it's hard to improve.
- Single-option path in the skills. Today we added the rule to CLAUDE.md and the decision-prompt-three-lines prompt. The next 2 weeks will show whether the agent actually multiplies fewer fake variants.
- PreToolUse hooks for Edit/Write that check the readFiles set in the session. A hook blocking Edit without Read. The risk: too aggressive hooks block legit flows, so I rely on discipline for now.
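The core of the Read-before-Edit hook idea is a small state machine over the session's tool calls. The sketch below models only that decision logic. The `ToolEvent` shape is a simplified assumption and not the actual hook payload, and `createReadBeforeEditGuard` and `fileExists` are hypothetical names:

```typescript
// Simplified, hypothetical event shape; a real hook payload may differ.
type ToolEvent = { tool: "Read" | "Edit" | "Write"; filePath: string };

type Decision = { allow: true } | { allow: false; reason: string };

// Build a guard that remembers which files were read in this session.
// fileExists is injected so the logic can be exercised without a file system.
function createReadBeforeEditGuard(fileExists: (path: string) => boolean) {
  const readFiles = new Set<string>();

  return function check(event: ToolEvent): Decision {
    if (event.tool === "Read") {
      readFiles.add(event.filePath);
      return { allow: true };
    }
    // Creating a brand-new file is a legitimate flow: there is nothing to read yet.
    if (event.tool === "Write" && !fileExists(event.filePath)) {
      return { allow: true };
    }
    if (readFiles.has(event.filePath)) {
      return { allow: true };
    }
    return {
      allow: false,
      reason: `${event.tool} blocked: ${event.filePath} was not read in this session`,
    };
  };
}

const existing = new Set(["src/config.ts", "src/router.ts"]);
const check = createReadBeforeEditGuard((path) => existing.has(path));

const early = check({ tool: "Edit", filePath: "src/config.ts" }); // blocked: not read yet
check({ tool: "Read", filePath: "src/config.ts" });
const later = check({ tool: "Edit", filePath: "src/config.ts" }); // allowed
const fresh = check({ tool: "Write", filePath: "src/new.ts" });   // allowed: new file
```

The new-file exception is one example of the legit flows mentioned above. Paths are compared as plain strings here, so a real hook would also need to normalize relative and absolute paths before comparing.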
How to grab it and run it yourself
The whole harness lives as a public repo on GitHub — 6 artifacts (3 prompts + 3 skills) plus 3 ready-to-fill example files (CLAUDE.md.example, next-session-phase1-brief.md.example, settings.local.json.example) plus README, CHANGELOG, CONTRIBUTING. MIT license.
→ github.com/yx-aesthete/madesky-claude-harness
Minimal adoption path:
git clone https://github.com/yx-aesthete/madesky-claude-harness.git
cd madesky-claude-harness
cp prompts/molecular-harness-working-mode.md /path/to/your/repo/CLAUDE.md
mkdir -p /path/to/your/repo/docs/session-briefs
cp examples/next-session-phase1-brief.md.example \
  /path/to/your/repo/docs/session-briefs/next-session-phase1-brief.md
# fill the brief (5-10 min, the only manual step)
mkdir -p /path/to/your/repo/.claude   # cp fails if .claude/ does not exist yet
cp examples/settings.local.json.example /path/to/your/repo/.claude/settings.local.json
Then open Claude Code in your repo and paste as the first message:
I'm a new Claude session. Read docs/session-briefs/next-session-phase1-brief.md and act accordingly.
Fork, adapt, publish your changes. Each prompt and skill also lives as a standalone page on madejski.ai with "copy raw" and markdown preview — links in the "Related prompts and skills" section at the bottom of this post.
Projects where the harness runs
I test it on several active projects with different risk profiles:
- GroqeSTT — Electron + PostgreSQL + Railway + Supabase + Groq/Gemini. Knowledge Mesh Phase 2 in progress. Where the harness was born.
- Pryzmat — media pipeline Next.js + Supabase. Phase-driven development.
- AvoidSCT — clinical application, highest audit-trail requirements. The harness is crucial here.
- BasePlate — SaaS platform that hosts several smaller products itself. Harness replicated per-workspace.
- AI Possibilities Lab — educational project where the harness also serves as teaching material for "how to run a Claude Code session".
In each project CLAUDE.md starts from the same skeleton. Only the document taxonomy and the entry-point phase brief differ.
Invitation
If you're running a Claude Code project for more than 2 weeks — fork the repo, adapt to your style, and publish your changes. Even better: open an issue with a concrete incident that existing rules didn't catch. Much of what's here came from real mistakes — your mistakes can teach me as much as mine taught me.
- Repo: github.com/yx-aesthete/madesky-claude-harness
- Feedback: LinkedIn or a comment under this post