April 19, 2026

A molecular harness for Claude Code: how I run long-form AI projects

harnessclaude-codeworking-modeworkflowai-engineeringdeveloper-experienceadr

Referenced

PROMPTsystem

Molecular harness — working mode

The core working-mode manifest for Claude Code on long-form projects. Normalized version of the personal harness from the GroqeSTT project — auto-loaded via CLAUDE.md on every session. Contains agent-flaw diagnostics, seven non-negotiable rules, pre/during/post-task checklists, red flags, user steers, and the agent's decision-making space protocol (three paths).

Open

PROMPTwriting

Decision prompt — three paths (proposal / A-B-C options / narrow first)

Three paths for making decisions: (1) one sensible option → proposal + "see any gaps?", (2) 2-3 real options → classic 3-line format + TL;DR, (3) > 3 options → narrow first. Eliminates fake A/B/C and option multiplication beyond need, while preserving discipline for real decisions.

Open

PROMPTsystem

Session start protocol — 5 minutes before any code

Ritual for starting a new Claude Code session. 5 minutes of reading before touching anything — in a set order. Solves the "cold start bias": the agent jumps to code before understanding state. List of sources of truth, reading order, infra checklist.

Open

SKILLworkflow

Phase continuity brief — handoff between sessions

Skill for generating the next-session-phase{N+1}-brief.md file before ending a session. Eliminates the gap between sessions — the next session has ready context, first task, and infra quick-check. Zero momentum loss from "where did we leave off".

Open

SKILLworkflow

Plan-before-code — implementation discipline

Skill enforcing plan-first: a plan presented to the user before any code change. You wait for GO, then write. Exception: batch approval ("do X and commit"). Eliminates "I already started" and unauthorized commits. Two formats: single-path proposal (default for most steps) and plan-with-variants (when there are real options).

Open

SKILLreview

Read-before-edit — gate before file modification

Skill enforcing Read before every Edit/Write in the same session. Eliminates "blind edits" based on memory — every file the agent modifies must have been read in the current session. Protects against overwriting others' work.

Open

GITHUB

yx-aesthete/madesky-claude-harness

Public MIT-licensed repo — 3 prompts + 3 skills + examples. Clone and drop into your project.

GitHub

Why a harness at all

Claude Code out of the box is fast and careless. On a weekend hack that's a feature — you'll ship something in a week. On a project that has to live for months, touch production, have phases and ADRs, the same speed becomes a bug: the agent writes 400 lines when you asked for 40, declares "should work" instead of verifying, starts Phase 2 while Phase 1 is still smoking.

My harness is an attempt to tame that speed without killing it.

I test it on GroqeSTT — an Electron app for voice transcription with topic-based routing. The codebase is tens of thousands of lines, stack is TypeScript + Electron + PostgreSQL + Railway + Supabase + Groq/Gemini API. The project has a 5-phase roadmap (Phase 0 → Phase 5: Knowledge Mesh + Second Brain), ADRs, dedicated views, UI impact maps — the documentation is solid. And yet a single Claude session without the harness could "normalize" all of it in an hour if allowed to.

The harness doesn't allow it.

Three observations the harness is built on

1. The agent has no memory

Every Claude session starts from zero. What "stuck in its memory" from the last conversation is an illusion — the new session knows nothing beyond CLAUDE.md and whatever it reads itself.

Without awareness of this, the agent pretends to remember: "I think this file has function X", "in the last iteration we agreed on Y". We didn't. That's confabulation.

2. The agent is an optimist

It'll say "should work" as soon as the code compiles. Won't run tests. Won't build the DMG. Won't check if the Supabase migration actually applied. It'll say it's "done" and only manual verification will reveal that something doesn't work.

Verification before closing has to be a rule, not a suggestion.

3. The agent is fast

In two minutes it'll run the migration, change routing, rewrite 4 components, and commit all of it under feat: improvements. Then you can't trace what went wrong. Molecularity — one concern per commit — is the foundation of the audit trail.

But Claude Code already has Plan Mode — why this?

Fair question. I ask myself once a month. Plan Mode solves one specific problem: forcing a plan before the first edit of a session. That's important and this harness uses the same instinct.

But Plan Mode is one gate at the start of a session. It doesn't do many things I need in long-running project work:

Plan Mode does	Plan Mode doesn't
Force a plan before first edit	Force atomic commits (one concern per commit)
Block writes until you confirm	Force Read-before-Edit during the session
	Carry context between sessions (session briefs)
	Enforce phase discipline (DoD N before N+1)
	Codify decision protocol (single-option vs A/B/C vs narrow-first)
	Require `tool_failed → STOP` instead of retry/workaround
	Require verification before "done" (ban on "should work")
	Provide steer vocabulary (`plan only`, `do X and commit`, `GO`, `gap: ...`)
	Leave an audit trail of why-what-when decisions

Plan Mode is great for the first 10 minutes of a session. The molecular harness is what protects the next 5 months of the project from decaying into entropy.

You can — and should — use both. Plan Mode for initial framing, harness for everything after.

Architecture of the harness

The whole harness is seven files and one taxonomy. All of them in the repo (plus .env.*.local gitignored for creds):

repo/
├── CLAUDE.md                                 # auto-loaded manifesto
├── .claude/
│   ├── rules.md                              # project-specific gotchas (dev ≠ prod, clean build)
│   └── settings.local.json                   # hardened permissions allowlist
├── docs/
│   ├── session-briefs/
│   │   ├── next-session-phase2-brief.md      # continuity — pasted at session start
│   │   └── scope-per-phase.md                # phase map
│   └── architecture/
│       ├── 00-CURRENT-STATE.md
│       ├── 01-VISION.md
│       ├── 02-DATA-MODEL.md
│       └── 05-OPERATING-MODES.md
└── .env.test.local / .env.prod.local          # gitignored — creds fallback

Plus an external source of truth (I use a ClickUp "Knowledge Mesh" doc) for: concepts, ADRs, phase specs, status notes, dedicated views. That's not downloadable — it's part of the contract that CLAUDE.md points to something you (not the agent) maintain.

Seven non-negotiable rules

These ended up in CLAUDE.md after successive sessions where something went wrong. Each has a concrete incident behind it.

Plan before code. 3–5 bullet plan → I wait for GO → I write code. No GO = no edits.
Verify before closing. "Done" only after build/test/a demonstrably working function, not after "it compiles".
One concern per commit. Schema / logic / UI → separate commits. Monolithic commit = no audit trail.
Phase discipline. No Phase N+1 without Phase N's DoD accepted. Even when tempting.
Read before edit. Every file I edit was read in the current session. Not from memory.
Tool failed → STOP and diagnose. No retry, no workaround. A tool fail is a signal, not an obstacle.
Ask only about real decisions. Don't invent fake options when there's one sensible path.

The seventh is the youngest — added today after the conversation I'll describe next.

Three decision paths (the most interesting part)

How should the agent ask for a decision? Instinct first: multiply options to show thoughtfulness. A/B/C for every step, trade-offs laid out, care for "democratic choice". Effect: the user wastes time because B and C are obviously worse than A, and the agent would have recommended A anyway.

Today's fix: diagnose how many real options exist before asking.

Path 1 — one sensible option

If the pipeline shows an obvious next step (e.g. next migration in a numbered series), the agent presents a proposal:

Proposal: add config.ts with ROUTING_V2_ENABLED — z.string().default('false').transform(v => v==='true'||v==='1') (edge-case safe boolean parsing). Impact: test env + prod env accept the new env var; default off. Risk: none (default off, zero touch to v1 routing).

See any gaps? If not — GO.

Three lines, not three paragraphs. I wait for NO-GO by default — no answer ≠ GO. I reply: GO, gap: ..., or different: ....

For most steps in the molecular harness this is the right format. Fake A/B/C where B and C are obviously worse costs me 30 seconds per decision. Over 3 hours of work that's 10 minutes wasted on ritual.

Path 2 — 2–3 real options

When there are real options with different implications (different blast radius, different reversibility, different cost), the agent uses the classic 3-line format:

A hotfix — change Railway env var and redeploy backend. Impact: test env only, prod untouched. Trade-off: ~3 min, reversible.

B merge — merge feature/knowledge-mesh to main, prod auto-deploy. Impact: PROD live, migration 031_edges runs on prod Supabase. Trade-off: rollback via git revert + redeploy; additive migration so safe.

C pause — do nothing. Impact: zero. Trade-off: come back whenever.

Recommend A — B is reversible but I want a test verify first.

Three lines per option + one TL;DR. The user knows what they're clicking in 10 seconds.

Path 3 — > 3 options, none dominates

Decision too coarse. Narrow to criterion first ("which matters more: bundle size or API ergonomics?"), then full A/B.

What changes in practice

The agent has decision-making space. Doesn't ask about cosmetics (commit message, badge color, JSON format).
The agent always asks about irreversible, high-cost, prod-impact, scope expansion — regardless of path.
I have fewer interruptions. A single-path proposal costs me GO — 2 characters, decision in 3 seconds.

Continuity between sessions

The biggest gap in working with Claude Code is handoff. Session A ends, session B starts the next day, and the first 20 minutes are "where did we leave off".

Solution: session brief. A docs/session-briefs/next-session-phase{N}-brief.md file, committed to the repo. The session closing a phase must generate it before ending. The starting session gets it on the first message:

I'm a new Claude session for <project>. Read docs/session-briefs/next-session-phase2-brief.md and act accordingly.

The brief has a fixed structure: reading order (5 min), where-we-are (TL;DR), insights/gotchas, first task, phase rough refs, infra quick-check, harness reminder, sources of truth.

Ten whole pages, but I start in 2 minutes instead of 20.

Trade-offs I'm not afraid of

I'll say it plainly: the harness trades speed for control. Three things become slower.

1. More replying to prompts

The agent asks more often. Decides less alone. The first 2 days are frustrating — it feels like micromanagement. After day 3 you start to recognize that every question is always on point: irreversible, prod-impact, scope expansion, real alternative.

Today's fix (single-option path) further reduces this cost — the agent asks only when there's a real choice.

2. Plans add friction at the start

Instead of "throw the code and see", it's "present a plan, wait for GO". The first plan in a phase takes 2 minutes instead of 0. But every subsequent step goes faster because the plan talks to the agent in the same language (file map + DoD + commit strategy).

3. Molecularity requires more commits

Instead of feat: improvements, I get feat(db): migration 032_pgvector, feat(router): candidates.ts stub, feat(router): rerank.ts stub, feat(api): /suggest-v2 endpoint gated. Four commits instead of one. The audit trail leaves no doubt which commit broke something — that's worth it.

What I get back

What I gain:

Molecular ADRs. Every decision (Phase 2 routing, Phase 5 reshape Palace → Second Brain, ADR-007 canvas lib, ADR-008 Polish reranker) is a separate document with date, alternatives, consequences. None disappears into commit history.
Nothing happens without my knowledge. Plan before code + single-path proposal = every edit is a conscious decision. The agent doesn't "improvise" under the guise of "small step, I'll show you soon".
I have a chance to catch changes along the way. When the agent presents a proposal, I see it before it happens. I can say gap: you didn't account for X and we're back to the plan.
Session continuity. The next Claude session doesn't repeat my mistakes — the brief shows them as "gotchas".

What will still change

The harness evolves. Things I'm testing / planning:

Auto-generate briefs from git log --since="last session" + TODO. Today I fill briefs manually. A simple script should pre-generate them, I just verify.
harness-validator agent. A subagent that checks rule compliance at session end: was there Read before every Edit? Are commits atomic? Did the next-session brief get generated?
Session telemetry. How often tool_failed → retry vs tool_failed → STOP. How many questions were Path 1 (proposal) vs Path 2 (A/B/C). Is the agent breaking rule 7 (fake options)? Without measurement it's hard to improve.
Single-option path in the skills. Today we added the rule to CLAUDE.md and the decision-prompt-three-lines prompt. The next 2 weeks will show whether the agent actually multiplies fewer fake variants.
PreToolUse hooks for Edit/Write that check the readFiles set in the session. A hook blocking Edit without Read. The risk — too aggressive hooks block legit flows, so I rely on discipline for now.

How to grab it and run it yourself

The whole harness lives as a public repo on GitHub — 6 artifacts (3 prompts + 3 skills) plus 3 ready-to-fill example files (CLAUDE.md.example, next-session-phase1-brief.md.example, settings.local.json.example) plus README, CHANGELOG, CONTRIBUTING. MIT license.

→ github.com/yx-aesthete/madesky-claude-harness

Minimal adoption path:

git clone https://github.com/yx-aesthete/madesky-claude-harness.git
cd madesky-claude-harness
cp prompts/molecular-harness-working-mode.md /path/to/your/repo/CLAUDE.md
mkdir -p /path/to/your/repo/docs/session-briefs
cp examples/next-session-phase1-brief.md.example \
   /path/to/your/repo/docs/session-briefs/next-session-phase1-brief.md
# fill the brief — 5-10 min, the only manual step
cp examples/settings.local.json.example /path/to/your/repo/.claude/settings.local.json

Then open Claude Code in your repo and paste as the first message:

I'm a new Claude session. Read docs/session-briefs/next-session-phase1-brief.md and act accordingly.

Fork, adapt, publish your changes. Each prompt and skill also lives as a standalone page on madejski.ai with "copy raw" and markdown preview — links in the "Related prompts and skills" section at the bottom of this post.

Projects where the harness runs

I test it on several active projects with different risk profiles:

GroqeSTT — Electron + PostgreSQL + Railway + Supabase + Groq/Gemini. Knowledge Mesh Phase 2 in progress. Where the harness was born.
Pryzmat — media pipeline Next.js + Supabase. Phase-driven development.
AvoidSCT — clinical application, highest audit-trail requirements. The harness is crucial here.
BasePlate — SaaS platform that hosts several smaller products itself. Harness replicated per-workspace.
AI Possibilities Lab — educational project where the harness also serves as teaching material for "how to run a Claude Code session".

In each project CLAUDE.md starts from the same skeleton. Only the document taxonomy and the entry-point phase brief differ.

Invitation

If you're running a Claude Code project for more than 2 weeks — fork the repo, adapt to your style, and publish your changes. Even better: open an issue with a concrete incident that existing rules didn't catch. Much of what's here came from real mistakes — your mistakes can teach me as much as mine taught me.

Repo: github.com/yx-aesthete/madesky-claude-harness
Feedback: LinkedIn or a comment under this post