Backtesting Arena

You're Building with AI Coding Agents. Your Codebase Is Drifting. Here's Why — and the Fix.

We wrote 500 lines of rules. Claude read them — and still wrote text-[13px] instead of text-xs. The mistake wasn't Claude. The mistake was confusing documentation with enforcement.

Backtesting Arena·June 10, 2026·4 min read

We're building a SaaS dashboard with ~80 pages, using Claude Code as our primary coding agent.

Every new page Claude built was technically "inspired by" an existing one. Claude read existing files before writing new ones. We had component libraries, shared utilities, a clear folder structure.

And yet. After any long session, the diff looked like this:

  • text-[13px] instead of text-xs
  • #f97316 hardcoded directly in JSX instead of var(--color-brand)
  • The watermark component recreated inline instead of using the shared helper

Not bugs. Not functional failures. Just quiet inconsistency accumulating across hundreds of files.

The obvious solution — and why it fails

We did what every sensible team does: we wrote it down.

STANDARDS.md. 500 lines. Organized by section. Rules with rationale. Examples of correct and incorrect patterns. Cross-references. A typography scale. A color token reference. Explicit component ownership.

At the start of every session: "Read STANDARDS.md before editing any component."

Claude read it. Followed the rules — at the start of a session, when context was fresh.

Then a long session would run. The context would fill with code, diffs, error messages, back-and-forth. And gradually, the output drifted away from the standards.

STANDARDS.md became what we now call documentation theater — the appearance of discipline without the substance.

Why documentation alone fails for AI agents

The problem isn't Claude's attention span, and it isn't a prompt engineering failure.

The problem is structural.

A rule that lives in a .md file requires three things to go right simultaneously:

  1. Claude reads the relevant section at the right moment
  2. Claude applies it correctly — under token pressure, when the session context is dense
  3. No context drift occurs — across a long, multi-file session

Three probabilistic failure modes stacked on top of each other. Of course it breaks.
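To put rough numbers on the stacking, here is a minimal sketch. The 95% success rate per condition and the 50-edit session are made-up illustrative figures, and treating the three conditions as independent is a simplifying assumption, not something we measured.

```typescript
// Illustrative only: the probabilities below are assumptions, not measurements.

/** Chance that one edit satisfies every condition, assuming independence. */
function cleanEditProbability(conditionSuccessRates: number[]): number {
  return conditionSuccessRates.reduce((acc, p) => acc * p, 1);
}

/** Chance that every edit in a session is clean. */
function cleanSessionProbability(perEditRate: number, edits: number): number {
  return Math.pow(perEditRate, edits);
}

// Hypothetical rates for: rule is read, rule is applied, no context drift.
const perEdit = cleanEditProbability([0.95, 0.95, 0.95]);
const perSession = cleanSessionProbability(perEdit, 50);

console.log(perEdit.toFixed(3));    // 0.857
console.log(perSession.toFixed(4)); // 0.0005
```

Even with generous per-condition odds, a fully clean 50-edit session becomes vanishingly unlikely under these assumptions.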

Human developers internalize rules over time. A senior dev who's been in a codebase for months doesn't need to re-read the style guide before every file change. The rules are instinct.

AI coding agents start every session cold. Every session is day one. Documentation has to be re-read, re-applied, re-prioritized from scratch — every time.

The real solution: make violations impossible

We stopped trying to make Claude remember the rules.

We wrote a program that enforces them mechanically.

Layer 1 — Pre-Edit Hook

A script registered in Claude Code's settings that fires before every file edit is written to disk. It checks the proposed file content against a list of forbidden patterns:

  • text-[Npx] anywhere in a component → "Use the 6-step type scale: text-xs, text-sm, …"
  • watermarkBottom={…} as a prop → "Use ChartWatermark.tsx. Direct prop is deprecated."
  • Hardcoded hex values in JSX → "Reference CSS variables from globals.css."

If a violation is detected, the edit is blocked. Claude receives a specific error message identifying the exact pattern and the correct alternative.

No general reminder. No "remember the rules." A precise, immediate error — attached to the exact violation.
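The core of such a hook is a small pattern check. The sketch below is not our production script: the rule names, the regexes, the functions findViolations and formatBlockMessage, and the file path are all hypothetical. Only the three patterns and their messages come from the list above.

```typescript
// Sketch of a pre-edit pattern check. Rule names and regexes are assumptions.

interface Rule {
  name: string;
  pattern: RegExp; // non-global, so exec() keeps no state between calls
  message: string;
}

interface Violation {
  rule: string;
  line: number; // 1-based line number in the proposed content
  match: string;
  message: string;
}

const RULES: Rule[] = [
  {
    name: "arbitrary-font-size",
    pattern: /\btext-\[\d+px\]/,
    message: "Use the 6-step type scale: text-xs, text-sm, …",
  },
  {
    name: "inline-watermark-prop",
    pattern: /\bwatermarkBottom=\{/,
    message: "Use ChartWatermark.tsx. Direct prop is deprecated.",
  },
  {
    // Naive: also flags 3- or 6-digit hex-like anchors such as "#add".
    name: "hardcoded-hex",
    pattern: /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/,
    message: "Reference CSS variables from globals.css.",
  },
];

/** Returns the first match of each rule on each line of the proposed content. */
function findViolations(content: string): Violation[] {
  const violations: Violation[] = [];
  content.split("\n").forEach((text, index) => {
    RULES.forEach((rule) => {
      const found = rule.pattern.exec(text);
      if (found !== null) {
        violations.push({
          rule: rule.name,
          line: index + 1,
          match: found[0],
          message: rule.message,
        });
      }
    });
  });
  return violations;
}

/** Formats the error text the agent would see when an edit is blocked. */
function formatBlockMessage(filePath: string, violations: Violation[]): string {
  return violations
    .map((v) => `${filePath}:${v.line} ${v.match} -> ${v.message}`)
    .join("\n");
}

// Hypothetical proposed edit containing two of the forbidden patterns.
const proposed = [
  '<p className="text-[13px]" style={{ color: "#f97316" }}>',
  "  Revenue",
  "</p>",
].join("\n");

const result = findViolations(proposed);
console.log(formatBlockMessage("src/components/Card.tsx", result));
```

In Claude Code, a check like this would typically run from a PreToolUse hook matched to the file-editing tools, exiting with code 2 so that the message written to stderr is returned to the agent. Treat that wiring as an assumption and confirm the exact input fields against the current hooks documentation.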

Layer 2 — Commit Gate

Before any git push, npm run verify:standards runs 22 automated checks across the full codebase, covering:

  • Typography: no text-[Npx] patterns anywhere
  • Color tokens: no hardcoded hex values in component files
  • Watermark: every chart uses ChartWatermark.tsx, not an inline recreation
  • Component imports: shared utilities from the correct path, not duplicated
  • Strategy coverage: every new strategy registered in all required touchpoints

Red on any check = no push. This isn't a linting suggestion. It's a hard gate.
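The shape of such a gate can be sketched in a few lines. Everything named here is hypothetical (forbidPattern, runGate, the check names, the sample file paths), and the files are held in memory to keep the sketch runnable. The source of truth is only that npm run verify:standards runs 22 checks and blocks the push on any failure.

```typescript
// Sketch of a commit gate: run every check over every file, fail on any hit.

type Files = { [path: string]: string };

interface Check {
  name: string;
  // Returns one human-readable failure per problem found; empty means pass.
  run: (files: Files) => string[];
}

/** Builds a check that fails for each .tsx file containing `pattern`. */
function forbidPattern(name: string, pattern: RegExp, hint: string): Check {
  return {
    name,
    run: (files) =>
      Object.keys(files)
        .filter((path) => /\.tsx$/.test(path) && pattern.test(files[path]))
        .map((path) => `${path}: ${hint}`),
  };
}

// Two sample checks standing in for the full set.
const CHECKS: Check[] = [
  forbidPattern("typography", /\btext-\[\d+px\]/, "use the type scale, not text-[Npx]"),
  forbidPattern(
    "color-tokens",
    /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/,
    "use a CSS variable, not a hex value"
  ),
];

interface GateResult {
  passed: boolean;
  failures: { check: string; details: string[] }[];
}

/** Runs all checks and keeps only the ones that reported problems. */
function runGate(files: Files, checks: Check[]): GateResult {
  const failures = checks
    .map((check) => ({ check: check.name, details: check.run(files) }))
    .filter((entry) => entry.details.length > 0);
  return { passed: failures.length === 0, failures };
}

// Hypothetical two-file codebase.
const sample: Files = {
  "src/pages/Overview.tsx": '<h2 className="text-sm">Overview</h2>',
  "src/pages/Billing.tsx": '<h2 className="text-[13px]">Billing</h2>',
};

const gate = runGate(sample, CHECKS);
console.log(gate.passed ? "verify:standards passed" : "verify:standards failed");
gate.failures.forEach((f) => console.log(f.check, f.details));
// A real script would end with process.exit(gate.passed ? 0 : 1).
```

One way to make this a hard gate is a git pre-push hook that runs the script: git aborts the push when that hook exits non-zero.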

What changed

Before: Claude writes text-[13px] in a new component. It might surface in code review if someone catches it. Often it doesn't. Inconsistency accumulates silently.

After: Claude writes text-[13px]. The pre-edit hook fires in the same turn. Claude reads the error message, understands the exact fix, and self-corrects — before the file is ever saved.

The correction happens at the moment of violation, with full context about what was wrong. Faster than code review, cheaper than a second pass, consistent across every session regardless of context length.

The broader principle

Documentation says: don't do X.
Code says: you can't do X.

For AI agents, that distinction matters more than it does for humans — because agents don't build intuition between sessions. Every enforcement mechanism that relies on memory or intention degrades over time. Every enforcement mechanism that relies on code does not.

This isn't specific to Claude Code. It applies to Cursor, Copilot Workspace, any agentic coding workflow where an AI agent is making file changes in a shared codebase. The principle is the same: if you want a constraint to hold, make it mechanical.

Write a linter. Write a pre-edit hook. Write a commit gate.

Make violations impossible, not inadvisable.


Standards enforced by documentation = theater.
Standards enforced by code = reality.
