Claude Code in my daily loop: 2 to 3x output, mostly by getting out of its way.

How I structure a day around an AI coding agent. What I delegate, what I still write by hand, and the prompts that earn their keep.

/ 01 The shape of a day with the agent in it.

I open the laptop and the first thing on the screen is yesterday's todo, not a chat window. Mornings are for the work I'd be ashamed to hand off. The data model. The architecture call. The load-bearing line of the prompt that ships in production. Afternoons are for the agent. By then the contract is clear enough that a three-line task description gets me code I can keep.

/ 02 What I delegate, and the test I use to decide.

Boilerplate Flutter widgets. Fixture data. Migrations. Test cases two through ten once I've written test one with intent. Mechanical refactors. Renaming across forty files. Threading a new parameter through a service layer. The agent works through these in the afternoon slump without grumbling.

The test: if a competent junior would do this without learning anything new about the domain, the agent does it. If understanding the domain is half the task, I do it.

/ 03 What I still write by hand.

The first failing test. The agent writes tests two through n correctly only after I write number one with intent.
Anything that touches money, auth, or PHI (protected health information). I read every line the agent produces, and I write the structural ones myself.
Prompts that ship to users. A prompt is a contract with a model that costs me money on every call, and I don't draft contracts on autopilot.

AI tooling is a multiplier on a senior engineer, not a replacement. The multiplier is two to three, not ten, and it only shows up if you still own the bugs.

/ 04 The four prompts that earn their keep.

"Here's the failing test, here's the file, write the smallest change that passes." Forces the agent to stay near the diff instead of redesigning the module.
"Read these three files and tell me which one owns this concept." Faster than my own grep on a repo I haven't opened in six weeks.
"Generate twenty edge-case inputs for this function, then write a test for the three that look like real bugs." The agent finds the off-by-one I would have shipped.
"Explain what this PR changes in one paragraph for someone who hasn't seen the repo." Paste it into the PR description, ship.
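To make the third prompt concrete, here is a minimal sketch of the kind of output worth keeping. The function `page_slice` and its edge cases are hypothetical, invented for illustration and not taken from any real project; the point is only the shape: a short table of suspicious inputs, each pinned to an expected result.

```python
# Hypothetical function under test; an assumption for this example.
def page_slice(items, page, page_size):
    """Return the 1-indexed `page` of `items`, `page_size` items per page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]


# A few edge-case inputs of the kind an agent might propose, kept as a table:
# (items, page, page_size, expected).
EDGE_CASES = [
    ([], 1, 10, []),               # empty input
    ([1, 2, 3], 1, 3, [1, 2, 3]),  # exactly one full page
    ([1, 2, 3], 2, 3, []),         # page just past the end
    ([1, 2, 3, 4], 2, 3, [4]),     # short final page, classic off-by-one spot
]


def test_page_slice_edge_cases():
    for items, page, page_size, expected in EDGE_CASES:
        assert page_slice(items, page, page_size) == expected


def test_page_slice_rejects_page_zero():
    # Page 0 is the input most likely to slip through a 0-indexed habit.
    try:
        page_slice([1, 2, 3], 0, 3)
    except ValueError:
        return
    raise AssertionError("page 0 should raise ValueError")
```

The tests use plain `assert` so they run under pytest or by calling the functions directly. Trimming twenty generated inputs down to the few that look like real bugs is still a human call.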

/ 05 Where the agent breaks, and what I do about it.

It happily writes code that compiles and is wrong. The test suite catches most of that on the app side; the eval suite catches the rest on the AI side.
It forgets context that's three messages back. I keep one running file of project conventions and paste it on every cold session.
It hallucinates package APIs. When the import line looks plausible but unfamiliar, I check the docs before I trust the call.
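For Python code, part of the hallucinated-API check can be automated before opening the docs. The sketch below is one possible approach, not a description of any particular tool: `symbol_exists` is a hypothetical helper name, and it relies only on the standard library's `importlib.import_module`.

```python
import importlib


def symbol_exists(module_name: str, attr_path: str) -> bool:
    """Return True if `module_name` imports and exposes `attr_path`.

    `attr_path` may be dotted, e.g. "JSONDecoder.decode".
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError: the package itself is not installed.
        return False
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


if __name__ == "__main__":
    print(symbol_exists("json", "loads"))  # True: real function
    print(symbol_exists("json", "parse"))  # False: plausible, but not in json
```

This only confirms that a name exists in the installed version. It says nothing about the signature or the behavior, so reading the docs is still the step that counts.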

/ 06 The 2 to 3x claim, in numbers.

Across the last quarter, I shipped in one sprint what used to take me three. That's the multiplier. It is not because the agent writes code faster than I do. It is because I no longer wait until tomorrow morning to start the boring thing, and the boring things used to eat the day.

If you're a founder trying to decide whether one engineer who uses these tools well beats two who don't, the honest answer is that the one does. Book a call if you want to talk through the setup.