How I structure a day around an AI coding agent. What I delegate, what I still write by hand, and the prompts that earn their keep.

/ 01The shape of a day with the agent in it.

I open the laptop and the first thing on the screen is yesterday's todo, not a chat window. Mornings are for the work I'd be ashamed to hand off. The data model. The architecture call. The load-bearing line of the prompt that ships in production. Afternoons are for the agent. By then the contract is clear enough that a three-line task description gets me code I can keep.

/ 02What I delegate, and the test I use to decide.

Boilerplate Flutter widgets. Fixture data. Migrations. Test cases two through ten once I've written test one with intent. Mechanical refactors. Renaming across forty files. Threading a new parameter through a service layer. The agent runs these at noon-tired pace without grumbling.

The test: if a competent junior would do this without learning anything new about the domain, the agent does it. If understanding the domain is half the task, I do it.

/ 03What I still write by hand.

The first failing test. The agent writes tests two through n correctly only after I write number one with intent. Anything that touches money, auth, or PHI. I read every line; I write the structural one. Prompts that ship to users. A prompt is a contract with a model that costs me money on every call, and I don't draft contracts on autopilot.

AI tooling is a multiplier on a senior engineer, not a replacement. The multiplier is two to three, not ten, and it only shows up if you still own the bugs.

/ 04The four prompts that earn their keep.

  • "Here's the failing test, here's the file, write the smallest change that passes." Forces the agent to stay near the diff instead of redesigning the module.
  • "Read these three files and tell me which one owns this concept." Faster than my own grep on a repo I haven't opened in six weeks.
  • "Generate twenty edge-case inputs for this function, then write a test for the three that look like real bugs." The agent finds the off-by-one I would have shipped.
  • "Explain what this PR changes in one paragraph for someone who hasn't seen the repo." Paste it into the PR description, ship.

/ 05Where the agent breaks, and what I do about it.

It happily writes code that compiles and is wrong. The test suite catches most of that on the app side; the eval suite catches the rest on the AI side. It forgets context that's three messages back. I keep one running file of project conventions and paste it on every cold session. It hallucinates package APIs. When the import line looks plausible but unfamiliar, I check the docs before I trust the call.


/ 06The 2 to 3x claim, in numbers.

Across the last quarter I shipped what used to take me three sprints in one. That's the multiplier. It is not because the agent writes code faster than I do. It is because I no longer wait until tomorrow morning to start the boring thing, and the boring things used to eat the day.

If you're a founder trying to decide whether one engineer who uses these tools well beats two who don't, the honest answer is the first one. Book a call if you want to talk through the setup.