AI in software development works best with a human in the loop

I let Claude Code write about 40% of the new code on one of our projects over the past three months. It works. But not the way most people think.

The expectation was straightforward: I describe tasks, the agent writes code, I review and merge. Like managing a fast junior developer. The reality is different. I spend just as much time at the keyboard. The work shifted. Less typing, more reading. Less mechanical effort, more thinking about whether what the agent proposed actually makes sense given context it cannot see.

The tools and what I actually use

I have tried most of what is available. Cursor, Claude Code, GitHub Copilot, OpenAI Codex. They are all decent, and they are all different in ways that matter depending on how you work.

I use Claude Code for anything substantial: rewriting services, building test suites, refactoring layers of a project. The terminal-based interaction suits how I think about larger changes. Cursor is better when I want to stay inside my editor and make smaller, localized edits. Copilot I keep around for autocomplete and nothing else. Codex I tried on two projects, then went back to Claude Code. The outputs were fine, but the interaction model did not click for me.

None of this is a recommendation. Every team, every person, finds their own fit. The interesting part is not which tool you pick. It is what happens after.

What companies are actually reporting

The Vercel team behind Turborepo published their experience with AI agents in development. Fully autonomous loops (agent writes code without supervision) did not produce production-quality results. Their agents generated code that passed tests but contained logic errors that an experienced developer would have caught by instinct. The code looked right. It was not right.

Shopify has talked publicly about using AI for internal tooling. Google uses it for test generation at scale. Microsoft ships Copilot and presumably eats its own cooking. The pattern across all of them is the same: AI generates, humans verify. Nobody with production systems at stake is running agent output straight to deployment.

And yet the LinkedIn discourse keeps cycling through "AI replaced my team" posts. These are about MVPs, about side projects, or they are fiction.

The failure that taught me the most

On a client project, I had the agent redesign a database layer. About 1,200 lines. The agent produced clean code in twenty minutes. Tests passed. Variable naming was more consistent than what the humans had written before. I was impressed.

During manual review (line by line, because that is how I review agent output now) I found a scenario where records would be deleted that should have been retained. The test suite did not cover it. The agent had not included that test because, from its perspective, the scenario was not evident in the codebase. The information lived in a client email from three months earlier.

An agent is not a junior developer who asks questions. It fills the silence with plausible code.

That single experience changed how I work with these tools more than any blog post or conference talk. The question stopped being "can the agent write this code" and became "what does the agent not know that I do."
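One way to act on that question is to pin the unwritten rule down as a test before the agent touches the code. The sketch below is purely illustrative: the `Record` type, the `legal_hold` flag, the 90-day cutoff, and `select_for_deletion` are all hypothetical stand-ins, not the client's actual schema or rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape, invented for this example.
    id: int
    last_used: date
    legal_hold: bool = False  # stands in for a rule that lived only in an email


def select_for_deletion(records, today, max_age_days=90):
    """Return the records that are safe to purge.

    Age is the rule an agent can infer from the codebase; the legal_hold
    check is the kind of rule only a human knew about.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r.last_used < cutoff and not r.legal_hold]


def test_records_on_hold_are_retained():
    today = date(2025, 1, 1)
    old = today - timedelta(days=365)
    records = [
        Record(id=1, last_used=old),                   # old, no hold: purge
        Record(id=2, last_used=old, legal_hold=True),  # old, on hold: keep
        Record(id=3, last_used=today),                 # recent: keep
    ]
    deleted_ids = [r.id for r in select_for_deletion(records, today)]
    assert deleted_ids == [1]
```

Once a test like this exists, the missing context is part of the codebase, and the agent can no longer fill that particular silence with plausible code.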

Where the industry seems to be heading

A year ago, the conversation was about whether AI could write code at all. Now nobody serious questions that. The conversation has moved to supervision models. How much human oversight does AI-generated code need? When can you trust it and when can you not?

From what I see across our own work and what other teams report:

Mechanical code (CRUD endpoints, migrations, type definitions, boilerplate components): the agent handles this well, and the review overhead is minimal. The time savings here are real. A task that used to take 3-4 hours now takes 1.5-2 hours, roughly half. Not because the agent does everything, but because it does the parts that do not require judgment.

Architecture, security, business logic with unwritten rules: these still need a human who understands the problem domain, not just the code. No model I have used can reason about what is not written down. And the most important context in any business system is usually the stuff nobody wrote down.
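The split above can be written down as a review policy rather than left to habit. This is a minimal sketch, assuming changes can be classified by file path; the glob patterns and the two review levels are hypothetical, not a description of any particular team's setup.

```python
from fnmatch import fnmatch

# Hypothetical path patterns; every project would draw these lines differently.
JUDGMENT_PATTERNS = ["*/auth/*", "*/billing/*", "*/domain/*"]
MECHANICAL_PATTERNS = ["*/migrations/*", "*/types/*", "*.generated.*"]


def review_level(changed_paths):
    """Pick a review depth for a set of agent-written changes.

    Any file in a judgment-heavy area forces a line-by-line review.
    A light review is allowed only if every file matches a mechanical
    pattern. Unknown paths default to the stricter level.
    """
    def matches(path, patterns):
        return any(fnmatch(path, pattern) for pattern in patterns)

    if any(matches(path, JUDGMENT_PATTERNS) for path in changed_paths):
        return "line-by-line"
    if changed_paths and all(
        matches(path, MECHANICAL_PATTERNS) for path in changed_paths
    ):
        return "light"
    return "line-by-line"
```

The default matters more than the patterns: anything the policy does not recognize gets the slow, human review.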

The uncomfortable question about junior developers

Something that comes up in conversations with other CTOs, and that I do not have a good answer for: if junior developers learn to code by working with agents from day one, do they learn to write code, or do they learn to read and approve code? Those are different skills. Reading code someone else wrote is easier than writing it from scratch. The muscle memory of solving problems yourself is hard to replace.

I do not know what this means for the profession in five years. I know that right now, the developers on my team who are best at working with agents are the ones who were already strong without them.

The honest version

AI coding agents make my team faster on mechanical work. They do not make us smarter. They are dangerous exactly where they seem most confident, on tasks that require context they do not have. Our process works because review catches what the agent misses, and that review is slow and human on purpose.

Whether this is a transitional phase or the permanent shape of AI-assisted development, I genuinely do not know.

If you are trying to figure out where AI fits in your development process, let's talk.
