We Broke the Code Assembly Line — and Nobody's Quite Sure What Comes Next

---
Something happened at a financial services company a few weeks ago that crystallises an awkward truth about where software engineering finds itself in mid-2026.
They brought in Cursor, the AI coding tool. Inside a single month, the firm went from producing 25,000 lines of code to 250,000. A tenfold increase. The engineering leadership probably popped champagne. Then they counted the review queue: one million lines of unreviewed code piling up like shipping containers at a port. Security vulnerabilities multiplied in step. Departments downstream — sales, marketing, support — got dragged into the acceleration whether they wanted it or not. As Joni Klippert, CEO of StackHawk (who was consulting with the firm), told The New York Times: "The sheer amount of code being delivered, and the increase in vulnerabilities, is something they can't keep up with."
That anecdote, buried in a Times piece aptly headlined "The Big Bang: A.I. Has Created a Code Overload," is the story of 2026 in miniature. The tools work. They work too well. And the systems built to manage human-speed output are gasping for air.
The Numbers Are Getting Absurd
Let me walk you through what the data is telling us, because it's genuinely startling.
The 2026 Stanford AI Index, published just this week, flags the two steepest capability curves on its entire chart: OSWorld (autonomous computer use) and SWE-Bench Verified (autonomous software engineering). These aren't incremental improvements. They're the kind of near-vertical lines that make researchers double-check their spreadsheets. Models are now scoring above 67% on SWE-Bench Pro — MiniMax's M2.7 hit 67.4%, with a one-million-token context window and the ability to run autonomously for six to eight hours at a stretch. Think about that: an AI agent that can sit with a codebase for an entire workday, reasoning its way through bugs, refactors, and feature implementations without a human feeding it prompts every five minutes.
Meanwhile, the enterprise adoption numbers have gone from "interesting pilot" to "battle station operational." The OutSystems 2026 State of AI Development report, which surveyed 1,900 global IT leaders, found that 96% of organisations are already using AI agents in some capacity. Gartner is projecting that 40% of enterprise applications will feature task-specific AI agents by year-end — up from under 5% just last year.
And an MIT Technology Review survey of 300 engineering executives adds the sharpest datum yet: 51% of software teams already have agentic AI in limited use, and 98% of respondents expect AI to accelerate their delivery from pilot to production, with an average anticipated speed-up of 37%.
Let me put that in plain English. Half of all software teams have already handed some portion of their workflow to autonomous AI agents. Almost all of them expect things to speed up by roughly a third. And the remaining half? They're not sitting around waiting — 45% plan to adopt within twelve months.
The Third Seismic Shift
MIT Technology Review frames this as the third great disruption in modern software engineering. First came open source, which democratised access to code. Then came DevOps and agile, which transformed how teams build and ship together. Now comes agentic AI — and it's the only one of the three that doesn't primarily change how humans work together. It changes how much work humans need to do at all.
That distinction matters more than people realise.
With open source and DevOps, the human was still at the centre of every decision. The tools amplified coordination. Agentic AI amplifies output. When a single engineer can describe a feature in natural language and an agent goes off to implement it, write tests, review its own work against a specification, and open a pull request — the bottleneck migrates. It moves from "how fast can we write code?" to "how fast can we decide what code should exist, verify that it's correct, and integrate it into something coherent?"
That's the code overload problem in a sentence. We've solved generation. We haven't solved governance.
The OutSystems report makes this explicit: 94% of organisations are worried about AI sprawl increasing complexity, technical debt, and security risk. Yet only a small fraction have established centralised governance. Most are running agents across fragmented environments with no unified oversight. It's the DevOps story all over again — teams adopt the shiny tool first, figure out the organisational change later, and spend eighteen months paying down the mess.
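What would centralised governance even look like in practice? Here is a deliberately minimal sketch of one idea: a merge gate that refuses new AI-generated code once the unreviewed backlog would outrun the team's actual review capacity. Every name and threshold below is hypothetical — this is an illustration of the policy shape, not any vendor's tooling.

```python
# Hypothetical CI merge gate: block new AI-generated code when the
# unreviewed backlog would exceed what humans can realistically review.
# All class names, fields, and thresholds are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class ReviewQueue:
    unreviewed_lines: int       # AI-generated lines awaiting human review
    review_rate_per_day: int    # lines the team can actually review per day
    max_backlog_days: float     # policy: tolerated review lag, in days


def gate(queue: ReviewQueue, incoming_lines: int) -> bool:
    """Return True if the merge may proceed, False if it must wait."""
    projected = queue.unreviewed_lines + incoming_lines
    backlog_days = projected / queue.review_rate_per_day
    return backlog_days <= queue.max_backlog_days


# A team reviewing 5,000 lines/day with a 10-day backlog tolerance:
q = ReviewQueue(unreviewed_lines=40_000, review_rate_per_day=5_000,
                max_backlog_days=10.0)
print(gate(q, incoming_lines=5_000))   # 45k lines -> 9-day backlog: proceed
print(gate(q, incoming_lines=20_000))  # 60k lines -> 12-day backlog: block
```

The point isn't the arithmetic; it's that the constraint is expressed in terms of human review capacity rather than generation speed — exactly the calibration the firm in the Times story lacked.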
The Pragmatic Optimist's Playbook
Here's where I land on all of this, and I say this as someone who finds the trajectory genuinely exciting:
The companies that will win the agentic era aren't the ones that adopt fastest. They're the ones that build the review, verification, and governance muscle in parallel with adoption. The financial services firm in the Times story didn't fail because AI wrote bad code. It failed because its human processes were calibrated for a world where code arrived at walking speed, and suddenly it was showing up in a Ferrari.
Some early signals look promising. The MIT survey notes that 52% of organisations have already settled on a "human-on-the-loop" model — letting agents operate with reduced direct oversight but keeping supervisory control. That's the right instinct. The Epsilla framework for Open Quality Prompts (OQP) — bringing rigorous verification gates into agentic workflows — is exactly the kind of plumbing work that sounds boring until you realise it's the difference between a codebase you can trust and one you can't.
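To make "human-on-the-loop" concrete: the idea is that agents act autonomously on low-risk changes while supervisory control kicks in above a risk threshold. The sketch below shows that dispatch logic with a deliberately crude risk score — the paths, weights, and threshold are all invented for illustration, not drawn from any of the surveys or frameworks cited above.

```python
# Hypothetical "human-on-the-loop" dispatch: the agent proceeds
# unsupervised on low-risk changes; a human reviews the rest.
# The risk model here is intentionally crude and purely illustrative.

RISKY_PATHS = ("auth/", "billing/", "migrations/")


def risk_score(files_touched: list[str], lines_changed: int) -> float:
    """Crude risk estimate in [0, 1] based on blast radius."""
    path_risk = any(f.startswith(RISKY_PATHS) for f in files_touched)
    size_risk = min(lines_changed / 500, 1.0)   # bigger diffs are riskier
    return max(0.6 if path_risk else 0.0, size_risk)


def dispatch(files_touched: list[str], lines_changed: int,
             threshold: float = 0.5) -> str:
    """Auto-merge below the threshold; escalate to a human above it."""
    if risk_score(files_touched, lines_changed) < threshold:
        return "auto-merge"            # agent proceeds unsupervised
    return "escalate-to-human"         # supervisor reviews before merge


print(dispatch(["docs/readme.md"], lines_changed=40))      # auto-merge
print(dispatch(["billing/invoice.py"], lines_changed=40))  # escalate-to-human
```

The design choice worth noticing: the human isn't in every loop, reviewing every diff — they sit above the loop, pulled in only when the policy says the stakes warrant it. That's the difference between supervision that scales and supervision that becomes the next bottleneck.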
The 41% of organisations aiming for full agent-managed development and product lifecycles within eighteen months? Ambitious. Maybe too ambitious. But the 72% targeting it within two years? That feels more grounded, especially given the trajectory of the benchmark numbers.
---
The software industry has spent seventy years building tools to make humans more productive at writing code. Sometime in the last twelve months, we crossed a threshold where the tools became more productive than the humans overseeing them. The challenge ahead isn't making AI write more code. It's building the systems, habits, and institutions to decide what code is worth having — and to verify it after it arrives.
The big bang already happened. Now comes the hard part: building something coherent out of all that light.