AI capacity • Agent swarms • Operating models

From 30 Minutes to Two Weeks: The Agent Explosion Nobody Budgeted For

Triont • AI capacity, cost control & operating models • Melbourne, Australia

A year ago, the best AI coding agents could sustain about 30 minutes of autonomous work before losing context. By mid-2025, Rakuten managed 7 hours of continuous autonomous coding with Claude — a result that set the benchmark at the time.

That was six months ago.

In February 2026, 16 Claude Opus 4.6 agents built a fully functional C compiler — over 100,000 lines of Rust — working autonomously for two weeks straight. No human writing code. The compiler builds the Linux kernel on three architectures, passes 99% of a compiler torture test suite, and compiles PostgreSQL. Total cost: about $20,000.

That's a 672x increase in sustained autonomous work in twelve months. The pace of improvement has moved well past incremental.

The Number That Actually Matters

The headline feature of Opus 4.6 is a 5x expansion in context window — from 200,000 tokens to a million. That's what Anthropic put in the press release. It's the wrong number to focus on.

The number that matters is retrieval accuracy inside that window. Every major model could accept large context windows in January 2026. The question was whether the model could actually find and use what you put in there.

Sonnet 4.5, a strong model from just a few months ago, has a million-token window. Its ability to retrieve specific information from within that window? About 18.5%. Roughly one chance in five. Gemini 3 Pro was a bit better at 26.3%. These were the best available in January. They could hold your codebase. They couldn't reliably read it.

Opus 4.6 retrieves accurately 76% of the time at a million tokens. At a quarter of the window — 256,000 tokens — that rises to 93%.

That's the difference between a model that can hold 50,000 lines of code and a model that can hold 50,000 lines of code and know what's on every line simultaneously. Every import, every dependency, every interaction between modules — all visible at once.

If you've worked with senior engineers, you know this distinction intuitively. The difference between someone who just started reading a codebase and someone who's lived in it for months isn't what they can look up. It's what they carry in their head — knowing that changing the auth module will break the session handler, knowing the rate limiter shares state with the load balancer. That holistic awareness is often what separates a senior engineer from a contractor reading the code for the first time.

Opus 4.6 does this for 50,000 lines simultaneously. Not by summarising. Not by searching. It just holds the context and reasons across it.

AI Isn't Just Writing Code Now — It's Managing Engineers

The C compiler is impressive, but the Rakuten story matters more for most organisations.

Rakuten — the Japanese e-commerce and fintech conglomerate — deployed Claude Code across their engineering org. Not as a pilot. In production, touching real code that ships to real users.

When they pointed Opus 4.6 at their issue tracker, it closed 13 issues autonomously in a single day. It assigned 12 issues to the right team members across a 50-person org spanning six code repositories. And it knew when to escalate to a human.

I want to be specific about what happened here. The AI didn't help engineers close tickets. It closed tickets itself — doing the work of an individual contributor. But it also routed work correctly across the org. It understood which team owns which repo, which engineer has context on which subsystem, what it could close versus what needed human judgment.

That goes beyond code generation. That's organisational reasoning — understanding how work flows through a team, not just how code flows through a compiler.

The coordination function that engineering managers spend half their time on — ticket triage, work routing, dependency tracking, cross-team handoffs — just became automatable. Not in theory. In production, at a company with real scale.

And Rakuten isn't stopping there. They're building an ambient agent system that breaks complex tasks into 24 parallel Claude Code sessions, each handling a different slice of their monorepo. They estimate those simultaneous agent streams can produce the equivalent of a month of human engineering work.

Here's the detail that gets buried under the big numbers: non-technical employees at Rakuten are contributing to development through the Claude Code interface. People who have never written code are shipping features.

The Side Demo That Should Worry Every Security Team

On the same day Opus 4.6 launched, Anthropic published a result that got far less attention than the C compiler but might matter more long-term.

They gave Opus 4.6 basic tools — Python, debuggers, fuzzers — and pointed it at an open-source codebase. No specific vulnerability hunting instructions. No curated targets. Just: here's some tools, here's some code, find the problems.

It found over 500 previously unknown high-severity zero-day vulnerabilities. In code that had been reviewed by human security researchers, scanned by existing automated tools, and deployed in production systems used by millions of people.

When traditional fuzzing and manual analysis hit dead ends, the model independently decided to go through the project's git history — years of commit logs — to understand how the codebase had evolved. It identified areas where security-relevant changes had been made hastily or incompletely. Nobody told it to do this. It invented a detection methodology on its own.

That's what happens when reasoning meets working memory at this scale. The model doesn't scan for known patterns the way existing tools do. It builds a mental model of how code works, how data flows, where trust boundaries exist, and where assumptions might break. Then it probes the weak spots — systematically, and without getting bored three hours into a commit log review.
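The git-history heuristic described above can be approximated crudely. Here's a toy sketch, assuming commit logs in `git log --oneline` format; the keyword list and the "terse message as a proxy for a hasty fix" signal are illustrative guesses, since the model's actual methodology isn't published:

```python
# Toy approximation of the git-history heuristic described above:
# flag commits whose messages mention security-relevant terms and look rushed.
# Keywords and the "hasty" signal (very short message) are illustrative guesses.

SECURITY_TERMS = ("auth", "sanitize", "overflow", "bounds", "validate", "cve")

def flag_suspect_commits(log):
    """Return (sha, message) pairs worth a closer look: security-relevant
    wording combined with a terse message, a weak proxy for a hasty fix."""
    suspects = []
    for line in log.strip().splitlines():
        sha, _, message = line.partition(" ")
        lowered = message.lower()
        if any(t in lowered for t in SECURITY_TERMS) and len(message.split()) <= 4:
            suspects.append((sha, message))
    return suspects

# Example log in `git log --oneline` format (fabricated for illustration).
sample = """\
a1b2c3d fix overflow
d4e5f6a Add pagination to the admin dashboard list view
b7c8d9e quick auth patch
e0f1a2b Refactor session storage with full test coverage"""

for sha, msg in flag_suspect_commits(sample):
    print(sha, msg)
```

The point isn't that a keyword filter matches what the model did; it's that the model composed this kind of heuristic on its own, then applied it tirelessly across years of history.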

Five hundred zero-days was the side demonstration — not even the headline feature of the release.

The Token Explosion Is Here

I wrote recently about how AI costs behave like infrastructure, not SaaS — that the $40-per-month mental model breaks the moment AI moves from experimentation into real work.

The agent swarm model makes that argument look conservative.

Think about what 16 agents building a C compiler for two weeks actually means in token consumption. Each agent is operating in its own million-token context window. They're coordinating through a shared task system. They're reading, reasoning, generating code, running tests, reading the results, and iterating — continuously, around the clock, for 14 days.

That's a fundamentally different consumption profile to someone asking a chatbot a question once or twice a day.
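The gap is easy to see with a back-of-envelope model. Every rate below is an illustrative assumption, not a published figure; the point is the shape of the arithmetic, not the exact numbers:

```python
# Back-of-envelope sketch: agent-swarm token consumption vs chatbot use.
# All rates below are illustrative assumptions, not published figures.

def daily_tokens(agents, turns_per_hour, input_tokens_per_turn,
                 output_tokens_per_turn, hours=24):
    """Tokens consumed per day by a group of continuously running agents."""
    turns = agents * turns_per_hour * hours
    return turns * (input_tokens_per_turn + output_tokens_per_turn)

# A chatbot user: a handful of short exchanges across a couple of hours.
chatbot = daily_tokens(agents=1, turns_per_hour=2,
                       input_tokens_per_turn=2_000,
                       output_tokens_per_turn=500, hours=2)

# A 16-agent swarm: each turn re-reads a large working context, all day.
swarm = daily_tokens(agents=16, turns_per_hour=6,
                     input_tokens_per_turn=150_000,
                     output_tokens_per_turn=2_000)

print(f"chatbot user/day:   {chatbot:,} tokens")
print(f"16-agent swarm/day: {swarm:,} tokens")
print(f"ratio: {swarm / chatbot:,.0f}x")
```

Under these assumptions the swarm consumes tens of thousands of times more tokens per day than the casual user. Quibble with any individual rate and the ratio still lands orders of magnitude apart.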

And it's the direction everything is heading. Rakuten's 24 parallel agent sessions. McKinsey — the company that sells organisational design to every Fortune 500 on Earth — targeting parity between AI agents and human workers across their firm by end of 2026. Cursor, the AI coding tool, running autonomous agent swarms that independently organise themselves into hierarchical structures.

The $650 billion in hyperscale data centre infrastructure currently under construction globally isn't being built for chatbots. It's being built for agent swarms running at a scale that most organisations haven't begun to model.

Locally, Victoria's 720MW Latrobe Valley data centre project starts to make a lot more sense when you picture thousands of agent sessions running continuously across enterprise engineering orgs. This is the demand profile that infrastructure is being built for.

For any business planning AI adoption, the cost model you need isn't "how many seats at what price." It's "how many agents, running how often, consuming how many tokens, at what tier of model." That's infrastructure planning. That's capacity management. And most organisations aren't doing it yet because the mental model hasn't caught up with the reality.
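That infrastructure-style model can be sketched directly: agents, times duty cycle, times token throughput, times model tier. The prices and fleet composition below are placeholder assumptions for illustration, not any vendor's actual rates:

```python
# Sketch of an agent-based cost model: agents x duty cycle x tokens x tier,
# instead of seats x price. All prices and rates are placeholder assumptions.

PRICE_PER_MTOK = {          # illustrative $ per million tokens (input, output)
    "frontier": (15.0, 75.0),
    "mid":      (3.0, 15.0),
    "small":    (0.25, 1.25),
}

def monthly_cost(agents, hours_per_day, in_mtok_per_hour,
                 out_mtok_per_hour, tier, days=30):
    """Monthly spend for a pool of agents at a given duty cycle and tier."""
    in_price, out_price = PRICE_PER_MTOK[tier]
    agent_hours = agents * hours_per_day * days
    return agent_hours * (in_mtok_per_hour * in_price
                          + out_mtok_per_hour * out_price)

fleet = [
    # (agents, hours/day, input Mtok/h, output Mtok/h, tier) -- hypothetical mix
    (4, 8, 1.0, 0.02, "frontier"),   # deep refactoring agents
    (12, 24, 0.3, 0.01, "mid"),      # triage and routing agents
    (30, 24, 0.05, 0.005, "small"),  # monitoring and summarisation
]

total = sum(monthly_cost(*row) for row in fleet)
print(f"estimated monthly spend: ${total:,.0f}")
```

Even this toy version makes the planning questions concrete: the frontier tier dominates spend despite being the smallest pool, which is exactly the kind of trade-off that seat counts can't express.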

The Org Chart Is Flipping

The revenue-per-employee numbers at AI-native companies are worth paying attention to.

Cursor hit $100 million in annual recurring revenue with about 20 people. That's $5 million per employee. Midjourney generated $200 million with about 40 people. Lovable, the AI app builder, reached $200 million in 8 months with 15 people.

For traditional SaaS companies, $300,000 in revenue per employee is considered excellent. $600,000 is elite — that's Notion territory. AI-native companies are running at five to seven times those numbers. Not because they found better people, but because their people orchestrate agents instead of doing execution themselves.

The emerging model at startups is two to three humans plus a fleet of specialised agents, organised not by function but by outcome. The humans set direction, evaluate quality, and make judgment calls. The agents execute, coordinate, and scale.

Three developers in London built a complete business banking platform in six months — a project that would have required 20 engineers and 18 months before AI. Jacob Bank runs a million-dollar marketing operation with zero employees and roughly 40 AI agents.

Amazon's two-pizza team formula — the idea that no team should be larger than what two pizzas can feed — is evolving into something smaller. The org chart stops being a hierarchy of people and becomes a map of human-agent teams, each owning a complete workflow end to end.

Dario Amodei, Anthropic's CEO, puts the odds of a billion-dollar solo-founded company emerging by end of 2026 at 70-80%.

What This Means If You're Not Rakuten

Most of the businesses I work with aren't Japanese tech conglomerates. They're Australian SMEs trying to figure out where AI fits without blowing their budget or betting on the wrong architecture.

Here's what I'd say to them right now:

The planning horizon just collapsed. If your AI strategy was built around assumptions from even six months ago, it's already wrong. The tools that existed in January are a different generation from what shipped in February. Your mental model of what AI can and cannot do needs updating monthly now, not annually.

Token consumption will surprise you. When teams move from asking questions to running agents, consumption doesn't increase linearly — it explodes. The $50-100 per person per day figures I've seen in real implementations? That's with current-generation tools. Agent swarms will push those numbers further. Budget accordingly, or you'll get caught mid-deployment with costs you didn't plan for.

The infrastructure question is unavoidable. You can't run sustained agent workloads on per-seat SaaS pricing and stay cost-effective. At some point — and that point is approaching faster than most organisations expect — you need to think about AI the way you think about cloud infrastructure: provisioned capacity, tiered models for different workloads, cost controls that match actual usage patterns rather than arbitrary seat counts.
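The distance between the seat-price mental model and observed usage is easy to quantify. A minimal sketch, using the $50-100 per day range above and an assumed $40 per month seat price:

```python
# Seat-price mental model vs observed usage-based spend per person.
# The $50-100/day range is from the article; the seat price is an assumption.

SEAT_PRICE = 40      # $ per person per month, typical AI SaaS seat (assumed)
WORKING_DAYS = 20    # working days per month

def monthly_usage(daily_spend, working_days=WORKING_DAYS):
    """Monthly usage-based spend for one person at a given daily burn."""
    return daily_spend * working_days

for daily_spend in (50, 100):
    monthly = monthly_usage(daily_spend)
    print(f"${daily_spend}/day -> ${monthly:,}/month, "
          f"{monthly / SEAT_PRICE:.0f}x the seat price")
```

At those rates a single heavy user runs 25-50x the seat price every month, before agent swarms enter the picture. That's why the budgeting conversation has to move from licensing to capacity.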

The skill that matters most has changed. The leverage has shifted from execution to judgment. Whether you write code or not, the bottleneck isn't technical proficiency anymore. It's clarity of intent — knowing what you want built, being able to articulate the real requirement, and having the domain expertise to evaluate whether the output is actually good.

Those skills now have 100x leverage because they're multiplied by every agent you can direct.

The Question Has Changed

For organisational leaders, the question isn't "should we adopt AI" or even "which teams adopt it first." The question is: what's our agent-to-human ratio, and what does each human need to be excellent at to make that ratio work?

And underneath that, a harder question: how do we support our people through a transition that's moving faster than anyone can comfortably absorb?

The agents are here. They work. They coordinate in teams, resolve dependencies, and deliver at a level that didn't exist eight weeks ago. The jump from hours of autonomy to two weeks took six months, and there's no indication it's slowing down.

The organisations that figure out the new ratio first — humans plus agents, with the right infrastructure underneath — are going to outrun everyone still planning around headcount alone.

Tim Fraser is the founder of Triont, an AI infrastructure consultancy helping Australian SMEs adopt AI through practical capacity planning and cost control. His previous article, "AI Isn't $40 a Month — It's Why They're Building Power Stations," explored why AI costs behave like infrastructure rather than SaaS.

Interested in AI capacity planning?

Let's talk about what your organisation actually needs.

Get in touch →