The Model Is Only 10%: The Real Lesson of the New SDLC

TL;DR

A recent whitepaper from Google highlights that in AI-assisted development, the model itself is only 10% of the system. The majority of behavior depends on harness design and context management, shifting focus from models to configuration and verification.

A Google whitepaper released in May 2026 argues that the AI model used in software development accounts for only about 10% of system behavior. The report emphasizes that harness design and context engineering are the primary drivers of performance and reliability, shifting the industry’s focus away from the models themselves.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, challenges the common perception that the latest AI models are the main determinant of system quality. It cites a concrete result: a coding agent moved from outside the Top 30 to the Top 5 on Terminal Bench 2.0 by changing only the harness, not the underlying model. The authors use results like this to support their estimate that configuration and scaffolding account for roughly 90% of agent behavior.

According to the authors, the term “vibe coding” has been overused to describe casual AI prompts, but the disciplined approach—called agentic engineering—integrates formal specifications, automated tests, and oversight, making the system more reliable and cost-effective. They argue that the real skill lies in designing the harness and managing context, which includes instructions, knowledge, tools, and guardrails, rather than just prompting the model.

The paper also discusses the economics of AI development, highlighting that ad-hoc prompting appears cheap initially but incurs high long-term costs due to token inefficiencies, maintenance, and security risks. Conversely, investing in structured harnesses and context management reduces marginal costs and improves system robustness.
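
The crossover between the two cost profiles can be sketched as simple break-even arithmetic. This is a minimal illustration, not a model from the whitepaper: the `Approach` class, the `break_even_features` helper, and every dollar figure below are hypothetical placeholders chosen only to show the calculation.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Approach:
    name: str
    upfront_cost: float      # one-time spend: specs, evals, context (CapEx-like)
    cost_per_feature: float  # recurring spend: tokens, rework, review (OpEx-like)

    def total_cost(self, features: int) -> float:
        """Cumulative cost after shipping the given number of features."""
        return self.upfront_cost + self.cost_per_feature * features


def break_even_features(ad_hoc: Approach, structured: Approach) -> Optional[int]:
    """Smallest feature count at which `structured` costs no more in total
    than `ad_hoc`. Returns None if it never catches up."""
    saving_per_feature = ad_hoc.cost_per_feature - structured.cost_per_feature
    if saving_per_feature <= 0:
        return None  # no recurring saving, so the upfront spend is never repaid
    extra_upfront = structured.upfront_cost - ad_hoc.upfront_cost
    if extra_upfront <= 0:
        return 0  # cheaper from the very first feature
    return math.ceil(extra_upfront / saving_per_feature)


# Hypothetical figures, chosen only to illustrate the arithmetic.
AD_HOC = Approach("ad-hoc prompting", upfront_cost=0.0, cost_per_feature=500.0)
STRUCTURED = Approach("structured harness", upfront_cost=20_000.0, cost_per_feature=100.0)

if __name__ == "__main__":
    n = break_even_features(AD_HOC, STRUCTURED)
    print(f"Break-even after {n} features")
```

With these made-up inputs the per-feature cost ratio is 5×, inside the 3–10× range the paper cites. Real inputs would have to come from an organization's own token, rework, and remediation data.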

At a glance
When: published May 2026
The development: A new Google whitepaper argues that in the AI-driven SDLC, the model accounts for only about 10% of system behavior; the key lies in harness and context engineering.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
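
The split between tests and evals can be sketched as a single gate that requires both. This is a toy illustration under stated assumptions: `slugify`, `keyword_coverage`, `run_eval`, `ci_gate`, the two stub agents, and the 0.8 threshold are all hypothetical. The keyword scorer in particular stands in for whatever rubric or graded evaluation a real pipeline would use.

```python
import re
from typing import Callable, Sequence, Tuple

EvalCase = Tuple[str, Sequence[str]]  # (prompt, keywords the answer should mention)


def slugify(title: str) -> str:
    """Deterministic code: one right answer, so an exact-match test fits."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def test_slugify() -> bool:
    return slugify("The Model Is Only 10%") == "the-model-is-only-10"


def keyword_coverage(output: str, required: Sequence[str]) -> float:
    """Toy scorer: fraction of required keywords present in the output."""
    if not required:
        return 1.0
    text = output.lower()
    return sum(1 for kw in required if kw.lower() in text) / len(required)


def run_eval(generate: Callable[[str], str], cases: Sequence[EvalCase],
             threshold: float = 0.8) -> bool:
    """Non-deterministic output: score every case, pass on the average."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    scores = [keyword_coverage(generate(prompt), required) for prompt, required in cases]
    return sum(scores) / len(scores) >= threshold


def ci_gate(tests_pass: bool, evals_pass: bool) -> bool:
    """Ship only when both kinds of verification pass."""
    return tests_pass and evals_pass


# Stub "agents" standing in for real model calls.
def good_agent(prompt: str) -> str:
    return "The harness matters more than the model; pay CapEx upfront to lower OpEx."


def vague_agent(prompt: str) -> str:
    return "It seems to work."


CASES = [
    ("Summarize the paper", ["harness", "model"]),
    ("Explain the economics", ["capex", "opex"]),
]
```

Calling `ci_gate(test_slugify(), run_eval(good_agent, CASES))` passes, while swapping in `vague_agent` fails the eval half even though the deterministic test still succeeds.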
The idea worth building your strategy around
Agent = Model + Harness
Model: ~10%
Harness: ~90% (prompts · tools · context · hooks · sandboxes · observability)
The ~90% is your surface area, not the provider’s.
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
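
A small sketch can show what "same model, different harness" means in practice. Everything here is an assumption for illustration: the `Harness` and `Agent` classes, the `CALL name arg` convention, and `stub_model` are hypothetical and do not reflect the paper's setup or any real framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Model = Callable[[str], str]  # prompt in, text out
Tool = Callable[[str], str]   # argument in, result out


@dataclass
class Harness:
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)
    context: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)  # minimal observability

    def build_prompt(self, task: str) -> str:
        parts = [self.system_prompt, *self.context]
        if self.tools:
            parts.append("Tools: " + ", ".join(sorted(self.tools)))
        parts.append("Task: " + task)
        return "\n".join(parts)


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        prompt = self.harness.build_prompt(task)
        self.harness.log.append(prompt)
        reply = self.model(prompt)
        # Hypothetical convention: a reply of "CALL <name> <arg>" requests a tool.
        if reply.startswith("CALL "):
            name, _, arg = reply[len("CALL "):].partition(" ")
            tool = self.harness.tools.get(name)
            reply = tool(arg) if tool else f"error: unknown tool {name!r}"
        self.harness.log.append(reply)
        return reply


def stub_model(prompt: str) -> str:
    """Stand-in for a real model: it uses the test runner only if offered one."""
    if "run_tests" in prompt:
        return "CALL run_tests src/"
    return "I cannot verify this change."


bare = Agent(stub_model, Harness("You are a coding agent."))
equipped = Agent(stub_model, Harness(
    "You are a coding agent.",
    tools={"run_tests": lambda path: f"ran tests in {path}: ok"},
))
```

Both agents wrap the identical `stub_model`; only the `equipped` one can verify its work, because its harness supplies the missing tool. That is the kind of configuration failure the quote describes.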
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: low CapEx · high OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: high CapEx · low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing, with cheap models for the easy work.
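
The model-routing lever can be sketched as "pick the cheapest tier trusted with this task". This is a hedged illustration only: the tier names, prices, difficulty levels, and keyword heuristic below are hypothetical and do not describe any provider's models or the paper's own router.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price
    max_difficulty: int        # hardest task level this tier is trusted with


# Hypothetical tiers; a real table would come from measured eval results.
TIERS = (
    ModelTier("small", 0.10, 1),
    ModelTier("medium", 0.50, 2),
    ModelTier("large", 3.00, 3),
)


def estimate_difficulty(task: str) -> int:
    """Toy keyword heuristic; a real router might use a classifier or past evals."""
    text = task.lower()
    if any(w in text for w in ("architecture", "migrate", "security", "concurrency")):
        return 3
    if any(w in text for w in ("refactor", "implement", "debug")):
        return 2
    return 1


def route(task: str, tiers: Sequence[ModelTier] = TIERS) -> ModelTier:
    """Return the cheapest tier whose trust level covers the task."""
    difficulty = estimate_difficulty(task)
    eligible = [t for t in tiers if t.max_difficulty >= difficulty]
    if not eligible:
        raise ValueError(f"no tier can handle difficulty {difficulty}")
    return min(eligible, key=lambda t: t.cost_per_1k_tokens)
```

Under these assumptions `route("Rename a variable")` selects the small tier and `route("Migrate the auth service")` selects the large one, so expensive capacity is spent only where the heuristic says it is needed.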
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR); verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Implications for AI Development Strategies

This shift in understanding has major implications for how organizations approach AI integration. Instead of chasing the latest models, companies should invest in building flexible, well-structured harnesses and mastering context engineering. This approach promises better performance, lower costs, and more secure systems, which are crucial as AI becomes central to software development.

For technical leaders, the message is clear: your competitive advantage lies in configuration, verification, and system design, not just in accessing cutting-edge models. The emphasis on harness and context design could redefine best practices across industries adopting AI tools.

Evolution of AI in Software Engineering

AI-assisted coding has become widespread: the paper cites figures that 85% of developers use AI coding agents (51% of them daily) and that 41% of all new code is AI-generated. Industry discussions have largely focused on model improvements, but this whitepaper shifts the narrative, emphasizing that the dominant factor in system behavior is the surrounding scaffolding and context management.

Previous efforts to improve AI performance focused on model size and training data; now, the focus is on how models are integrated into workflows, with experiments showing that small tweaks to prompts and configurations can outperform larger model upgrades.

“The behavior you experience in AI systems is dominated by how you scaffold and configure the system, not just the model itself.”

— Addy Osmani

Unclear Aspects of Model Versus Harness Impact

While experiments demonstrate the outsized role of harness and configuration, the precise limits of model influence in complex, real-world applications remain unclear. It is not yet confirmed how these findings scale across different domains or with future model improvements, and ongoing research is needed to quantify the exact contribution of models versus harness design in various contexts.

Next Steps for AI System Optimization

Organizations are expected to prioritize developing robust harnesses, improving context management, and investing in verification tools. Future research and industry practices will likely focus on standardizing best practices for harness design, and benchmarking how different configurations impact performance and cost. Monitoring how these strategies evolve will be key as AI models continue to advance.

Key Questions

Why is the model only 10% of the system behavior?

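The whitepaper argues that most of an AI system’s behavior depends on how it is configured, scaffolded, and managed through context and verification, not just on the underlying model. Its headline evidence is a coding agent that climbed from outside the Top 30 to the Top 5 on Terminal Bench 2.0 with the same model and a different harness.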

How does this change current AI development practices?

It shifts focus from chasing the latest models to building better harnesses, testing frameworks, and context management strategies, which are more cost-effective and reliable.

What are the economic implications of this insight?

Investing in configuration and verification reduces long-term costs by minimizing token waste, maintenance, and security vulnerabilities, despite higher upfront design costs.

Does this mean model improvements are no longer important?

Not necessarily; models will continue to improve, but the whitepaper emphasizes that the system’s behavior is primarily driven by how models are integrated and managed.

What should organizations do next?

Focus on developing and refining harnesses, context engineering, and verification processes to maximize AI system performance and cost efficiency.

Source: ThorstenMeyerAI.com
