Every CTO and VP of Engineering I’ve spoken to in the last year is asking a version of the same question: how do I get AI into the SDLC without breaking the SDLC? The good news is the question is right. The bad news is that most of the demos making the rounds are answering a different question — how do I make the SDLC look more exciting? — and those are not the same thing.
This essay is an opinionated map of where AI actually earns its keep inside a software development lifecycle, where it merely accelerates work that wasn’t the bottleneck, and where it silently accrues debt your future self will pay with interest.
The temptation to rewrite everything
The temptation is always to treat a capable new technology as a reason to remove existing structure. The first time I saw a team propose deleting their code-review process because “the LLM catches most of this now,” I had to ask what they thought the code-review process was for. Catching bugs was on the list. It wasn’t the only item.
Code review is also where taste gets transmitted, where context gets shared, where an engineer who joined three months ago learns what this codebase considers acceptable. Delete review and you don’t just lose a bug net — you lose the mechanism by which a team stays coherent. The AI is not, yet, a substitute for that. Used well, it’s an amplifier.
The SDLC is not a conveyor belt. It’s the mechanism by which a group of people stay a team. Anything that shortens a step while weakening the team has moved cost, not removed it.
Hold that sentence in mind through the rest of this piece. It’s the difference between an AI-native SDLC and an AI-accelerated one.
What the SDLC is actually for
Before deciding where AI belongs, it helps to enumerate what a good SDLC does. I’d argue five things:
- It converts intent (a product idea, a defect report, a compliance ask) into a durable change.
- It distributes risk — across reviewers, testers, staged rollouts — so no single human is the failure point.
- It preserves institutional memory — who decided what, when, and why.
- It produces a compounding improvement in the people who run it.
- It produces an auditable, reproducible trail for regulators and for your future self.
Every AI intervention should be evaluated against these five, not just the first. An AI tool that halves PR cycle time but kills the learning loop is a trade, not a win. State the trade and you can make the decision. Hide the trade and you can only discover it later.
Where AI compounds vs. where it merely accelerates
I break AI’s impact on the SDLC into three zones. This taxonomy has held up across several client engagements.
1. Compounding leverage
AI compounds leverage when the output gets better the more you use it, and when the human reviewing it becomes sharper, not lazier.
- Test generation guided by specification. A well-structured prompt over an API contract or an acceptance criterion generates test cases the human would have written — sometimes better, sometimes worse, often faster. The review of those tests teaches the engineer the shape of the spec. This compounds.
- Commit message and PR summary drafting. When the model generates a first draft from the diff, and the engineer edits it to reflect intent, the team reads clearer history. The history becomes searchable, auditable, and a better input to the next round of work.
- Knowledge retrieval over your codebase and docs. The model becomes the on-call colleague who read every runbook. Reviewed answers get cited; cited answers build the habit of writing runbooks that can be cited. This compounds too.
2. Linear acceleration
Linear acceleration means faster, but not better, and not structurally changed. Useful. Not transformative.
- Boilerplate scaffolding. CRUD, DTOs, wiring. Faster. The second-order effects are neutral.
- Refactor-in-place edits. Rename, reorganise, tidy. Saves minutes. Doesn’t change the shape of the work.
- Documentation of existing code. Helpful, but don’t confuse this with designing documentation.
Linear wins are still wins. Just don’t put them in the “transformative” column when you report to the board. They’re the easy part.
3. Silent debt
The third zone is where careers go to die. AI interventions that appear to work but introduce debt the team will only notice six months later.
- Silent generation of tests that pass but don’t test anything meaningful. Classic tell: every PR adds three tests, and production incidents stay at the same rate. The tests are asserting behaviour that can’t regress because nothing depends on it.
- Review comments that are plausible but unanchored in the codebase’s actual conventions. The model hasn’t read your style guide, but it has read the internet. Watch for drift.
- Auto-generated infrastructure that’s correct in isolation but wrong in context. Terraform, k8s manifests, policy files. “It deployed” is a low bar.
The only defence against silent debt is evaluation — specifically, evals that catch the absence of change, not just the presence of regression. We’ll come back to that.
The three evaluation shapes that matter
If you remember nothing else from this piece, remember this: you cannot roll out AI in the SDLC safely without three evaluation shapes running continuously.
Shape 1: Correctness evals
Does the output meet the spec? These are the ones everyone builds. They’re necessary; they’re not sufficient. Build them anyway.
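The shape is simple enough to sketch in a few lines — a harness that runs a function over (input, expected) pairs and returns the failures for inspection (names are mine, not from any particular framework):

```python
def run_correctness_eval(fn, cases):
    """Run fn over (input, expected) pairs; return the failures.

    A passing eval is silent; each failure comes back as
    (input, expected, got) so a human can inspect it.
    """
    failures = []
    for inp, expected in cases:
        got = fn(inp)
        if got != expected:
            failures.append((inp, expected, got))
    return failures
```

Everything else in this section builds on having results in this shape.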
Shape 2: Delta evals
Is the output improving or degrading over time — across model upgrades, prompt changes, context changes? A correctness eval tells you the test passed today. A delta eval tells you the test used to catch a bug and now doesn’t. This is the one most teams skip.
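A delta eval needs nothing exotic: store a baseline run keyed by case id, and diff each new run against it. A minimal sketch under that assumption (the three bucket names are my convention):

```python
def delta_eval(baseline: dict[str, bool], current: dict[str, bool]) -> dict[str, list[str]]:
    """Diff two eval runs keyed by case id (True = passed).

    'regressed' is the bucket that matters most: cases the system
    used to get right and now gets wrong — e.g. a bug the generated
    tests used to catch and no longer do.
    """
    return {
        "regressed": sorted(k for k in baseline if k in current and baseline[k] and not current[k]),
        "improved": sorted(k for k in baseline if k in current and not baseline[k] and current[k]),
        "dropped": sorted(k for k in baseline if k not in current),
    }
```

The `dropped` bucket is deliberate: a case that silently vanishes from the suite is exactly the absence-of-change failure mode a correctness eval can never see.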
Shape 3: Process evals
Is the team better at their job because of the tool, or merely faster? This is harder to measure, but it’s the one that actually corresponds to compounding leverage. Proxies I’ve used: are junior engineers’ PRs improving on their own axes? Are code-review comments getting fewer and sharper, or fewer and blander? Is the team writing better issues and ADRs, or is the tool doing it for them in a way that nobody reads?
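The fewer-and-sharper versus fewer-and-blander question can be made concrete, if not settled, with raw trends. A sketch of one proxy I might start from — comment count and mean comment length per period (the healthy pattern being falling count with stable or rising length):

```python
from statistics import mean

def comment_trend(comments_by_period: dict[str, list[str]]) -> dict[str, tuple[int, float]]:
    """For each period, report (comment count, mean comment length).

    Length is a weak proxy for sharpness — falling count *and* falling
    length is the pattern that deserves a closer human look.
    """
    return {
        period: (len(comments), float(mean(len(c) for c in comments)) if comments else 0.0)
        for period, comments in comments_by_period.items()
    }
```

No proxy here is decisive on its own; the point is to have a number to argue about rather than a vibe.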
If you only have correctness evals, you’ll ship a system that passes tests and silently makes the team worse. I’ve seen it. It looks great on the quarterly review.
Rollout, reversibility, and the audit trail
Two rollout rules I stand by:
Rule one: every AI intervention in the SDLC must be reversible in a single config change. You will need to turn something off for a specific repo, a specific team, a specific customer class, under a specific auditor’s eye. If turning it off requires a migration, you’ve built a dependency, not an intervention.
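"Reversible in a single config change" implies a resolution order: per-repo override wins, then a global flag, then a default. A sketch of that lookup (the feature names and config shape are illustrative, not a real tool's schema):

```python
DEFAULTS = {"ai_review_comments": True, "ai_test_generation": True}

def is_enabled(feature: str, repo: str, config: dict) -> bool:
    """Resolve a feature flag: repo override > global flag > default.

    Turning a feature off for one repo is one line of config —
    no migration, no redeploy of anything but configuration.
    """
    repo_overrides = config.get("repos", {}).get(repo, {})
    if feature in repo_overrides:
        return repo_overrides[feature]
    return config.get("global", {}).get(feature, DEFAULTS.get(feature, False))
```

The unknown-feature default of `False` is the conservative choice: a flag nobody declared stays off.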
Rule two: every AI-authored artefact must be traceable. Commit messages, test code, doc changes, review comments — if an AI wrote it or materially shaped it, that fact lives in the metadata. This is not for punishment. It’s for the day an auditor asks “how much of this compliance document was human-authored?” and you need an answer better than “we’re not sure.”
The audit trail also enables delta evals. You can only know whether AI-authored tests regressed in quality if you can distinguish them from human-authored ones.
What I’d do on day one
If I were walking into a 200-engineer org tomorrow with a mandate to embed AI in the SDLC, here’s the ninety-day shape:
- Week 1: Name the five outcomes the SDLC is currently producing and rank them by business criticality. This is the baseline that every intervention must not regress.
- Weeks 2–3: Pick one compounding intervention from Zone 1 above and one linear one from Zone 2. Instrument them for all three eval shapes. Pick nothing from Zone 3.
- Weeks 4–6: Roll out to one team that has appetite and psychological safety to report honestly. Resist the urge to generalise.
- Weeks 7–10: Read the process-eval output. Adjust. If the team is worse at their job, pause. If neutral, continue. If better, widen.
- Weeks 11–13: Roll to a second team with a different shape. Look for the interventions that were portable and the ones that were actually team-specific.
Nothing in this plan is glamorous. All of it is what differentiates an AI-native SDLC from a deck that uses the words “AI-native SDLC.”
If you’re designing this for your own org and want a sounding-board, I’m always up for a conversation — drop a note via the contact form, or subscribe for more notes like this one.