
Where complexity belongs in an agentic system

The model is the easy part. Notes on tool boundaries, state, and the handover between agent and human — the decisions that actually determine whether your pilot survives production.

Every agentic pilot I’ve reviewed in the last eighteen months has had a beautifully chosen model, a thoughtful prompt, and a quiet disaster hiding somewhere in the boring parts — the tool boundary, the state model, or the moment the agent hands back to a human.

The lesson isn’t new. It’s the same lesson we learned during the SOA years, the microservices years, and the “let’s put a message queue in” years. Sophisticated components do not forgive naïve architecture. If anything, they punish it faster, because their failure modes are quieter and their confidence is higher.

This essay is a short, opinionated tour of where I’ve seen complexity belong in real agentic systems — and where teams keep trying to put it, to their later regret. It’s written for technical leaders evaluating a pilot, not for researchers picking a model.

The expensive assumption

The expensive assumption is that the model is the system. It isn’t. The model is one well-funded component of a system that also contains: a tool layer, a state store, a retry policy, a guardrail layer, an evaluation harness, a human-in-the-loop surface, and — usually — a ticketing system nobody wanted to admit was load-bearing.

If your architecture diagram has the model in the middle and arrows going outward, you’ve already bought the expensive assumption.
— Seen in three separate client reviews, 2025

A better starting question: what is the smallest system that would still be useful if the model were replaced tomorrow? The answer tells you what belongs in durable architecture and what belongs in a swappable module. Models are swappable. Your tool contracts, state shape, and audit trail are not.

I’m not being flippant about the model here — the quality of the reasoning engine obviously matters. The point is that the reasoning engine’s quality compounds the system’s design quality. A good model in a badly designed system produces confident, articulate failures. That’s worse than a stupid failure, not better.
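One way to hold yourself to the swappability test: make the rest of the system depend on a minimal interface, not on any vendor SDK. A sketch — the `ReasoningEngine` protocol and its method names are illustrative, not from any real library:

```python
from typing import Protocol


class ReasoningEngine(Protocol):
    """The only surface the durable system is allowed to depend on.
    Everything else (tool contracts, state, audit) lives outside it."""

    def plan(self, goal: str, context: dict) -> list[str]: ...


def run_task(engine: ReasoningEngine, goal: str, context: dict) -> list[str]:
    # The orchestration layer never imports a provider SDK directly;
    # swapping models means swapping this one argument.
    return engine.plan(goal, context)


class StubEngine:
    """A trivial stand-in that proves nothing else has to change."""

    def plan(self, goal: str, context: dict) -> list[str]:
        return [f"noop: {goal}"]
```

If replacing the engine with `StubEngine` breaks anything beyond output quality, that breakage is a map of where the expensive assumption leaked into your architecture.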

Tool boundaries

Tool boundaries are where most agentic pilots quietly die. Three heuristics I keep returning to:

  • A tool should do one thing, deterministically. If your create_order tool can also refund, cancel, and notify — congratulations, you’ve built a model with a typewriter and no spellcheck.
  • The tool’s cost and effect should be visible in its name and schema. The agent’s plan is only as cautious as the inputs you give it.
  • Idempotency is non-negotiable. Retries happen. Multi-agent loops happen. The world is eventually consistent whether you want it to be or not.
Field note: The tool layer is where your security team earns its salary. Treat every tool call as if it will eventually be triggered by a prompt injection from an unfriendly email, because eventually it will.

A useful test: if a curious intern read only the list of tool names and schemas, could they describe the full blast radius of the agent in five minutes? If the answer is no, you don’t have a tool layer — you have a surface area.
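The intern test is mechanizable if every tool declares its own blast radius up front. A minimal sketch — the registry shape and field names are my own, not a standard:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str         # cost and effect should be legible in the name itself
    effect: str       # one-sentence blast radius, written for a human
    mutates: bool     # does calling it change the world?
    idempotent: bool  # is a retry safe? (non-negotiable for mutating tools)


# Illustrative registry; real entries come from your tool definitions.
REGISTRY = [
    Tool("lookup_order",  "reads one order by id",           mutates=False, idempotent=True),
    Tool("create_refund", "issues a refund, capped at $500", mutates=True,  idempotent=True),
]


def blast_radius(registry: list[Tool]) -> list[str]:
    """The 'curious intern' view: every world-changing tool, one line each."""
    return [f"{t.name}: {t.effect}" for t in registry if t.mutates]
```

The point of the `frozen=True` and the one-sentence `effect` field is social, not technical: if a tool’s effect can’t be written in one sentence, it’s doing more than one thing.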

State, and where it lives

State is the quietest killer. In research demos, state lives in the context window. In production, state lives in at least four places: the model’s context, a short-term scratchpad, a durable conversation store, and whatever system-of-record the tools are writing against.

The first architectural question worth asking is not “which vector DB?” It’s: when the agent reboots mid-task, what exactly does it remember, and how does that compare to what the business remembers? Any drift between the two is where your incidents are hiding.

A useful split

  • Ephemeral state — scratchpad. Lives in context. Discardable.
  • Session state — the current task. Durable, short TTL, queryable.
  • System state — the world. Owned by your existing services. The agent is a client, not a steward.

Agents are bad stewards. They forget, they hallucinate authority, and they don’t carry the institutional memory your existing services do. The moment you let an agent “own” a piece of durable business state, you’ve coupled your model choice to your data contracts — a coupling you will regret the first time you swap providers.
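The middle layer — session state — is the one teams most often leave implicit. A sketch of what “durable, short TTL, queryable” might mean in code; the store shape is hypothetical, and in production this would be backed by a real database rather than a dict:

```python
import time


class SessionStore:
    """Session state for the current task: durable-ish, short TTL, queryable."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data: dict[str, tuple[float, dict]] = {}

    def put(self, task_id: str, state: dict) -> None:
        self._data[task_id] = (time.monotonic(), state)

    def get(self, task_id: str) -> "dict | None":
        entry = self._data.get(task_id)
        if entry is None:
            return None
        stored_at, state = entry
        if time.monotonic() - stored_at > self.ttl:
            # Expired: the agent must re-derive from the system of record,
            # not resume from a stale guess.
            del self._data[task_id]
            return None
        return state
```

Note what is absent: there is no method for writing system state. The agent reads the world through tools and records only its own task progress here — client, not steward.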

The handover problem

Every agentic system has a moment where it decides to hand back to a human. In well-designed systems, that moment is explicit, cheap, and commonplace. In badly designed ones, it’s implicit, expensive, and rare — which means when it finally fires, the human has no context and the audit trail is a mess.

Design the handover the way you’d design a pager incident: the receiving human should land on a page with everything they need and nothing they don’t. If your handover screen shows a raw transcript, you’re punishing the operator for the model’s verbosity.

Good handovers are short, structured, and rehearsed. Your L1 operators should be able to describe the format in a sentence.
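“Short, structured, and rehearsed” can be made concrete as a fixed payload shape. A sketch — the field names are illustrative, and the right set depends on your domain:

```python
from dataclasses import dataclass, field


@dataclass
class Handover:
    """One screen: everything the operator needs, nothing they don't."""

    case_id: str
    decision_needed: str   # the one question the agent couldn't answer
    agent_conclusion: str  # what the agent would have done, and why
    evidence: list[str] = field(default_factory=list)  # a handful of facts, not a transcript

    def summary(self) -> str:
        # The one-sentence format an L1 operator should be able to recite.
        return f"[{self.case_id}] needs: {self.decision_needed}"
```

The discipline is in what the type forbids: there is no `transcript` field, so the raw conversation can’t leak onto the operator’s screen by default.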

Three questions I ask during a handover review:

  • Can the operator make the decision the agent couldn’t, in under sixty seconds, with only what’s on the screen?
  • If the operator disagrees, can they correct the agent’s conclusion and push it back, or is the conversation frozen?
  • If a different operator picks up the same case tomorrow, do they get the same context, or does the agent start from scratch?

If the answer to any of these is “no,” the handover isn’t finished — it’s a dropped baton with a UI around it.

A checklist before you demo

If you’re about to show a pilot to a stakeholder, work through this list the night before. It won’t make the demo more impressive. It will make the production conversation that follows a lot shorter.

  1. Name every tool. Explain in one sentence what it does and what it costs.
  2. Draw the state map. Four layers, with TTLs.
  3. Describe the handover surface. What does the operator see?
  4. Identify the retry policy. What’s safe to repeat, what isn’t.
  5. Write one evaluation you could run tonight that would catch a regression.
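Item 5 doesn’t require an evaluation framework. Here’s one shape of a “run it tonight” check, assuming you can extract the list of tool names the agent invoked from a logged transcript (that extraction step is yours to write):

```python
def eval_tool_allowlist(tool_calls: list[str], allowed: set[str]) -> list[str]:
    """Flag every tool call outside the allowlist for this task type.

    A regression here -- the agent reaching for a tool it shouldn't --
    is exactly the kind of quiet failure a demo never surfaces.
    """
    return [name for name in tool_calls if name not in allowed]
```

It’s crude, but a crude evaluation that runs every night beats a sophisticated one that exists in a planning document.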

None of this is glamorous. All of it is where the project lives or dies. The model is the easy part — you’ll change it three times before the system is in production anyway. What you’re really designing is the frame around it.

If this essay resonates and you’re wrestling with a specific pilot, I’m always happy to trade notes. The contact form is quieter than you’d expect.


Chaitanya Sunkara

Software architect and consultant — enterprise architecture, microservices and agentic AI. 19+ years in .NET, SQL and the cloud, applied to the questions that actually matter.

Working on something like this? Let's trade notes.

I'm always up for a good conversation about enterprise architecture, agentic AI and the AI-native SDLC. Subscribe for field notes, or write if a specific problem is on your mind.
