The most important architecture decision in an agentic system isn’t which model, which framework, or which vector database. It’s what the tools can do. Everything downstream — reliability, security, auditability, the ability to swap providers in a year without rewriting the stack — flows from the shape of the tool layer.
This is a practical essay, not a philosophical one. I want to leave you with a small set of rules, a worked example, and enough vocabulary that the next time your security team asks “what’s the blast radius of this agent?” you have something concrete to point at.
What a tool actually is
A tool, in the agentic sense, is a named function the model can invoke. That framing is so simple it’s almost unhelpful. Let me offer a more useful one:
A tool is a commitment you make to your future self that the model can read. Everything inside the tool is under your control; everything outside is under the model’s.
The power of that framing is that it forces you to treat the tool boundary as the security perimeter, the audit perimeter, and the contract perimeter — all at once. Cross that boundary and the model is in charge. So think hard about what you let cross.
One verb, one effect
The first rule: a tool should do one thing. Not “manage orders.” One thing. The fastest way to wreck a tool layer is to bundle related effects behind a generic verb.
```
// Don't do this — "process" can do anything, and the model knows it.
process_order_request({ order_id, action, params })
```
The reason isn’t aesthetic. Each verb the agent sees is a decision the model must make. Ambiguous verbs invite ambiguous plans. If process_order_request can refund or cancel or re-queue, you’ve outsourced your business logic to a probabilistic reasoner, and you will discover this the first time a support ticket comes in asking why a customer got refunded and cancelled.
One verb, one effect. If you feel tempted to reuse a tool for “similar” operations, don’t. Write a new one.
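Splitting the generic verb looks something like this. The `Tool` shape is an assumption for illustration, not any particular framework's API, and `cancel_order` is a hypothetical sibling of the `issue_refund` tool used throughout this essay:

```typescript
// Illustrative only: the Tool shape is an assumption, not a framework API.
type Tool<Args> = {
  name: string;
  description: string;
  run: (args: Args) => string;
};

// Instead of one generic process_order_request({ action, ... }),
// each effect gets its own narrowly named tool.
const issueRefund: Tool<{ orderId: string; amountCents: number }> = {
  name: "issue_refund",
  description: "Refund a single charge. Irreversible.",
  run: ({ orderId, amountCents }) => `refunded ${amountCents} on ${orderId}`,
};

const cancelOrder: Tool<{ orderId: string; reasonCode: string }> = {
  name: "cancel_order",
  description: "Cancel an unshipped order. A compensating action exists.",
  run: ({ orderId, reasonCode }) => `cancelled ${orderId} (${reasonCode})`,
};
```

Two tools, two unambiguous verbs; the model never gets to decide what "process" means.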
Idempotency is not optional
The second rule: every side-effecting tool must be idempotent. Not aspirationally. Not “usually.” Enforced at the schema level with a required idempotency key.
Agents retry. Orchestrators retry. Multi-agent loops retry. Your network occasionally retries for you without asking. The first time your agent fires issue_refund twice in the same second because a downstream call timed out and the retry fired before the first completed, you will care very much about whether your tool deduplicated on key.
The world is eventually consistent whether you want it to be or not. Build for it.
A useful pattern: the tool’s request schema includes a client-supplied idempotency key. Your tool stores the key with the result for some TTL. A repeated call with the same key returns the cached result. The model doesn’t need to know this — it just gets the same answer twice, which is exactly what you want.
Cost and effect in the schema
The third rule: make cost and effect visible in the tool’s name and schema. The agent’s plan is only as cautious as the inputs you give it.
A tool called send_email is less cautious than send_email_to_customer, which is less cautious than send_email_to_customer_writes_crm_log. Verbose names feel awkward until the day you read a model’s chain-of-thought and watch it reason differently because the name reminded it that a side effect was coming.
In the schema, I like to include:
- `cost` — a compact string describing the resource the tool consumes (`writes:billing`, `reads:catalog`, `external:api`, `human-time`)
- `reversible` — a boolean or enum: `true`, `false`, or `compensating`. The model can reason over this.
- `requires_confirmation` — an enum for tools that should never fire without an explicit “yes” from a planning layer or a human.
- `idempotencyKey` — `required`, `recommended`, or `ignored`.
Agents given rich metadata make surprisingly careful plans. Agents given {name, args} make the plans you’d expect from something reading a flat menu.
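Put together, a tool descriptor carrying those fields might look like this. The field names mirror the list above; the exact enum values are assumptions:

```typescript
// Field names mirror the essay's metadata list; enum values are assumptions.
interface ToolSpec {
  name: string;
  cost: string;                          // e.g. "writes:billing", "external:api"
  reversible: boolean | "compensating";  // the model can reason over this
  requires_confirmation: "never" | "planner" | "human";
  idempotencyKey: "required" | "recommended" | "ignored";
}

const issueRefundSpec: ToolSpec = {
  name: "issue_refund",
  cost: "writes:billing",
  reversible: false,
  requires_confirmation: "human",
  idempotencyKey: "required",
};
```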
The security team’s new front door
Every tool is a new front door for your security team. Not metaphorically — literally. Every tool that can be invoked by a model that can be influenced by untrusted input is exposed to prompt injection, which means every tool must be designed as if it will eventually be triggered by an unfriendly instruction smuggled through an email, a PDF, a customer support ticket, or a scraped webpage.
If you take one thing from this essay, take this: treat the tool layer the way you’d treat a public HTTP endpoint.
That means:
- Auth scopes per tool, not per agent. `issue_refund` requires `billing:write`; `read_customer_profile` requires `customer:read`. The agent passes scoped credentials, not a superuser token.
- Rate limits per tool, per agent, per customer. Prompt injection at scale is no different from any other API abuse.
- Structured logging with a correlation ID from invocation → tool call → downstream effect. You will need this the day an auditor asks “show me every action this agent took for this customer.”
- Deny-by-default. An agent should not have any tool the current task doesn’t explicitly require.
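Deny-by-default falls out naturally when the tool registry filters by granted scopes. The scope strings here are illustrative:

```typescript
// Deny-by-default: the agent is only handed tools whose required scopes
// the current task explicitly grants. Scope strings are illustrative.
const toolScopes: Record<string, string[]> = {
  issue_refund: ["billing:write"],
  read_customer_profile: ["customer:read"],
  get_ticket: ["tickets:read"],
};

function toolsForTask(granted: Set<string>): string[] {
  return Object.entries(toolScopes)
    .filter(([, needed]) => needed.every((scope) => granted.has(scope)))
    .map(([name]) => name);
}

// A read-only triage task never even sees issue_refund.
toolsForTask(new Set(["customer:read", "tickets:read"]));
```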
A small worked example
Imagine a back-office agent that resolves Tier-1 billing tickets. Here’s what the tool layer might look like, done well.
Inputs (reads)
- `get_ticket(ticket_id)` — the customer’s original message, metadata, category
- `lookup_customer(ticket_id)` — profile, account state, entitlement tier
- `get_recent_charges(customer_id, window_days)` — scoped to the window relevant to the ticket
Actions (writes)
- `issue_refund({ charge_id, amount_cents, reason_code, idempotency_key })` — bounded verb, scoped creds, refund cap enforced server-side
- `add_ticket_note({ ticket_id, note, visibility })` — internal notes default
- `propose_resolution({ ticket_id, resolution_code, summary, confidence })` — does not close the ticket. Hands off to a human.
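That “refund cap enforced server-side” detail matters: the model can ask for any amount, but the tool enforces the bound. A sketch, with the cap value assumed:

```typescript
// The cap value is assumed; the point is that the bound lives server-side,
// inside the tool, where the model cannot negotiate with it.
const TIER1_REFUND_CAP_CENTS = 5_000;

function issueRefundHandler(amountCents: number): { ok: boolean; error?: string } {
  if (amountCents > TIER1_REFUND_CAP_CENTS) {
    // Refuse loudly rather than silently clamping: the agent should see
    // a structured error it can escalate on.
    return { ok: false, error: "refund_cap_exceeded" };
  }
  return { ok: true };
}
```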
Handovers
- `escalate_to_human({ ticket_id, reason, context_summary })` — structured fields, not a dump of the chat. Renders on the operator’s screen with everything they need and nothing they don’t.
What’s notable is what’s not here: no close_ticket, no send_email_to_customer, no write_to_crm generic. The human closes the ticket. The human approves the email. The CRM writes are pinned behind specific named tools with reason codes.
The agent could, in principle, do more. The architecture deliberately says: not yet. And because every tool has a name, a cost, a scope, and an idempotency contract, widening the agent’s authority later is a deliberate act, not a drift.
A five-question pre-flight
Before you let an agent touch production, walk the tool layer with this list:
- Does every side-effecting tool have a required idempotency key?
- Can I describe the blast radius of this agent in one sentence, using only tool names?
- If a prompt injection fired every tool the agent has access to, what’s the worst outcome? Is that acceptable to the people who’d sign off?
- Is there a handover tool that is always a valid choice for “I’m not sure”?
- Can I turn off any individual tool in production without redeploying the agent?
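Question five is worth sketching: a per-tool kill switch, assuming the flags live in a config store the runtime can re-read without a redeploy:

```typescript
// Per-tool kill switch, assuming flags come from a config store the
// runtime re-reads; flipping one disables a tool without a redeploy.
const enabled = new Map<string, boolean>([
  ["issue_refund", true],
  ["escalate_to_human", true],
]);

function invoke(tool: string, run: () => string): string {
  // Fail closed: unknown or disabled tools are refused with a structured
  // reason the agent can act on (typically by escalating).
  if (!enabled.get(tool)) return `tool_disabled:${tool}`;
  return run();
}

enabled.set("issue_refund", false); // flipped in config, no redeploy
invoke("issue_refund", () => "refunded"); // refused: "tool_disabled:issue_refund"
```

Failing closed with a structured reason is deliberate: the agent learns the tool is off and can reach for `escalate_to_human` instead of retrying blindly.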
If any answer is “no,” you don’t have a tool layer — you have a loaded gun in the hands of a probabilistic aimer. Fix that first. The model you pick afterwards almost doesn’t matter.
If you’re designing a tool boundary for a real pilot and want a second opinion, the contact form is the quickest way through — happy to trade notes, and subscribing gets you the next essay directly.