AI Agent Capabilities and Sandboxing: Power on a Need-to-Have Basis

The prompt is not permission

There is a dangerous assumption built into many AI systems: that if a model decides it wants to do something, it should be able to. In a domain where actions move real money, that is exactly backwards. An agent should not gain power because a prompt — from a user or from the model's own reasoning — asks for it. The right model is the inverse: an agent requests a capability, and a separate runtime decides whether that capability is registered, allowed, healthy, appropriate, and safe enough for the task at hand. The request is not the grant.

What a capability is

A capability is any power an agent might use. Treating every such power uniformly, whatever form it takes, is what makes agent power governable. A capability can be:

a skill that shapes a specialist's workflow,
a deterministic function,
a tool that performs an action,
a read-only resource that exposes data.

By naming each of these as an explicit capability rather than ambient access, the system gains a single chokepoint where every power an agent reaches for can be checked — instead of trusting the agent to police itself.
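One way to picture that single chokepoint is a registry keyed by capability name. The following is a minimal sketch, not a reference implementation: the names `CapabilityKind`, `Capability`, `CapabilityRegistry`, and the example capability names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CapabilityKind(Enum):
    SKILL = "skill"        # shapes a specialist's workflow
    FUNCTION = "function"  # deterministic function
    TOOL = "tool"          # performs an action
    RESOURCE = "resource"  # read-only, exposes data


@dataclass(frozen=True)
class Capability:
    name: str
    kind: CapabilityKind


class CapabilityRegistry:
    """Every power an agent can reach for is declared here by name."""

    def __init__(self) -> None:
        self._by_name: dict[str, Capability] = {}

    def register(self, capability: Capability) -> None:
        # Refuse silent overwrites so a name always means one thing.
        if capability.name in self._by_name:
            raise ValueError(f"duplicate capability: {capability.name}")
        self._by_name[capability.name] = capability

    def lookup(self, name: str) -> Optional[Capability]:
        # Returns None for anything that was never declared.
        return self._by_name.get(name)


# Hypothetical capability names, for illustration only.
registry = CapabilityRegistry()
registry.register(Capability("portfolio.read_holdings", CapabilityKind.RESOURCE))
registry.register(Capability("trade.place_order", CapabilityKind.TOOL))
```

Because all four kinds share one type and one lookup path, there is exactly one place to attach checks, and a name the agent made up simply resolves to nothing.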

The five questions before any capability runs

The discipline is that a requested capability passes a series of checks before it is granted. Each question closes off a class of failure:

Registered — is this a known, declared capability, not something the agent invented or hallucinated? Unknown capabilities are refused outright.
Allowed — does the current user, tenant, and context have permission to use it? Permission is contextual, not global.
Healthy — is the underlying tool or service actually working right now? A degraded dependency should not be invoked blindly.
Appropriate — does it fit the task? A capability can be registered and allowed yet still be the wrong tool for what is being done.
Safe enough — given the risk of the action, are the right gates (approval, prechecks) in place before it proceeds?

Only a capability that clears all five is granted. This is capability-based security applied to AI agents.
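The five questions can be sketched as an ordered series of checks that stops at the first failure. This is an illustrative sketch under stated assumptions: the `Request` and `Runtime` types, their fields, and the example users, tenants, and tasks are hypothetical, and a real runtime would back each check with its own policy and monitoring systems rather than in-memory sets.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    capability: str
    user: str
    tenant: str
    task: str
    approved: bool = False  # has a required approval been recorded?


@dataclass
class Runtime:
    registered: set[str]
    permissions: dict[tuple[str, str], set[str]]  # (tenant, user) -> capability names
    healthy: Callable[[str], bool]                # health probe for a capability
    fits_task: dict[str, set[str]]                # task -> capabilities that fit it
    needs_approval: set[str]                      # capabilities gated on approval

    def first_failed_check(self, req: Request) -> Optional[str]:
        """Return the name of the first check that fails, or None if all pass."""
        if req.capability not in self.registered:
            return "registered"
        if req.capability not in self.permissions.get((req.tenant, req.user), set()):
            return "allowed"
        if not self.healthy(req.capability):
            return "healthy"
        if req.capability not in self.fits_task.get(req.task, set()):
            return "appropriate"
        if req.capability in self.needs_approval and not req.approved:
            return "safe_enough"
        return None

    def grant(self, req: Request) -> bool:
        return self.first_failed_check(req) is None


# Hypothetical configuration, for illustration only.
runtime = Runtime(
    registered={"portfolio.read_holdings", "trade.place_order"},
    permissions={("acme", "advisor-1"): {"portfolio.read_holdings", "trade.place_order"}},
    healthy=lambda name: True,
    fits_task={
        "summarize": {"portfolio.read_holdings"},
        "rebalance": {"portfolio.read_holdings", "trade.place_order"},
    },
    needs_approval={"trade.place_order"},
)
```

Returning the name of the failed check, rather than a bare refusal, keeps each denial explainable: the caller can tell whether a request was unknown, unpermitted, unhealthy, off-task, or missing an approval.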

Why sandboxing matters around money

The reason this rigor is non-negotiable in wealth management is the blast radius. A general chatbot with a buggy or manipulated tool call usually produces nothing worse than a wrong answer. An autonomous financial agent with unchecked capabilities could move funds, place trades, or expose sensitive data. Sandboxing, which means confining an agent to exactly the capabilities it has been granted and no more, is what contains that blast radius. The agent operates inside a bounded space where the worst it can do is limited by design, not by hope.
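As a rough illustration of "limited by design", the sketch below gives an agent a dispatcher that holds only the handlers it was granted, so an ungranted power is not merely forbidden but absent. The `Sandbox` class, `CapabilityDenied` exception, and both handlers are hypothetical. This is in-process confinement only; it is an assumption of the sketch, not a claim about how any particular platform isolates agents, and real isolation would also involve process, network, and credential boundaries.

```python
from typing import Any, Callable, Iterable, Mapping


class CapabilityDenied(PermissionError):
    """Raised when an agent reaches for a capability it was not granted."""


class Sandbox:
    def __init__(
        self,
        handlers: Mapping[str, Callable[..., Any]],
        granted: Iterable[str],
    ) -> None:
        # Copy only the granted handlers; the rest are unreachable from here.
        self._handlers = {name: handlers[name] for name in granted}

    def call(self, name: str, **kwargs: Any) -> Any:
        handler = self._handlers.get(name)
        if handler is None:
            raise CapabilityDenied(f"capability not granted: {name}")
        return handler(**kwargs)


# Hypothetical handlers standing in for real integrations.
def read_holdings(account: str) -> dict:
    return {"account": account, "holdings": []}


def place_order(account: str, symbol: str, quantity: int) -> str:
    return f"order placed: {quantity} {symbol} for {account}"


ALL_HANDLERS = {
    "portfolio.read_holdings": read_holdings,
    "trade.place_order": place_order,
}

# This agent was granted read access only.
sandbox = Sandbox(ALL_HANDLERS, granted=["portfolio.read_holdings"])
```

With this setup, a call to `trade.place_order` raises `CapabilityDenied` no matter what the prompt or the model's reasoning says, because the sandbox never received that handler.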

Least privilege, by default

The principle underneath all of this is least privilege: an agent gets the minimum power needed for the task, granted at the moment it is needed, and no standing access beyond that. This is the same principle that governs well-designed human access control, applied to autonomous software. It directly counters the failure mode of an over-permissioned agent that, through a bad instruction or a manipulated input, reaches for a power it should never have had in that moment.
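"Granted at the moment it is needed, and no standing access beyond that" can be modeled as a grant bound to one capability, one task, and a short lifetime. The sketch below assumes hypothetical names (`Grant`, `issue_grant`, `task-42`) and an arbitrary 30-second lifetime; the right scope and duration would depend on the platform.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    capability: str
    task_id: str
    expires_at: float  # seconds on the monotonic clock

    def covers(self, capability: str, task_id: str, now: float) -> bool:
        # Valid only for this capability, this task, and before expiry.
        return (
            capability == self.capability
            and task_id == self.task_id
            and now < self.expires_at
        )


def issue_grant(
    capability: str,
    task_id: str,
    ttl_seconds: float,
    now: Optional[float] = None,
) -> Grant:
    # `now` is injectable so expiry can be tested without sleeping.
    start = time.monotonic() if now is None else now
    return Grant(capability, task_id, start + ttl_seconds)


# Hypothetical grant: one capability, one task, 30 seconds.
grant = issue_grant("trade.place_order", "task-42", ttl_seconds=30, now=100.0)
```

Here `grant.covers("trade.place_order", "task-42", now=110.0)` holds, but the same grant does not cover a different task, a different capability, or any use at or after `now=130.0`. Nothing persists once the task or the time window ends.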

Read-only by default, action by exception

A useful corollary: most of what an agent needs is information, and reading data is far lower risk than changing it. Treating read-only resources as the default and action-taking tools as the gated exception means the vast majority of agent activity stays inherently safe, while the small set of consequential actions carries the heaviest checks — registration, permission, health, appropriateness, and explicit approval gates.
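The asymmetry between reading and acting can be expressed as a small policy function that decides which gates a capability must clear. This sketch makes one assumption the text does not prescribe: anything that is not a declared read-only resource is treated as gated, which is a conservative, fail-closed choice. The names `Kind`, `BASE_CHECKS`, and `required_gates` are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    RESOURCE = "resource"  # read-only data
    TOOL = "tool"          # performs an action


# Checks applied to every capability in this sketch.
BASE_CHECKS = ("registered", "allowed", "healthy", "appropriate")


def required_gates(kind: Kind) -> tuple[str, ...]:
    """Read-only resources get the base checks; everything else adds approval."""
    if kind is Kind.RESOURCE:
        return BASE_CHECKS
    # Assumption of this sketch: non-resources are gated by default.
    return BASE_CHECKS + ("approval",)
```

Keeping the approval gate off the read path means the common case stays fast, while the rare consequential action always pays for the extra check.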

Capabilities inside governed autonomy

This capability model is the enforcement layer beneath governed autonomy. The broader platform decides what should happen through its routing and approval pipeline; the capability and sandboxing layer enforces whether an agent is even allowed to reach for the power to do it, and contains it if something goes wrong. Together they make agent autonomy something you can extend deliberately rather than fear.

The takeaway

Safe AI agents are not built on trusting the model to behave; they are built on refusing to grant power the model has not earned for the task. Capability-based security means every skill, function, tool, and data resource is registered, allowed, healthy, appropriate, and safe enough before use, under least privilege and sandboxing. That is what makes autonomous agents fit to operate around money. The prompt asks; the runtime decides.
