Zero Trust for AI Agents: Securing LLMs, Tools, and Identity
When the “user” is an AI agent, zero trust means every prompt, tool call, and data request must be verified, scoped, and logged in real time. This post shows how microsegmentation, just-in-time privilege, continuous identity checks, and tamper-evident audit trails stop agents from becoming an enterprise-wide blast radius.
CVE-2024-3094 Is the Right Mental Model for AI Agents
CVE-2024-3094 made the point brutally clear: a tiny change in a trusted component can sit in the path of everything and still look “normal” until someone notices a weird delay in SSH. AI agents create the same problem in a different costume. They sit between users, data, APIs, and admin tools; if you let them inherit broad trust, they become the shortest route from a harmless prompt to a very expensive mistake.
The mistake most teams make is treating the agent like a chatbot with a few plugins. That framing is already obsolete. A production agent is a decision engine with credentials, network reach, and the power to synthesize actions across systems like Slack, Jira, GitHub, Okta, ServiceNow, and AWS. If you would not hand a junior contractor a shell, a production API key, and a list of crown-jewel databases, do not hand that bundle to an LLM because it can write a polite sentence.
Every Prompt Needs an Identity, Not Just a Session Cookie
The first control is not “prompt filtering.” It is identity binding. The agent must be tied to a real human, service account, or workflow identity at the moment it requests data or action, not merely when it starts a session. Microsoft Entra ID, Okta, and Google Workspace all support conditional access and device posture checks; use them to force the agent’s upstream identity to be explicit, short-lived, and attributable.
That matters because prompt injection is not magic; it is just untrusted input trying to steer a privileged workflow. If an agent can read a ticket, parse an email, and then call an internal API without re-checking who asked, you have built a confused deputy with better grammar. Re-authentication on sensitive actions is not friction; it is the price of not letting one poisoned document become a company-wide instruction set.
Microsegment the Agent Like It Might Be Compromised, Because It Will Be
Zero trust for agents means the model never gets a flat network or a fat token. Give it microsegmented access to only the endpoints it needs for the current task, and nothing else. Illumio, Wiz, and Prisma Cloud all sell versions of this idea for cloud workloads; the same logic applies to agent runtimes, especially when they sit next to internal APIs and data stores that were never designed to face a reasoning system.
A useful rule: if the agent can reach your finance database, your source control, and your identity provider from the same runtime, you have already lost the segmentation argument. Separate retrieval, tool execution, and write-back into distinct services with different credentials and different egress rules. An agent that can summarize a spreadsheet does not need direct network access to the payroll system, no matter how “helpful” the demo looked in the boardroom.
Just-in-Time Privilege Beats Permanent Tool Access
Permanent tool access is how you turn a narrow assistant into an enterprise blast radius. Use just-in-time privilege for any action that changes state: creating tickets, approving purchases, merging code, resetting passwords, or touching production. AWS IAM Identity Center, Azure Privileged Identity Management, and CyberArk all support time-bound elevation patterns; the agent should inherit those, not bypass them.
Here is the contrarian bit: don’t let the agent request broad scopes “for convenience” and promise to behave. That is how OAuth abuse keeps showing up in incident reports. Scope should be task-specific and revocable, with a hard expiry measured in minutes, not “until the session ends.” If the agent needs to open 14 Jira tickets, it does not need standing permission to close incidents, rotate secrets, or approve vendor payments while it is at it.
Log the Tool Call, the Retrieved Data, and the Decision Path
If you cannot reconstruct what the agent saw, what it asked for, and what it changed, your audit trail is theater. Log the prompt, retrieved documents, tool invocations, returned payloads, and the final action in a tamper-evident store. Splunk, Elastic, and Datadog can ingest the events; the important part is preserving the chain of custody, not just the final API call.
This is where most “AI governance” programs get fluffy. A screenshot of a chat transcript is not evidence. You want immutable records with request IDs, user IDs, tool names, object IDs, timestamps, and policy decisions. If an agent approves a refund or deletes a record, you should be able to answer who authorized the action, which data the model saw, and whether any policy engine overruled it. Otherwise you are one post-incident meeting away from inventing confidence.
Put the Policy Engine Outside the Model
Do not ask the model to police itself. That is how you end up with a system that can explain your policy but not follow it. Put authorization, data loss prevention, and high-risk action approval in an external policy layer before the tool call fires. Open Policy Agent, Cedar, and vendor controls in platforms like Microsoft Copilot Studio or Google Vertex AI can enforce decisions outside the prompt stream.
This separation matters because prompt text is not a security boundary. An agent can be tricked, overconfident, or simply wrong in a way that sounds plausible enough to pass casual review. If the policy engine says “no access to customer PII unless the request is tied to an active support case and a named analyst,” then the model does not get to negotiate. It either complies or it gets a denial, which is how security is supposed to work.
Red-Team the Agent for Data Exfiltration, Not Just Hallucinations
Most AI testing still obsesses over hallucinations because they are easy to demo. The real risk is exfiltration through tool abuse: pulling secrets from a retrieval index, leaking data into a ticket, or laundering internal content into an external API call. Test against prompt injection, indirect prompt injection from documents and web pages, and cross-tool contamination, where data from one system gets smuggled into another through the agent’s memory.
Use the same discipline you would for a new SaaS integration. Build adversarial cases that include poisoned PDFs, malicious calendar invites, and fake support threads that try to coerce the agent into revealing credentials or changing access. If your red team has not tried to make the agent send an internal file to a public endpoint, they have not really started.
The Real Blast Radius Is Credential Reuse
The fastest way to make an agent dangerous is to let it reuse human credentials or long-lived API keys. That is exactly how a single compromise spreads from chat to cloud to source control. Bind each tool to a separate service identity, rotate secrets automatically, and make token theft useless by constraining audience, scope, and expiry.
This is not theoretical. The same class of mistakes keeps showing up in cloud incidents: overbroad IAM roles, reusable secrets in CI/CD, and service accounts that can do far more than the workflow that owns them. An AI agent is just a new consumer of those old mistakes. If it can impersonate a person, read everything, and act everywhere, you do not have an assistant. You have a roaming admin session with better autocomplete.
The Bottom Line
Treat every AI agent as an untrusted workload with a human-shaped interface. Bind it to a verified identity, force just-in-time privilege for every state-changing tool call, and keep its network and data access segmented to the minimum task scope.
If you cannot reconstruct the prompt, the retrieved context, and the exact tool chain from an immutable log, the control failed. Start by cutting standing access to production systems, then add external policy enforcement and red-team tests for prompt injection and data exfiltration before you let the agent near anything that can move money, secrets, or customer data.