Zero Trust for AI Agents: Securing LLMs, Tools, and Identity

When the “user” is an AI agent, zero trust means every prompt, tool call, and data request must be verified, scoped, and logged in real time. This post shows how microsegmentation, just-in-time privilege, continuous identity checks, and tamper-evident audit trails stop agents from becoming an enterprise-wide blast radius.

APT29 Didn’t Need Your Prompt Injection to Be Dangerous

APT29 has spent years proving a boring point: if you can steal identity, you can usually skip the rest of the fight. In multiple campaigns, they leaned on legitimate credentials, MFA abuse, and quiet persistence rather than noisy malware. That same lesson applies to AI agents. If you let an LLM act with standing privilege, the model does not need to be “hacked” in the cinematic sense. It just needs to be trusted in the wrong place.

That is the part most AI security talk still misses. The risk is not that a model hallucinates a bad answer. The risk is that it becomes a privileged workflow engine with access to Slack, Jira, GitHub, Snowflake, Google Drive, and whatever internal API someone exposed because “the agent needed it.” Breaches do not care that the interface was novel. They care that the blast radius was large.

Your Agent Is Not a User, So Stop Treating It Like One

A human user can be challenged, paused, and investigated. An agent can chain actions at machine speed, across systems, with no fatigue and no common sense. That means zero trust has to move from a one-time login decision to continuous authorization for every prompt, tool call, and data fetch. If the model asks for customer records, you should know which record set, why, for how long, and under which policy. “It had access” is not a control. It is a postmortem.

This is where most deployments get lazy. People bolt an LLM onto an existing app, hand it an API key, and call it “copilot.” That is not architecture; that is a breach pre-authorization form. You need microsegmentation for agent tools, not just network zones. The agent that drafts emails should not be able to query production databases because both live behind the same service account. If your internal IAM can’t express that distinction, the problem is not the model. It is your control plane.

JIT Privilege Beats Permanent Access Every Time

Just-in-time privilege is not new, but it becomes non-negotiable when the “user” is software that can be socially engineered by a prompt. Give the agent a short-lived token scoped to one task, one dataset, one destination. Then revoke it. Microsoft Entra ID (formerly Azure AD), Okta, and AWS IAM all support variations of temporary credentials and scoped roles; use them as if you expect compromise, because you should. Permanent tokens for agents are how you turn a single bad prompt into an enterprise-wide incident.
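To make the "one task, one dataset, then gone" idea concrete, here is a minimal sketch of a broker that mints and checks short-lived scoped tokens. It uses only the Python standard library and is not how any of the identity providers named above implement temporary credentials; `BROKER_KEY`, `mint_token`, `check_token`, and the agent, tool, and dataset names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key held by the token broker, never by the agent.
BROKER_KEY = b"demo-only-key-rotate-in-real-use"


def mint_token(agent_id, tool, dataset, ttl_seconds=300, now=None):
    """Issue a signed token scoped to one agent, one tool, one dataset."""
    issued = time.time() if now is None else now
    claims = {
        "agent": agent_id,
        "tool": tool,
        "dataset": dataset,
        "exp": issued + ttl_seconds,
    }
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(BROKER_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def check_token(token, tool, dataset, now=None):
    """Return True only if the signature, scope, and expiry all hold."""
    current = time.time() if now is None else now
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(BROKER_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    return (
        claims["tool"] == tool
        and claims["dataset"] == dataset
        and current < claims["exp"]
    )


# Usage: a five-minute token for one read path, useless for anything else.
token = mint_token("mail-drafter", "crm.read", "accounts-emea")
print(check_token(token, "crm.read", "accounts-emea"))
print(check_token(token, "crm.export", "accounts-emea"))
```

The point of the design is that expiry does the revoking by default. A token that leaks through a bad prompt is only good for the scope and the few minutes it was minted for.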

Here’s the contrarian bit: don’t chase “prompt hardening” as your main defense. Prompt filters help, but they are not a boundary. Anthropic’s 2024 responsible disclosure work showed how models can be manipulated into assisting with dangerous tasks when safety layers fail or are bypassed. The lesson is not “write better prompts.” The lesson is that you need policy enforcement outside the model, because the model is not a trustworthy security boundary. It is the thing being supervised.

Tool Access Needs Real Segmentation, Not a Shared API Key

If your agent can call tools, every tool is a potential exfiltration path. That includes browser automation, code execution, document search, ticketing systems, and anything with write access. The right pattern is per-tool identity with narrow scopes, enforced at the broker layer. A retrieval agent should have read-only access to a bounded index, not a general-purpose connector to SharePoint, Box, and Google Drive because someone wanted convenience. Convenience is how you end up explaining yourself to legal at 2 a.m.
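A rough sketch of what per-tool identity at the broker layer could look like follows. Every name here (`ToolGrant`, `ToolBroker`, the agents, tools, and resource strings) is made up for illustration; a real deployment would back the grants with its IAM system rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolGrant:
    """One narrow permission: a tool, the actions allowed, the resources in scope."""
    tool: str
    actions: frozenset
    resources: frozenset


# Hypothetical grants: each agent gets its own identity and its own short list.
GRANTS = {
    "retrieval-agent": [
        ToolGrant("doc_search", frozenset({"read"}), frozenset({"index:support-kb"})),
    ],
    "email-drafter": [
        ToolGrant("mail", frozenset({"draft"}), frozenset({"mailbox:outbound-drafts"})),
    ],
}


class ToolBroker:
    """Sits between the agent and every tool; nothing is called directly."""

    def __init__(self, grants):
        self._grants = grants

    def is_allowed(self, agent, tool, action, resource):
        for grant in self._grants.get(agent, []):
            if (
                grant.tool == tool
                and action in grant.actions
                and resource in grant.resources
            ):
                return True
        return False  # default deny: unknown agents and unlisted scopes fail

    def call(self, agent, tool, action, resource, handler):
        if not self.is_allowed(agent, tool, action, resource):
            raise PermissionError(f"{agent} may not {action} {resource} via {tool}")
        return handler(resource)


broker = ToolBroker(GRANTS)
# The retrieval agent can read its bounded index...
print(broker.is_allowed("retrieval-agent", "doc_search", "read", "index:support-kb"))
# ...but not a general-purpose connector it was never granted.
print(broker.is_allowed("retrieval-agent", "doc_search", "read", "sharepoint:all-sites"))
```

Because the grant table is keyed by agent, there is no shared key to borrow. The email drafter has no path to the search index even though both agents go through the same broker.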

You also need to separate tool authorization from model reasoning. The model can propose an action; a policy engine decides whether it happens. That policy should check user intent, data classification, destination, time window, and anomaly history. If an agent that usually summarizes meetings suddenly requests bulk export from Salesforce at 03:14 UTC, the correct response is not “interesting.” It is deny, log, and alert. Humans get the same treatment when they start behaving like they’re staging for a breach.
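The split between "model proposes" and "policy decides" can be sketched in a few lines. This is an illustrative toy, not a production policy engine: the `ActionRequest` fields, the `BASELINE` profile standing in for anomaly history, and the approved destinations are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ActionRequest:
    """What the model proposes. The policy engine, not the model, decides."""
    agent: str
    tool: str
    action: str
    classification: str  # assumed labels: "public", "internal", "restricted"
    destination: str
    row_count: int
    at: datetime  # timezone-aware timestamp of the request


# Hypothetical baseline of normal behaviour per agent.
BASELINE = {
    "meeting-summarizer": {
        "tools": {"calendar", "transcripts"},
        "max_rows": 50,
        "hours_utc": range(7, 20),
    },
}
APPROVED_DESTINATIONS = {"internal-wiki", "requesting-user"}


def decide(req):
    """Return ("allow" | "deny", reasons). Any single failed check denies."""
    profile = BASELINE.get(req.agent)
    if profile is None:
        return "deny", ["unknown agent"]
    reasons = []
    if req.tool not in profile["tools"]:
        reasons.append("tool outside the agent's usual set")
    if req.row_count > profile["max_rows"]:
        reasons.append("bulk request exceeds row limit")
    if req.at.astimezone(timezone.utc).hour not in profile["hours_utc"]:
        reasons.append("outside usual time window")
    if req.classification == "restricted":
        reasons.append("restricted data needs human approval")
    if req.destination not in APPROVED_DESTINATIONS:
        reasons.append("destination not approved")
    return ("deny" if reasons else "allow"), reasons


# The bulk Salesforce export at 03:14 UTC from the paragraph above.
odd = ActionRequest(
    agent="meeting-summarizer", tool="salesforce", action="export",
    classification="restricted", destination="external-bucket",
    row_count=50000, at=datetime(2025, 1, 1, 3, 14, tzinfo=timezone.utc),
)
verdict, reasons = decide(odd)
print(verdict, reasons)
```

Returning the reasons alongside the verdict matters as much as the verdict itself, since those strings are what the log entry and the alert will carry.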

Continuous Identity Checks Catch the Weird Stuff

Static identity is a weak assumption when an agent can operate across sessions. You need continuous identity checks that bind the action to the originating user, device posture, session risk, and task context. In practice, that means re-auth for sensitive actions, step-up verification for destructive operations, and token binding that makes stolen credentials less useful outside their original context. If your agent can still act after the user’s session has expired, you’ve already lost the plot.
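One way to picture the check is a gate that runs before every agent action rather than once at login. The sketch below assumes a hypothetical `Session` record, a made-up set of sensitive actions, and a five-minute step-up window; real values would come from your identity provider and risk engine.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """State of the originating user's session, refreshed before each action."""
    user: str
    expires_at: float       # epoch seconds
    device_compliant: bool  # latest device posture result
    last_step_up: float     # epoch seconds of last strong re-auth, 0 if never


# Hypothetical policy knobs.
SENSITIVE_ACTIONS = {"delete", "export", "grant_access"}
STEP_UP_MAX_AGE = 300  # seconds a step-up verification stays fresh


def may_act(session, action, now):
    """Re-evaluate identity for a single action. Returns (allowed, reason)."""
    if now >= session.expires_at:
        return False, "session expired"
    if not session.device_compliant:
        return False, "device posture failed"
    if action in SENSITIVE_ACTIONS and now - session.last_step_up > STEP_UP_MAX_AGE:
        return False, "step-up verification required"
    return True, "ok"


session = Session(user="dana", expires_at=2000.0, device_compliant=True, last_step_up=0.0)
print(may_act(session, "summarize", now=1500.0))  # routine action, live session
print(may_act(session, "export", now=1500.0))     # sensitive, no recent step-up
print(may_act(session, "summarize", now=2500.0))  # user session has expired
```

The agent never holds authority of its own here. It borrows the user's, and the loan ends when the session does.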

This is where telemetry matters more than the usual compliance theater. Log the prompt, the tool requested, the policy decision, the exact data objects touched, and the downstream response. Redact secrets, sure, but do not redact the sequence. Incident response without sequence is just archaeology with a dashboard. And yes, you should assume someone will try to tamper with the logs. If the audit trail lives in the same trust domain as the agent, it is decoration.
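As a sketch of "redact secrets, keep the sequence", the recorder below assigns a strictly increasing sequence number to every event and masks obvious credential patterns before writing. The field names, the `AgentEventLog` class, and the regular expression are illustrative assumptions; a real redactor would need a far broader pattern set.

```python
import json
import re

# Illustrative pattern only: masks values that follow common secret-looking keys.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\b(\s*[=:]\s*)(\S+)")


def redact(text):
    """Mask the secret value but keep the key, so the event stays readable."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


class AgentEventLog:
    """Collects one JSON line per agent step, numbered in order."""

    def __init__(self):
        self._seq = 0
        self.lines = []

    def record(self, session_id, prompt, tool, decision, objects, response):
        self._seq += 1
        event = {
            "seq": self._seq,             # never redacted, never reused
            "session": session_id,
            "prompt": redact(prompt),
            "tool": tool,
            "decision": decision,         # what the policy engine said
            "objects": sorted(objects),   # exact data objects touched
            "response": redact(response),
        }
        self.lines.append(json.dumps(event, sort_keys=True))
        return event


log = AgentEventLog()
log.record("s-1", "summarize ticket 4411", "jira.read", "allow", ["JIRA-4411"], "done")
log.record("s-1", "retry with password=hunter2", "jira.read", "deny", [], "blocked")
for line in log.lines:
    print(line)
```

Denied requests are recorded too. The attempts an agent was refused are often the most useful part of the timeline.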

Tamper-Evident Logs Are the Difference Between Detection and Guessing

You do not need blockchain. You need append-only storage, cryptographic chaining, and a separate security domain for audit retention. AWS CloudTrail, Azure Monitor, and Google Cloud Audit Logs are useful, but only if you protect them from the same identities your agents use. Put high-value agent telemetry into a write-once store or a SIEM pipeline with immutable retention. Then verify those logs independently. If the agent can delete its own tracks, congratulations: you built a self-cleaning crime scene.
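Cryptographic chaining is simpler than it sounds. In this minimal sketch each entry stores the SHA-256 hash of the previous entry plus its own record, so editing or removing anything in the middle breaks every hash after it. The function names and record fields are hypothetical, and the in-memory list stands in for whatever append-only store you actually use.

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the first entry's "prev" field


def _digest(prev_hash, record):
    """Hash the previous entry's hash together with this record's canonical JSON."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append(chain, record):
    """Add a record, linking it to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": _digest(prev, record)})


def verify(chain):
    """Return None if the chain is intact, else the index of the first bad entry."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return i
        prev = entry["hash"]
    return None


chain = []
append(chain, {"seq": 1, "tool": "crm.read", "decision": "allow"})
append(chain, {"seq": 2, "tool": "crm.export", "decision": "deny"})
append(chain, {"seq": 3, "tool": "mail.draft", "decision": "allow"})
print(verify(chain))  # intact chain

chain[1]["record"]["decision"] = "allow"  # someone rewrites history
print(verify(chain))  # index of the first entry that no longer checks out
```

A chain like this only proves internal consistency. Someone who can rewrite the whole store can recompute every hash, and trimming entries off the end leaves a valid shorter chain, which is why the latest hash needs to be copied somewhere the agent's identities cannot reach.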

Codecov’s 2021 Bash Uploader compromise is the useful analogy here. One modified script quietly exfiltrated environment variables from customers’ CI environments because trusted automation was allowed to run with too much reach. AI agents are the same shape of problem, just with better marketing. If the toolchain is compromised, the model will happily become your most efficient insider threat.

The Bottom Line

Treat every agent action as untrusted until a policy engine approves it. Use short-lived, narrowly scoped credentials, and split tool access so one agent cannot wander from summarization into data exfiltration.

Log prompts, tool calls, policy decisions, and outputs in a tamper-evident system outside the agent’s trust domain. If you cannot reconstruct what the agent touched and why, you do not have control. You have a story.

References

  • CISA Cybersecurity Advisory AA23-347A: https://www.cisa.gov/news-events/cybersecurity-advisories/aa23-347a
  • Anthropic Responsible Disclosure on AI Safety: https://www.anthropic.com/news
  • AWS IAM Temporary Credentials: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html
  • Microsoft Entra Conditional Access: https://learn.microsoft.com/en-us/entra/identity/conditional-access/overview
  • Codecov Security Incident: https://about.codecov.io/security-update/
