Securing AI Agents with Least-Privilege Tool Access
AI agents are starting to call APIs, query databases, and trigger workflows—often with far more access than they need. Learn how least-privilege design, scoped tokens, and tool sandboxing can stop prompt injection from turning an assistant into an attack path.
Least Privilege Is the Difference Between a Helpful Agent and a Shell with Opinions
When Microsoft disclosed CVE-2023-29360 in Azure DevOps and GitHub later spent months tightening secret-scanning around token exposure, the lesson was not that software agents are “smart” now. It was that anything allowed to call APIs, read databases, and kick off workflows will happily do exactly that if you hand it credentials broad enough to be dangerous. An AI agent with access to Jira, Slack, GitHub, and a production database is not a colleague. It is a compromised service account waiting for a prompt injection to make the introduction.
The current habit is to wire agents into everything and then act surprised when they behave like everything. That includes customer support bots with write access to ticketing systems, coding assistants with repo-wide secrets, and internal copilots that can trigger Terraform, CI jobs, or payment workflows. If the model can see the tool, and the tool can do the thing, then the model can do the thing — including after an attacker stuffs hostile instructions into a PDF, a web page, a ticket, or a Slack thread the agent is dutifully summarizing.
Prompt Injection Works Because Tool Access Is Usually an Afterthought
The security failure is rarely the model. It is the permission model bolted on afterward like a seatbelt on a shopping cart. OpenAI’s GPTs, Anthropic’s tool use, and Microsoft Copilot Studio all make it easy to connect external systems; none of them can save you if the connected account can delete records, approve payments, or exfiltrate entire tables. Prompt injection only needs one successful instruction override. Overbroad tool scope turns that into a breach instead of an embarrassing hallucination.
A useful way to think about agent risk is to compare it with the old OAuth disaster pattern. If a SaaS app asks for read/write when it only needs read, the blast radius is obvious. Agents are worse because they can chain actions: read a support ticket, fetch a customer record, draft a response, and then trigger a refund or password reset. That is not “automation.” That is an attack path with a conversational interface.
Scope Tokens to One Job, One Tenant, One Action Class
Least privilege for agents should be boringly literal. If the agent only needs to summarize GitHub issues, give it a token that can read issues in one repository and nothing else. If it needs to create a Jira ticket, use a separate credential that can create tickets in one project but cannot edit workflows, change permissions, or browse other projects. GitHub fine-grained personal access tokens, AWS STS session policies, and Google Cloud service account scoping all exist for a reason: broad, long-lived credentials are what attackers love to find.
The mistake I still see is teams handing an agent a single “platform” token because it is easier to manage. That is the same logic that gave us domain admins for printer installs. Split credentials by task, tenant, and environment. Short TTLs matter too. A token that dies in 15 minutes is far less useful to an attacker than one that survives until someone remembers to rotate it next quarter.
Sandbox the Tool, Not Just the Model
If the agent can run code, query SQL, or touch files, the execution environment needs the same hostility you would give an untrusted contractor laptop. Put code execution in a container with a read-only root filesystem, no cloud metadata access, no ambient secrets, and outbound network rules that only allow the exact hostnames required. Use a separate service account for the sandbox, not the same identity that can reach production data.
For database access, the cleanest pattern is still a read-only replica or a purpose-built API layer that enforces row-level and column-level controls. Let the agent ask for “customer 1234’s last invoice status,” not SELECT * FROM invoices. If you let a model generate SQL directly against production, you are one malformed prompt away from a very expensive incident report. PostgreSQL row-level security, BigQuery authorized views, and Snowflake masking policies are not sexy, but neither is explaining why an agent dumped PII into a chat transcript.
Log Every Tool Call Like It Will Be Exhibit A
You do not need “AI observability.” You need audit logs that survive contact with Legal and IR. Record the prompt, the tool name, the parameters, the identity used, the data returned, and the downstream action taken. If an agent approves a ticket that later triggers a payment or changes an IAM policy, you should be able to reconstruct the chain without guessing which LLM had a bad day.
This is where a lot of teams get lazy and rely on model output logs alone. That is not enough. The dangerous part is not what the model said; it is what the tool did after the model said it. Falco, AWS CloudTrail, Okta System Log, and GitHub audit logs all help, but only if you correlate them. A prompt injection that causes an agent to enumerate secrets in one system and exfiltrate them through another is a cross-log problem, not a chatbot problem.
The Uncomfortable Part: Human Approval Is Not a Control by Itself
The standard advice says “add a human in the loop.” Fine, but humans rubber-stamp nonsense all the time, especially when the agent drafts the exact action it wants approved. If the approval screen shows “Approve refund for customer request” and hides the fact that the request came from an untrusted email attachment parsed by the agent, you have built a nicer-looking trap.
A better control is policy gating before approval. Require the agent to classify actions by risk: read-only, reversible write, irreversible write, and privileged admin. Only the last two should ever hit a human review queue, and even then the reviewer should see the raw source that triggered the action, the exact tool parameters, and the identity boundary the action crosses. If the agent cannot explain the provenance of the request, the answer is no. That is not paranoia; that is how you avoid letting a malicious prompt become a change request.
Design for Failure, Because Agents Will Eventually Obey the Wrong Thing
You do not need to assume model jailbreaks are magical. You only need to assume attackers will keep putting instructions where agents read them: web pages, email, tickets, docs, chat, and issue trackers. Microsoft, GitHub, and Anthropic have all spent real effort on tool-use restrictions because unrestricted agents are operationally convenient and security-hostile in equal measure. The fix is not “better prompts.” It is narrower credentials, tighter sandboxes, and logs that show exactly which action crossed which boundary.
If your agent can read from one system and write to another, treat that pathway like any other integration with external input. Validate inputs, restrict outputs, and deny by default. The model does not need to be trusted; the permissions need to be constrained enough that trust is irrelevant.
The Bottom Line
Map every agent to a single business task, then split its credentials so read, write, and admin actions are separate tokens with short lifetimes. Put code execution and database access behind sandboxed services with no ambient secrets, no metadata access, and explicit network allowlists.
Then test prompt injection against the actual tool chain, not a demo prompt. If a malicious instruction can cause a write action, secret lookup, or workflow trigger without a policy gate and a human seeing the raw source, the agent is overprivileged and should be treated like any other compromised service account.