NIST AI RMF: Govern, Map, Measure, Manage in Practice

NIST’s AI Risk Management Framework is easier to apply when you treat it as four operational questions: who owns the model, what can go wrong, how do you prove it’s behaving, and how do you respond when it doesn’t? For a deployed LLM, "Measure" means more than accuracy—it means tracking jailbreak success rates, hallucination frequency, policy violations, latency, drift, and abuse signals against real production traffic.

Equifax Learned This the Hard Way: AI Needs More Than a Policy

Apache Struts CVE-2017-5638 was not a subtle failure. One OGNL injection, one missed patch, and Equifax lost data on roughly 147 million people. That breach is still useful because it kills a comforting myth: the hard part is not inventing a framework. The hard part is operating it when production traffic, human shortcuts, and bad assumptions all show up at once. NIST’s AI Risk Management Framework is useful for the same reason. Not because it is magical. Because it forces you to answer four questions you can’t hand-wave away: who owns the model, what can go wrong, how do you prove it is behaving, and what do you do when it isn’t.

If you have deployed an LLM, you already know the old security playbook only gets you part of the way. A model can be “available” and still be unsafe, untruthful, or quietly leaking its system prompt. That is not a theoretical problem. In 2023, researchers demonstrated indirect prompt injection against tool-using LLM systems: instructions planted in content the model reads can exfiltrate hidden instructions and manipulate downstream actions without touching your perimeter. The attack surface is the conversation itself, plus whatever tools you let the model call. Cute.

Govern Means You Need a Real Owner, Not a Steering Committee

NIST’s “Govern” function sounds bureaucratic until you ask who is accountable when the model starts doing something stupid at scale. If the answer is “the AI team,” you do not have governance; you have a forwarding address. You need a named owner for the model, the prompts, the tools, the data sources, and the rollback decision. That sounds excessive until you remember how many breach postmortems start with “we assumed someone else was watching it.”
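One way to make “a named owner for everything” checkable is to keep it in an inventory record and fail review when a slot is empty. The sketch below is a minimal illustration. `ModelRecord`, `REQUIRED_OWNERS`, and the list of team aliases are hypothetical names, not part of the NIST framework. It treats a team alias such as “ai-team” as missing, because a forwarding address is not an owner.

```python
from dataclasses import dataclass, field

# Hypothetical list of components that each need one accountable person.
REQUIRED_OWNERS = ("model", "prompts", "tools", "data_sources", "rollback")

# Hypothetical aliases that name a group rather than a person.
TEAM_ALIASES = {"ai-team", "ml-platform", "tbd", ""}


@dataclass
class ModelRecord:
    """One entry in a hypothetical model inventory."""
    name: str
    owners: dict[str, str] = field(default_factory=dict)


def missing_owners(record: ModelRecord) -> list[str]:
    """Return the components with no named individual as owner."""
    gaps = []
    for component in REQUIRED_OWNERS:
        owner = record.owners.get(component, "").strip().lower()
        if owner in TEAM_ALIASES:
            gaps.append(component)
    return gaps


record = ModelRecord(
    name="support-bot",
    owners={"model": "alice@example.com", "prompts": "ai-team", "tools": "bob@example.com"},
)
print(missing_owners(record))  # ['prompts', 'data_sources', 'rollback']
```

Running this as a pre-deployment gate turns “we assumed someone else was watching it” into a failed check instead of a postmortem finding.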

For LLMs, governance also means deciding what is not allowed to be connected. A model with read access to Jira, Slack, and a customer database is not “more capable.” It is a privilege escalation waiting for a prompt injection. The 2022 0ktapus phishing campaign, which breached Twilio among others, showed how quickly attackers turn stolen credentials and one-time codes into broad access, with more than 130 organizations compromised. The lesson carries over: if your model can reach sensitive systems, an attacker will try to make it do so on their behalf.
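A deny-by-default connection policy can be expressed in a few lines. This is a sketch under assumptions. `SENSITIVE_SYSTEMS` and `unapproved_connections` are hypothetical, and the set of sensitive systems is illustrative only. Any connector classed as sensitive is rejected unless a named owner has recorded an explicit approval for that model.

```python
# Hypothetical classification of systems a model might be wired to.
SENSITIVE_SYSTEMS = {"customer_db", "slack", "jira", "payments"}


def unapproved_connections(connectors: set[str], approved: set[str]) -> list[str]:
    """Return sensitive connectors that lack an explicit, recorded approval.

    Sensitive systems are denied by default; `approved` holds the exceptions
    the model's owner has signed off on.
    """
    return sorted((connectors & SENSITIVE_SYSTEMS) - approved)


violations = unapproved_connections(
    connectors={"jira", "slack", "customer_db", "public_docs"},
    approved={"jira"},
)
print(violations)  # ['customer_db', 'slack']
```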

Map Means You Draw the Attack Paths, Not the Architecture Diagram

“Map” is where most teams produce a cheerful slide deck and call it security. That is not mapping. Mapping means identifying the actual flows: prompt sources, retrieval sources, tool calls, log destinations, human review paths, and failure modes. If you cannot tell me where user input enters, where it is transformed, and where it can influence action, you are not done. You are decorating.
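Mapping in this sense is a graph problem. Edges represent “content from A can influence B.” You then ask which untrusted sources can reach a node that takes action. The flow map, node names, and `attack_paths` helper below are hypothetical and stand in for whatever your own deployment looks like. The search is an ordinary breadth-first enumeration of simple paths.

```python
from collections import deque

# Hypothetical data-flow map: each edge means "content from A can reach and influence B".
FLOWS = {
    "user_prompt": ["llm"],
    "retrieved_docs": ["llm"],
    "email_inbox": ["retrieved_docs"],
    "llm": ["tool:send_email", "tool:search", "logs"],
    "tool:search": ["llm"],
}
UNTRUSTED_SOURCES = ["user_prompt", "email_inbox"]
ACTION_SINKS = {"tool:send_email"}


def attack_paths(flows, sources, sinks):
    """Breadth-first search for every simple path from an untrusted source to an action sink."""
    paths = []
    for source in sources:
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
                continue
            for nxt in flows.get(node, []):
                if nxt not in path:  # skip cycles such as llm -> tool:search -> llm
                    queue.append(path + [nxt])
    return paths


for p in attack_paths(FLOWS, UNTRUSTED_SOURCES, ACTION_SINKS):
    print(" -> ".join(p))
# user_prompt -> llm -> tool:send_email
# email_inbox -> retrieved_docs -> llm -> tool:send_email
```

The second path is the one an architecture diagram usually hides: an inbound email that reaches a send-email tool through retrieval.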

This is where LLMs differ from conventional software. The prompt is both input and control plane. Retrieval-Augmented Generation adds another control surface because poisoned or stale documents can steer output without changing the model weights. Alex Birsan’s 2021 dependency-confusion research proved that attackers love paths you forgot were paths. He published public packages that reused the names of companies’ internal packages and got his code pulled into builds at Apple, Microsoft, PayPal, and others, because their package resolution preferred the public copy. Same story, different layer: if your model consumes external content, you need to know what trust assumptions sit underneath it.
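Retrieval trust assumptions can be made explicit rather than implied. This minimal sketch assumes two things: each retrieved document carries its source and last-modified time, and you have decided on a source allowlist and a maximum age. `RetrievedDoc`, `TRUSTED_SOURCES`, `MAX_AGE`, and `filter_context` are hypothetical names, and the 180-day cutoff is an illustrative value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetrievedDoc:
    doc_id: str
    source: str           # where the document came from (hypothetical labels)
    updated_at: datetime  # last modification time, timezone-aware


# Hypothetical trust assumptions, written down instead of implied.
TRUSTED_SOURCES = {"confluence", "policy_repo"}
MAX_AGE = timedelta(days=180)


def filter_context(docs, now):
    """Split retrieved documents into ones allowed into the prompt and ones rejected, with a reason."""
    allowed, rejected = [], []
    for doc in docs:
        if doc.source not in TRUSTED_SOURCES:
            rejected.append((doc.doc_id, "untrusted source"))
        elif now - doc.updated_at > MAX_AGE:
            rejected.append((doc.doc_id, "stale"))
        else:
            allowed.append(doc)
    return allowed, rejected


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
docs = [
    RetrievedDoc("refund-policy", "policy_repo", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    RetrievedDoc("old-refund-policy", "policy_repo", datetime(2022, 1, 1, tzinfo=timezone.utc)),
    RetrievedDoc("forum-post", "public_web", datetime(2024, 5, 30, tzinfo=timezone.utc)),
]
allowed, rejected = filter_context(docs, now)
print([d.doc_id for d in allowed])  # ['refund-policy']
print(rejected)  # [('old-refund-policy', 'stale'), ('forum-post', 'untrusted source')]
```

A provenance filter like this does not detect a poisoned document from a trusted source. It only narrows the set of sources you have to trust, and that trade-off should be written down too.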

Measure Means More Than Accuracy, Because Accuracy Is the Easy Lie

If you are only measuring benchmark accuracy, you are measuring the wrong thing. Benchmarks are useful for procurement decks and almost useless for production assurance. NIST’s “Measure” function becomes real when you track jailbreak success rates, hallucination frequency, policy violations, latency, drift, refusal rates, tool-call abuse, and prompt-injection attempts against live traffic. Production, not the lab, is where the model meets adversarial behavior.
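Several of these signals reduce to simple rates once production interactions are labeled, whether by classifiers, reviewers, or both. The `Event` schema and `summarize` function below are hypothetical, and labeling is assumed to happen upstream. The point of the sketch is the denominators: jailbreak success is computed over attempts, not over all traffic.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Event:
    """One production interaction, labeled after the fact (hypothetical schema)."""
    jailbreak_attempt: bool
    jailbreak_succeeded: bool
    policy_violation: bool
    hallucination: bool
    latency_ms: float


def summarize(events: list[Event]) -> dict[str, float]:
    """Compute a few Measure-function signals over a window of labeled traffic."""
    if len(events) < 2:
        raise ValueError("need at least two events to summarize")
    n = len(events)
    attempts = [e for e in events if e.jailbreak_attempt]
    return {
        # Denominator is attempts, so a quiet week does not make the model look more robust.
        "jailbreak_success_rate": (
            sum(e.jailbreak_succeeded for e in attempts) / len(attempts) if attempts else 0.0
        ),
        "policy_violation_rate": sum(e.policy_violation for e in events) / n,
        "hallucination_rate": sum(e.hallucination for e in events) / n,
        # With n=20, quantiles() returns 19 cut points; the last one is the 95th percentile.
        "p95_latency_ms": quantiles([e.latency_ms for e in events], n=20, method="inclusive")[-1],
    }


events = [
    Event(True, True, True, False, 900.0),
    Event(True, False, False, False, 400.0),
    Event(False, False, False, True, 350.0),
    Event(False, False, False, False, 300.0),
]
print(summarize(events))
# {'jailbreak_success_rate': 0.5, 'policy_violation_rate': 0.25, 'hallucination_rate': 0.25, 'p95_latency_ms': 825.0}
```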

You should also measure by use case. A customer-support bot, a code assistant, and an internal search agent fail in different ways. A support bot that fabricates refund policy is a business issue. A code assistant that emits vulnerable snippets is a supply-chain issue. An internal agent that can be induced to summarize confidential tickets is an incident. Same model family, different blast radius. If your dashboard does not distinguish those, it is just a prettier way to be surprised later.
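Per-use-case measurement mostly comes down to having different metrics and thresholds per deployment. The metric names and limits below are hypothetical placeholders for the three examples above. The sketch also reports a metric that is not being collected at all, since an unmeasured signal should not pass silently.

```python
# Hypothetical per-use-case thresholds: same model family, different blast radius.
THRESHOLDS = {
    "support_bot": {"policy_violation_rate": 0.02},
    "code_assistant": {"vulnerable_snippet_rate": 0.01},
    "internal_agent": {"confidential_disclosure_rate": 0.0},
}


def findings(use_case: str, observed: dict[str, float]) -> list[str]:
    """Return threshold breaches and missing metrics for one use case."""
    results = []
    for metric, limit in THRESHOLDS[use_case].items():
        if metric not in observed:
            results.append(f"{metric}: not measured")
        elif observed[metric] > limit:
            results.append(f"{metric}: {observed[metric]} > {limit}")
    return results


print(findings("support_bot", {"policy_violation_rate": 0.01}))             # []
print(findings("internal_agent", {"confidential_disclosure_rate": 0.001}))  # ['confidential_disclosure_rate: 0.001 > 0.0']
print(findings("code_assistant", {}))                                       # ['vulnerable_snippet_rate: not measured']
```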

A useful contrarian point: do not trust “human-in-the-loop” as a control if the human sees hundreds of model outputs a day. Review fatigue is real, and attackers count on it. If the model can generate enough plausible nonsense, your reviewer becomes a rubber stamp with a badge. That is not a control. That is ceremony.
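Review fatigue shows up in the review logs themselves. This sketch flags reviewer-days that combine high volume with a near-total approval rate, a pattern consistent with rubber-stamping. Both thresholds and the `(reviewer, day, approved)` log format are hypothetical assumptions to tune against your own data. A flag is a prompt to look, not proof.

```python
from collections import defaultdict

# Hypothetical thresholds; tune against your own review workload.
DAILY_REVIEW_BUDGET = 150
RUBBER_STAMP_APPROVAL_RATE = 0.99


def fatigued_reviewers(decisions):
    """decisions: iterable of (reviewer, day, approved) tuples.

    Returns (reviewer, day) pairs where volume exceeded the budget AND
    almost everything was approved.
    """
    counts = defaultdict(lambda: [0, 0])  # (reviewer, day) -> [total, approved]
    for reviewer, day, approved in decisions:
        counts[(reviewer, day)][0] += 1
        counts[(reviewer, day)][1] += int(approved)
    return sorted(
        key
        for key, (total, ok) in counts.items()
        if total > DAILY_REVIEW_BUDGET and ok / total >= RUBBER_STAMP_APPROVAL_RATE
    )


decisions = [("dana", "2024-06-03", True)] * 400 + [("eli", "2024-06-03", True)] * 60
print(fatigued_reviewers(decisions))  # [('dana', '2024-06-03')]
```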

Manage Means You Need Containment, Rollback, and Abuse Response

“Manage” is the part everyone wants to skip because it implies the model can fail. It can. So design for it. You need rate limits, tool-call allowlists, output filters that are tested against real jailbreaks, and a kill switch that actually works under load. You also need a rollback path for prompts, retrieval sources, and model versions. If your only mitigation is “we will retrain,” you are describing a future meeting, not incident response.
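The first three controls (kill switch, tool-call allowlist, and rate limit) can sit in one gate that every tool call passes through before execution. `ToolGate` is a hypothetical class. It uses a standard token bucket for the rate limit. The kill switch here is an in-process flag for illustration only. For a kill switch to work under load across replicas, it would need to read shared state, which this sketch does not do.

```python
import time


class ToolGate:
    """Hypothetical gate checked before any model-initiated tool call runs."""

    def __init__(self, allowed_tools, rate_per_sec, burst):
        self.allowed_tools = set(allowed_tools)
        self.killed = False           # operator-controlled; checked on every call
        self.rate = rate_per_sec      # token-bucket refill rate
        self.capacity = burst         # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def _take_token(self):
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def check(self, tool_name):
        """Return (allowed, reason). The kill switch takes precedence over everything."""
        if self.killed:
            return False, "kill switch engaged"
        if tool_name not in self.allowed_tools:
            return False, "tool not on allowlist"
        if not self._take_token():
            return False, "rate limited"
        return True, "ok"


gate = ToolGate(allowed_tools={"search_kb"}, rate_per_sec=1.0, burst=2)
print(gate.check("search_kb"))    # (True, 'ok')
print(gate.check("delete_user"))  # (False, 'tool not on allowlist')
gate.killed = True
print(gate.check("search_kb"))    # (False, 'kill switch engaged')
```

Rejections should be logged with their reason, because a burst of “tool not on allowlist” results is exactly the abuse signal the next section asks you to watch.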

Operationally, management means treating LLM abuse like any other active threat. Log prompt patterns, tool invocations, refusal spikes, and anomalous token usage. Correlate that with identity, source IP, and session history. If a model suddenly starts making more external calls or producing more policy-violating output, that is not “interesting.” It is telemetry. The difference between a nuisance and an incident is usually whether someone noticed the pattern before the attacker did.
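“A model suddenly making more external calls” can be turned into a concrete check. Compare each identity’s current count against its own recent baseline. This sketch uses a simple mean-plus-k-standard-deviations rule. `is_spike`, `flag_identities`, and the parameter values are hypothetical, and real deployments often need seasonality-aware baselines. Identities with no history are not flagged here, which is a gap to cover separately.

```python
from statistics import mean, pstdev


def is_spike(history, current, k=3.0, min_std=1.0):
    """Flag `current` if it exceeds the historical mean by more than k standard deviations.

    history: external tool calls per hour for one identity over recent hours.
    min_std keeps a perfectly flat baseline from flagging every +1 change.
    """
    if len(history) < 2:
        return False  # not enough baseline to judge
    threshold = mean(history) + k * max(pstdev(history), min_std)
    return current > threshold


def flag_identities(histories, current_counts, k=3.0):
    """Return identities whose current hourly call count is a spike against their own baseline."""
    return sorted(
        ident for ident, count in current_counts.items()
        if is_spike(histories.get(ident, []), count, k)
    )


baseline = [4, 5, 3, 6, 4, 5, 4, 5]
print(is_spike(baseline, 7))   # False (threshold is 7.5)
print(is_spike(baseline, 40))  # True
print(flag_identities({"svc-a": baseline, "user-b": baseline}, {"svc-a": 40, "user-b": 5}))  # ['svc-a']
```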

NIST AI RMF Works Best When You Stop Pretending AI Is Special

Here is the part that annoys people: the AI RMF is most effective when you apply ordinary security discipline to a new class of failure. Asset inventory. Access control. Logging. Testing. Incident response. None of this is glamorous. All of it is cheaper than explaining to leadership why the chatbot just summarized the wrong customer records into the wrong Slack channel.

If you want a practical starting point, map every deployed model to an owner, a data boundary, and a rollback plan. Then measure it against adversarial traffic, not just happy-path prompts. Finally, define what happens when the model crosses the line, because it will. The only question is whether you find out in testing or in production. Production has a way of making the lesson memorable.
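A rollback plan is easier to trust when the model version, prompt version, and retrieval source are pinned together as one release. You can then return to a release that was explicitly marked good. `Release` and `Deployment` below are hypothetical, and the sketch assumes the initial release is known-good. It shows the bookkeeping only, not the mechanics of swapping a live endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Everything that changes model behavior, pinned together (hypothetical fields)."""
    model_version: str
    prompt_version: str
    retrieval_index: str


class Deployment:
    def __init__(self, initial: Release):
        self.history = [initial]     # every activated release, oldest first
        self.known_good = {initial}  # assumption: the initial release was validated

    @property
    def active(self) -> Release:
        return self.history[-1]

    def deploy(self, release: Release) -> None:
        self.history.append(release)

    def mark_good(self) -> None:
        self.known_good.add(self.active)

    def rollback(self) -> Release:
        """Re-activate the most recent earlier release that was marked known-good."""
        for release in reversed(self.history[:-1]):
            if release in self.known_good:
                self.history.append(release)
                return release
        raise RuntimeError("no known-good release to roll back to")


v1 = Release("m-2024-05", "p-12", "idx-0501")
d = Deployment(v1)
d.deploy(Release("m-2024-05", "p-13", "idx-0501"))  # prompt-only change, never marked good
d.deploy(Release("m-2024-06", "p-13", "idx-0601"))
print(d.rollback() == v1)  # True
```

Treating a prompt edit as a release is the point: it changes behavior as much as a new model version can, so it gets the same rollback path.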

The Bottom Line

Treat NIST AI RMF as an operating model, not a compliance exercise. Assign one owner per deployed model, inventory every tool and data source it can touch, and define rollback before the first user prompt hits production.

Measure LLMs on jailbreak rate, hallucination frequency, policy violations, latency, drift, and tool abuse against live traffic. If you are not logging those signals, you are guessing.

References

  • NIST AI Risk Management Framework 1.0: https://www.nist.gov/itl/ai-risk-management-framework
  • NIST AI RMF Playbook: https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
  • OWASP Top 10 for Large Language Model Applications: https://owasp.org/www-project-top-10-for-large-language-model-applications/
  • Apache Struts CVE-2017-5638: https://nvd.nist.gov/vuln/detail/CVE-2017-5638
  • Alex Birsan dependency confusion write-up: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610