Automating Security Risk Assessments with AI
Speed up ISO 27001 and SOC 2 risk assessments using AI-powered tools that analyze, score, and suggest treatments for risks.
CVE-2024-3094 sat in XZ Utils for weeks because the project’s maintainer was socially engineered into handing maintainership to the attacker who backdoored the release tarballs, and the only reason it didn’t become a mass compromise was that Andres Freund noticed SSH latency on a Debian box. That’s the part most ISO 27001 and SOC 2 risk assessments politely skip: the risk isn’t always the shiny exploit, it’s the boring control failure that lets the exploit ship.
Why AI Belongs in the Risk Register, Not the Marketing Deck
Most risk assessments still die in spreadsheets. Someone interviews a process owner, copies a few “likely/medium/high” judgments into a matrix, and calls it a day. That’s how you end up with a 40-row register where “unpatched internet-facing VPN” and “lost laptop” get equal airtime, even though the former is the thing that gets you on the front page. AI can help, but only if it is used to triage evidence at scale, not to hallucinate a compliance fairy tale.
The practical win is speed. Tools like ServiceNow GRC, Archer, Drata, and Vanta already centralize policies, assets, and control evidence; AI layers can pull from those systems, map findings to ISO 27001 Annex A controls or SOC 2 Trust Services Criteria, and draft first-pass risk statements in minutes instead of days. That matters when you have 200 SaaS apps, three cloud accounts, and a security team that is already drowning in Jira tickets and vendor questionnaires. It does not matter if the model can write prose but cannot tell you whether public S3 access is actually enabled.
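To make that concrete, here is a minimal sketch of the cross-framework mapping these tools perform. The finding names, the `CONTROL_MAP` structure, and the specific control IDs are illustrative assumptions, not any vendor’s schema; verify mappings against your own Statement of Applicability.

```python
# Hypothetical cross-framework map: one finding type touches obligations
# in both ISO 27001:2022 Annex A and the SOC 2 Trust Services Criteria.
# Control IDs below are illustrative -- verify against your own SoA.
CONTROL_MAP = {
    "public_s3_bucket": {
        "ISO 27001": ["A.8.3 Information access restriction"],
        "SOC 2": ["CC6.1 Logical and physical access controls"],
    },
    "missing_mfa_admin": {
        "ISO 27001": ["A.8.5 Secure authentication"],
        "SOC 2": ["CC6.1 Logical and physical access controls"],
    },
}

def draft_risk_statement(finding_type: str, asset: str) -> str:
    """First-pass risk statement with every mapped obligation attached."""
    mapping = CONTROL_MAP.get(finding_type)
    if mapping is None:
        return f"UNMAPPED: '{finding_type}' on {asset} needs manual review."
    refs = "; ".join(f"{fw}: {', '.join(ids)}" for fw, ids in mapping.items())
    return f"Risk: {finding_type} on {asset}. Mapped obligations -- {refs}."
```

The point of the explicit unmapped branch: anything the map doesn’t cover should surface loudly rather than disappear into a default score.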
Feeding the Model Evidence, Not Vibes
If you want a useful assessment, feed the system artifacts, not opinions. Pull from AWS Config, Azure Policy, Okta logs, CrowdStrike detections, and vulnerability scanners like Tenable or Qualys, then let the model summarize patterns: stale admin accounts, missing MFA on privileged roles, exposed RDP, or a backlog of critical CVEs with internet-facing exposure. That is a materially better starting point than “management believes the residual risk is acceptable,” which is usually code for “nobody has checked.”
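A sketch of that evidence-first step, assuming findings have already been exported from those systems to structured files (the field names here are assumptions, not AWS Config’s or Okta’s actual schemas):

```python
from collections import Counter

# Collapse raw evidence rows into the patterns worth summarizing.
# Every row keeps its "source" field so conclusions stay traceable.
def summarize_patterns(rows: list[dict]) -> dict:
    counts = Counter(r["issue"] for r in rows)
    internet_facing = [r["asset"] for r in rows if r.get("exposed")]
    return {"issue_counts": dict(counts), "internet_facing": internet_facing}

rows = [
    {"asset": "vpn-edge-01", "issue": "critical_cve_backlog",
     "exposed": True, "source": "tenable-export.json"},
    {"asset": "okta-admin-7", "issue": "missing_mfa",
     "exposed": False, "source": "okta-log-export.csv"},
    {"asset": "okta-admin-9", "issue": "missing_mfa",
     "exposed": False, "source": "okta-log-export.csv"},
]
summary = summarize_patterns(rows)
# summary["issue_counts"] -> {"critical_cve_backlog": 1, "missing_mfa": 2}
```

Feed the model the summary plus the underlying rows, and it is summarizing artifacts; feed it interview notes and it is summarizing vibes.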
The better implementations do three things well. First, they normalize controls and findings across frameworks so one issue can map to multiple obligations without duplicating the work. Second, they score by business impact, not just CVSS; a CVSS 9.8 on a lab box is not the same as a CVSS 7.5 on a payment processor’s edge node. Third, they preserve the source evidence so an auditor can trace a conclusion back to a cloud config, ticket, or log line. If the AI cannot cite the artifact that drove the score, it is just a very expensive intern.
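The second point, impact-aware scoring, fits in a few lines. The tier names and weights below are assumptions for illustration, not a standard:

```python
# Impact-weighted scoring sketch: the same CVSS base score means very
# different things on a lab box and a payment edge node.
ASSET_WEIGHT = {"lab": 0.3, "internal": 0.7, "payment_edge": 1.5}

def business_risk(cvss: float, asset_tier: str) -> float:
    """Scale raw CVSS by where the finding actually lives."""
    return round(cvss * ASSET_WEIGHT[asset_tier], 1)

business_risk(9.8, "lab")           # 2.9 -- probably noise
business_risk(7.5, "payment_edge")  # 11.2 -- top of the register
```

Simple multiplication is crude, but even this inverts the naive CVSS-only ordering, which is the behavior you are paying for.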
Scoring Risks the Way Attackers Prioritize Them
Attackers do not rank risk by framework section number, and neither should you. Volt Typhoon didn’t need a zero-day to make life miserable; it lived off the land, abused valid accounts, and targeted edge devices and infrastructure where defenders were slow to notice. That is why AI-assisted scoring should weight exploitability, exposure, identity privilege, and blast radius ahead of generic likelihood labels. A misconfigured IAM role with AdministratorAccess and access to production data is not “medium” because the spreadsheet says so.
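One hedged way to encode that weighting, with factor names and weights that are purely illustrative:

```python
# Rank by what attackers actually use -- exploitability, exposure,
# privilege, blast radius -- rather than a likelihood label.
# Weights are assumptions; tune them to your environment.
FACTORS = {
    "exploitable": 3,        # working exploit or trivially abusable
    "internet_exposed": 3,   # reachable without a foothold
    "admin_privilege": 2,    # identity or role with broad rights
    "prod_blast_radius": 2,  # touches production data or infra
}

def attacker_score(finding: dict) -> int:
    """Sum the weights of every factor present on the finding."""
    return sum(w for f, w in FACTORS.items() if finding.get(f))

iam_role = {"exploitable": True, "admin_privilege": True,
            "prod_blast_radius": True}
attacker_score(iam_role)  # 7 of a possible 10 -- not "medium"
```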
This is where AI can be genuinely useful: clustering similar findings across assets and surfacing the ones that compound. For example, a single exposed GitHub token, a permissive cloud role, and a missing secret-scanning control can combine into a clean path to production. Human reviewers often see those as separate tickets. An AI system can flag the chain, which is the thing that actually matters. CrowdStrike, Wiz, and Palo Alto Networks all sell versions of this “attack path” logic because isolated findings are how you miss the breach.
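The chain-detection idea reduces to reachability on a graph of findings. This toy version uses a hand-built edge list standing in for real CSPM output; the node names are invented for illustration:

```python
from collections import deque

# Toy attack-path check: three individually "low" findings form one path.
edges = {
    "internet": ["leaked_github_token"],     # token in a public repo
    "leaked_github_token": ["ci_runner"],    # token grants CI access
    "ci_runner": ["permissive_cloud_role"],  # runner assumes a broad role
    "permissive_cloud_role": ["prod_data"],  # role can read prod data
}

def reachable(start: str, goal: str) -> bool:
    """Breadth-first search: is there any path from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

reachable("internet", "prod_data")  # True -- the chain is the finding
```

Ticket-by-ticket review sees four low-severity issues; the graph sees one path to production.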
Treatment Plans That Don’t Read Like Corporate Fan Fiction
A decent risk assessment should not end with “accept, avoid, transfer, mitigate” as if those four verbs are magic. AI can draft treatment options that are specific enough to be useful: enforce phishing-resistant MFA on privileged accounts, rotate long-lived API keys, split duties for cloud admins, or add continuous configuration checks for public storage and security groups. For ISO 27001, that kind of specificity maps cleanly to Annex A controls. For SOC 2, it gives auditors something more defensible than a policy PDF and a prayer.
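A sketch of what "specific enough to be useful" looks like as data. The catalog entries and Annex A pairings below are illustrative (2022 numbering); check them against your own Statement of Applicability before an audit:

```python
# Illustrative treatment catalog: each action is concrete and tied to
# the ISO 27001:2022 Annex A control it supports. Not a product schema.
TREATMENTS = {
    "missing_mfa_admin": (
        "Enforce phishing-resistant MFA for privileged accounts",
        "A.8.5 Secure authentication"),
    "long_lived_api_keys": (
        "Rotate keys and move secrets to a managed vault",
        "A.5.17 Authentication information"),
    "shared_cloud_admin": (
        "Split duties between deployment and IAM administration",
        "A.5.3 Segregation of duties"),
    "public_storage_drift": (
        "Add continuous checks for public buckets and open security groups",
        "A.8.9 Configuration management"),
}

def draft_treatment(finding: str) -> str:
    action, control = TREATMENTS.get(
        finding, ("Manual treatment decision required", "unmapped"))
    return f"{action} (supports {control})"
```

An auditor can test "MFA enforced, supports A.8.5"; nobody can test "mitigate."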
Here’s the contrarian bit: not every risk should be “fixed” immediately, and AI is often better at showing you which ones are noise. Some low-severity findings are expensive to remediate and have little real-world exploitability. Others are technically ugly but operationally contained. If your model can’t distinguish between a cosmetic CIS benchmark miss and a control gap that enables credential theft, it will push you into pointless remediation theater. Security teams already have enough theater.
Where AI Fails Fast: Hallucinated Controls and Bad Normalization
The failure mode is not subtle. Models confidently invent control mappings, overstate coverage, and flatten context. A tool may tell you a cloud logging gap is “low risk” because logs exist somewhere, when the actual issue is that the retention period is seven days and your incident response team needs 30. Or it may map a vendor’s SOC 2 report to your own control environment as if third-party assurance were a substitute for internal evidence. It isn’t. Ask anyone who has had to explain a shared-responsibility gap after the fact.
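The retention example is exactly the kind of check that should be code, not model judgment. The threshold here is illustrative; pull the real number from your incident response plan:

```python
# The gap the model glossed over: logs "exist", but not for long enough.
REQUIRED_RETENTION_DAYS = 30  # assumed IR requirement, not a standard

def retention_gap(configured_days: int) -> int:
    """Days short of the incident-response requirement (0 = compliant)."""
    return max(0, REQUIRED_RETENTION_DAYS - configured_days)

retention_gap(7)   # 23 days short -- not "low risk"
retention_gap(90)  # 0
```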
The fix is boring and effective: require citations, require human approval for score changes above a threshold, and keep a change log for every AI-generated recommendation. If the model suggests lowering a risk because “compensating controls exist,” the reviewer should see the compensating control, the asset, the owner, and the date it was last tested. No evidence, no score change. That rule alone will prevent a lot of compliance cosplay.
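Those three rules fit in one gate. Everything below is a sketch under stated assumptions: the threshold, log format, and exception choices are illustrative, not any GRC platform’s API:

```python
import datetime

AUDIT_LOG = []              # change log for every accepted adjustment
APPROVAL_THRESHOLD = 2.0    # score drops bigger than this need a human

def apply_score_change(risk_id, old, new, evidence, approver=None):
    """Gate AI-suggested score changes: no evidence, no score change."""
    if evidence is None:
        raise ValueError(f"{risk_id}: no compensating-control evidence cited")
    if (old - new) > APPROVAL_THRESHOLD and approver is None:
        raise PermissionError(f"{risk_id}: large downgrade needs human approval")
    AUDIT_LOG.append({
        "risk": risk_id, "old": old, "new": new,
        "evidence": evidence, "approver": approver,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return new
```

The reviewer sees the evidence string (control, asset, owner, last test date) in the log entry, and an auditor can replay every downgrade from `AUDIT_LOG`.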
The Bottom Line
Use AI to ingest evidence from AWS Config, Okta, CrowdStrike, Tenable, and your GRC platform, then force every risk score to cite the asset, control, and log that justified it. Prioritize attack paths and privileged identities over generic severity labels, and reject any recommendation that cannot survive a human review with source data attached.