
How AI Is Transforming Security Incident Response

Learn how AI tools like ChatGPT can help security teams respond faster, reduce false positives, and streamline analysis.

AI in Incident Response Starts With Triage, Not Magic

When CrowdStrike’s Falcon sensor update broke Windows endpoints in July 2024, the first problem wasn’t “how do we investigate faster?” It was “which 8.5 million machines are bricked, which business units are dead in the water, and what do we tell the board before the phones melt down?” That’s the part AI can actually help with: turning a flood of tickets, logs, and Slack panic into a ranked queue before your analysts spend an hour arguing over whether the first alert was a real incident or just another noisy rule.

The useful promise here is not that ChatGPT will “think like a SOC analyst.” It won’t. What it can do is compress the garbage phase of incident response: summarizing a 400-line EDR event chain, extracting the three IPs that matter from a proxy dump, or turning a messy timeline into something a human can verify. Microsoft’s Security Copilot, Google’s Sec-PaLM-style security tooling, and OpenAI-backed assistants all hit the same wall eventually: if the underlying telemetry is junk, the model will politely hallucinate junk with better grammar.

Where AI Actually Saves Time in a Real IR Workflow

The first win is alert clustering. If your SIEM is throwing 12,000 detections from Microsoft Defender for Endpoint, CrowdStrike Falcon, and Splunk ES after a phishing wave, an LLM can group them by host, user, and TTPs in a way that’s faster than hand-sorting. That matters because most incidents are not one elegant kill chain; they’re 40 near-duplicate alerts, five benign admin actions, and one real attacker using the same PowerShell flags your IT team uses every Tuesday.
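Before any of those detections go near a model, it helps to collapse the near-duplicates yourself. A minimal sketch, assuming a hypothetical normalized alert record with host, user, and ATT&CK technique fields (the schema is illustrative, not any specific SIEM's):

```python
from collections import defaultdict

# Hypothetical normalized alerts; field names are illustrative,
# not a real Defender/Falcon/Splunk schema.
alerts = [
    {"id": 1, "host": "WS-042", "user": "jdoe", "technique": "T1059.001"},
    {"id": 2, "host": "WS-042", "user": "jdoe", "technique": "T1059.001"},
    {"id": 3, "host": "DC-01", "user": "svc_backup", "technique": "T1021.002"},
]

def cluster_alerts(alerts):
    """Group near-duplicate alerts by (host, user, technique) so the
    LLM summarizes a handful of clusters, not 12,000 raw detections."""
    clusters = defaultdict(list)
    for a in alerts:
        clusters[(a["host"], a["user"], a["technique"])].append(a["id"])
    return dict(clusters)

for key, ids in cluster_alerts(alerts).items():
    print(key, "->", len(ids), "alerts")
```

The point of pre-clustering deterministically is that the model only has to label and rank groups, which is cheaper and easier to verify than letting it free-associate over raw events.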

The second win is timeline building. Pull a set of Sysmon Event ID 1 and 3 records, EDR process trees, and firewall logs into a model and ask for a sequence of execution, lateral movement, and exfiltration candidates. It can spot that rundll32.exe spawned powershell.exe three minutes before an outbound connection to an Azure-hosted IP block, or that a suspicious login from Okta happened after a password reset in Microsoft Entra ID. That doesn’t replace validation, but it cuts the time needed to get from “something is wrong” to “here’s the host, user, and probable initial access vector.”
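The merge-and-flag step is mechanical enough that you can do it before the model ever sees the data. A sketch, with made-up event records standing in for Sysmon, EDR, and firewall output:

```python
from datetime import datetime, timedelta

# Illustrative events merged from Sysmon, EDR, and firewall logs;
# timestamps and fields are invented for the sketch.
events = [
    {"ts": datetime(2024, 7, 19, 3, 14), "src": "sysmon", "type": "process_create",
     "detail": "rundll32.exe -> powershell.exe", "host": "WS-042"},
    {"ts": datetime(2024, 7, 19, 3, 17), "src": "firewall", "type": "outbound",
     "detail": "WS-042 -> 20.55.1.9:443", "host": "WS-042"},
]

def build_timeline(events):
    return sorted(events, key=lambda e: e["ts"])

def flag_exec_then_egress(timeline, window=timedelta(minutes=5)):
    """Flag a process creation followed by outbound traffic from the same
    host within the window: a candidate for human review, not a verdict."""
    flags = []
    for i, e in enumerate(timeline):
        if e["type"] != "process_create":
            continue
        for later in timeline[i + 1:]:
            if (later["type"] == "outbound" and later["host"] == e["host"]
                    and later["ts"] - e["ts"] <= window):
                flags.append((e["detail"], later["detail"]))
    return flags

print(flag_exec_then_egress(build_timeline(events)))
```

Hand the model the flagged candidates rather than the whole log pile and its job shrinks from "find the attack" to "explain this sequence", which is a job it can actually do.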

The third win is enrichment. During the MOVEit Transfer mess in 2023, defenders had to map file drops, webshells, and outbound callbacks across a pile of environments. AI is decent at taking a hash, URL, domain, or command line and generating a first-pass enrichment bundle: vendor reputation, known tooling, likely family, and adjacent IOCs to hunt. If you use it to draft pivots for VirusTotal, Mandiant, Recorded Future, or your internal threat intel platform, you’ll get to the hunt faster than if an analyst is manually copy-pasting indicators into browser tabs like it’s 2014.
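The copy-paste part of enrichment is trivially scriptable. A rough sketch that extracts indicators from log text and drafts pivot links for an analyst; the regexes are deliberately crude, and the URLs are the public VirusTotal web-UI patterns, not authenticated API calls:

```python
import re

def extract_iocs(text):
    """Pull first-pass indicators from a blob of log text. These regexes
    are deliberately simple; a real pipeline would defang, validate,
    and deduplicate far more carefully."""
    return {
        "ips": re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text),
        "sha256": re.findall(r"\b[a-fA-F0-9]{64}\b", text),
        "domains": re.findall(r"\b[a-z0-9-]+\.(?:com|net|org|io)\b", text),
    }

def draft_pivots(iocs):
    """Draft lookup pivots for an analyst to open, in browser-URL form."""
    pivots = [f"https://www.virustotal.com/gui/file/{h}" for h in iocs["sha256"]]
    pivots += [f"https://www.virustotal.com/gui/domain/{d}" for d in iocs["domains"]]
    return pivots

sample = "POST to evil-cdn.net from 10.2.3.4, dropper sha256 " + "a" * 64
print(draft_pivots(extract_iocs(sample)))
```

The model's role sits on top of this: summarizing what comes back from those pivots, not doing the string wrangling.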

The Part Everyone Pretends Is Optional: Human Verification

LLMs are good at sounding certain when they’re wrong, which is a charming trait in fiction and a terrible one in triage. A model may confidently label a benign wmic invocation as post-exploitation or invent a MITRE ATT&CK mapping that doesn’t fit the evidence. If you let AI write the incident summary without checking the raw telemetry, you’re not automating response; you’re outsourcing embarrassment.

The practical rule is simple: never let the model be the source of truth. Feed it evidence, not conclusions. Ask it to extract, classify, and correlate, then verify against the original logs in your SIEM, EDR console, or packet capture. If the model says a host beaconed to a suspicious domain, confirm the DNS query, the resolved IP, the TLS SNI, and the process that made the connection. That’s not bureaucracy. That’s how you avoid sending your IR team into a dead-end because the model mistook a content delivery network for command-and-control.
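That verification step can itself be partly mechanized, so an analyst confirms evidence rather than prose. A sketch, with illustrative log schemas, that accepts a model's "host beaconed to domain" claim only when the raw telemetry backs it up:

```python
def verify_beacon_claim(claim, dns_logs, conn_logs):
    """Accept the model's beaconing claim only if the raw telemetry shows
    both the DNS query and a process-level connection. Log record
    shapes here are illustrative, not a specific product's schema."""
    dns_hit = any(r["host"] == claim["host"] and r["query"] == claim["domain"]
                  for r in dns_logs)
    conn_hit = next((c for c in conn_logs
                     if c["host"] == claim["host"] and c["sni"] == claim["domain"]),
                    None)
    if not dns_hit or conn_hit is None:
        return {"verified": False, "reason": "no matching raw evidence"}
    return {"verified": True, "process": conn_hit["process"]}

claim = {"host": "WS-042", "domain": "evil-cdn.net"}
dns_logs = [{"host": "WS-042", "query": "evil-cdn.net"}]
conn_logs = [{"host": "WS-042", "sni": "evil-cdn.net", "process": "powershell.exe"}]
print(verify_beacon_claim(claim, dns_logs, conn_logs))
```

If the function comes back empty-handed, the claim goes in the "model said, evidence didn't" pile, which is exactly the pile that keeps hallucinations out of your incident summary.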

The Unfashionable Use Case: Reducing False Positives by Being Less Clever

Security teams love to say they want “more signal.” What they usually have is a ruleset that fires on everything from certutil.exe to a junior admin using PsExec during maintenance. AI can help here, but not by inventing a smarter detector. It helps by explaining why an alert is likely noise based on asset criticality, user role, time of day, and historical behavior.

That’s useful in environments where the same toolchain is used by attackers and admins. Cobalt Strike beacons look a lot like legitimate remote management traffic if your network is already full of RMM tools, VPN concentrators, and scripted automation. A model can rank alerts lower when they match a known maintenance window or a sanctioned admin pattern, and rank them higher when the same behavior appears on a domain controller at 03:17 with a new parent-child process chain. The contrarian bit: sometimes the best AI output is a recommendation to suppress an alert, not to chase it.
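You can express that ranking logic as a plain heuristic the model feeds into, rather than something it invents per alert. A sketch with illustrative, untuned weights; the fields (maintenance windows, sanctioned admin patterns, host role) are assumptions about what your asset inventory exposes:

```python
from datetime import datetime, time

def score_alert(alert, maintenance_windows, known_admin_patterns):
    """Heuristic risk score: push sanctioned admin work toward suppression,
    boost sensitive hosts at odd hours with novel process chains.
    Weights and thresholds are illustrative, not tuned values."""
    score = 50
    for start, end in maintenance_windows:
        if start <= alert["ts"].time() <= end:
            score -= 30          # inside a known maintenance window
    if (alert["user"], alert["parent_child"]) in known_admin_patterns:
        score -= 30              # matches a sanctioned admin pattern
    if alert["host_role"] == "domain_controller":
        score += 25              # sensitive asset
    if alert["ts"].time() < time(6, 0) and alert["chain_is_new"]:
        score += 25              # novel chain at 03:17 is worth a look
    return max(score, 0)

alert = {"ts": datetime(2024, 7, 19, 3, 17), "user": "svc_x",
         "parent_child": ("rundll32.exe", "powershell.exe"),
         "host_role": "domain_controller", "chain_is_new": True}
print(score_alert(alert, maintenance_windows=[], known_admin_patterns=set()))
```

Deterministic scoring like this also makes the suppression recommendation auditable: when the model says "ignore it", you can see exactly which factors drove the number down.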

Where AI Fails Hard: Low-Context, High-Stakes Decisions

Do not use an LLM to decide containment on its own. Whether to isolate a host, disable an account in Okta, or revoke tokens in Microsoft Entra ID depends on business context the model does not have: payroll systems, executive travel, plant-floor OT, or whether the “suspicious” login is actually a contractor in another time zone. In the SolarWinds response, timing and blast radius mattered more than clever language; the same is true when you’re deciding whether to pull a server or keep it alive for evidence collection.

This is also where people get lazy with “AI copilots” and start pasting sensitive logs into public SaaS tools because it’s convenient. That’s a gift to privacy counsel and a headache to everyone else. If you’re handling regulated data, keep the model inside your tenant, your VPC, or your approved boundary, and log every prompt and response. Otherwise you’ll end up with a response workflow that is fast, inaccurate, and impossible to audit — which is a lovely trio if your goal is a postmortem.
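The "log every prompt and response" rule is easy to enforce with a thin wrapper around whatever model client your approved boundary exposes. A sketch that assumes nothing about a specific vendor SDK; `model_fn` is a placeholder for your in-tenant endpoint:

```python
import hashlib
import json
from datetime import datetime, timezone

def audited_llm_call(model_fn, prompt, analyst, log_path="ir_llm_audit.jsonl"):
    """Append prompt and response to an audit log before the analyst
    sees the output. `model_fn` stands in for whatever client your
    approved boundary exposes; no vendor SDK is assumed."""
    response = model_fn(prompt)
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "analyst": analyst,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,
        "response": response,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return response

# Stub model for demonstration only; swap in your in-tenant endpoint.
print(audited_llm_call(lambda p: "SUMMARY: 3 clusters", "Summarize alerts", "jdoe"))
```

An append-only JSONL file is the floor, not the ceiling; the point is that when counsel asks what went into the model during the incident, the answer is a file, not a shrug.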

Build AI Into the Playbook, Not Around It

The teams getting real value from AI in incident response are not treating it like a sidecar chatbot. They’re wiring it into the workflow: ingest alert data, normalize it, ask the model for clustering and summary, send the output to an analyst, and require a human to approve any containment action. That’s the difference between “AI-assisted” and “AI-generated theater.”
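That wiring fits in a few lines of glue. A skeleton of the flow, where every callable is a placeholder for your own tooling and the only hard rule is that containment never fires without an explicit human approval:

```python
def run_ir_pipeline(raw_alerts, normalize, model_summarize, analyst_approves, contain):
    """Skeleton of an AI-in-the-loop triage flow: the model clusters and
    summarizes, but containment only runs on explicit human approval.
    Every callable here is a stand-in for your own tooling."""
    normalized = [normalize(a) for a in raw_alerts]
    summary = model_summarize(normalized)
    actions = []
    for proposal in summary["proposed_actions"]:
        if analyst_approves(proposal):      # the hard human gate
            actions.append(contain(proposal))
        else:
            actions.append({"action": proposal, "status": "rejected"})
    return {"summary": summary["text"], "actions": actions}

result = run_ir_pipeline(
    raw_alerts=[{"msg": "powershell spawn on WS-042"}],
    normalize=lambda a: a,
    model_summarize=lambda alerts: {"text": "1 cluster on WS-042",
                                    "proposed_actions": ["isolate WS-042"]},
    analyst_approves=lambda p: False,       # analyst declines in this demo
    contain=lambda p: {"action": p, "status": "done"},
)
print(result)
```

Notice that the model's output is a proposal object, not an action. If the approval gate is a function you can grep for, it is also a control you can audit.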

If you want this to work, start with narrow tasks: summarize endpoint telemetry, extract IOCs from email bodies, group duplicate alerts, and draft incident timelines. Measure whether mean time to triage drops, whether false positives get suppressed for the right reasons, and whether analysts spend less time formatting evidence for executives. If those numbers don’t move, the model is just an expensive autocomplete with a security badge.
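"Measure whether mean time to triage drops" only works if you actually compute it the same way before and after. A sketch, with invented incident records standing in for whatever your ticketing system exports:

```python
from datetime import datetime

def mean_time_to_triage(incidents):
    """Mean minutes from first alert to analyst disposition. Incident
    records are illustrative; run this on the same export before and
    after wiring the model into triage so the comparison is honest."""
    deltas = [(i["triaged_at"] - i["first_alert"]).total_seconds() / 60
              for i in incidents]
    return sum(deltas) / len(deltas)

incidents = [
    {"first_alert": datetime(2024, 7, 19, 3, 0),
     "triaged_at": datetime(2024, 7, 19, 3, 40)},
    {"first_alert": datetime(2024, 7, 19, 9, 0),
     "triaged_at": datetime(2024, 7, 19, 9, 20)},
]
print(mean_time_to_triage(incidents))  # 30.0
```

One number is not a program evaluation, but if it refuses to move over a quarter, that is your answer about the model's value.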

The Bottom Line

Use AI where it saves analyst time on repetitive work: clustering alerts, summarizing logs, and drafting timelines. Keep containment decisions, attribution, and final incident classification in human hands, with the raw evidence visible in your SIEM or EDR console.

If you deploy an LLM in IR, make it operate on scoped evidence, log every prompt and response, and test it against past incidents like MOVEit, SolarWinds, and your own internal false-positive pile. If it can’t improve triage on real cases without inventing facts, turn it off and spend the budget on better telemetry.
