LLMs Have Made Phishing Indistinguishable From Legitimate Email
For years, typos and awkward phrasing were the telltale signs of phishing. Large language models just eliminated that detection signal entirely.
CVE-2024-3094 was caught because Andres Freund noticed SSH taking about half a second longer than it should have on a Debian system. That’s the kind of failure mode defenders like to imagine they’ll spot: a weird delay, a broken signature, a hostname that doesn’t resolve. Phishing has lost its equivalent tell. With GPT-4, Claude, and Gemini in the hands of anyone with a browser, the old giveaway — mangled English, bizarre punctuation, and “kindly do the needful” energy — is now optional.
The typo test is dead, and it was a flimsy test anyway
For years, security teams trained users to hunt for grammar mistakes as if attackers were bound by a style guide. That advice was always brittle. BEC crews working out of West Africa, such as the networks Palo Alto Networks' Unit 42 tracks as SilverTerrier, were already sending clean, targeted mail long before ChatGPT showed up. The difference now is scale: a single operator can generate 200 variants of the same lure, each tuned to the recipient's role, region, and recent activity, without touching a human editor.
That matters because email filters have never been built to score “sounds weird to a native speaker.” They score sender reputation, URL reputation, attachment type, and patterns they’ve seen before. If the message is polished, the detection problem shifts from “spot the broken English” to “prove the sender is lying,” which is a much nastier job when the lie is wrapped in perfect syntax and realistic internal jargon.
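As a toy sketch of that scoring model (the signal names and weights below are invented for illustration; production filters use far richer feature sets), note that prose quality never appears as an input:

```python
def filter_score(signals: dict) -> int:
    """Toy illustration: combine infrastructure signals into a risk score.
    Signal names and weights are made up; real filters use far richer models,
    and none of these inputs care whether the English is polished."""
    weights = {
        "sender_reputation_bad": 4,   # sending domain/IP has a poor history
        "url_reputation_bad": 3,      # embedded URL seen in earlier campaigns
        "risky_attachment": 5,        # macro-enabled doc, ISO/IMG container, etc.
        "known_pattern_match": 6,     # resembles a previously observed lure
    }
    return sum(w for name, w in weights.items() if signals.get(name))
```

A perfectly written message that trips none of these signals sails through, which is exactly why the detection problem shifts to proving the sender is lying.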
LLMs didn’t invent phishing; they industrialized the personalization
The real change is not that attackers can now write better English. It’s that they can cheaply tailor messages to the exact friction point that makes a target act. A finance lead gets a fake wire approval referencing a real vendor from last quarter. A developer gets a “GitHub security alert” that mirrors the tone of GitHub’s own notifications. A recruiter gets a resume with plausible formatting and a cover note that doesn’t read like it was translated by committee.
This is not hypothetical. Proofpoint and Microsoft have both documented phishing kits and social engineering campaigns that already use AI-generated copy to vary subject lines, sender names, and body text at a rate no human team can match. The same goes for the junk infrastructure around the phishing email itself: attackers can spin up lookalike domains, generate believable landing pages, and feed stolen logos into a template in minutes. The point is not sophistication for its own sake. It’s throughput.
The old “bad English” signal was never the real defense
Security teams that leaned on language quality were mostly outsourcing judgment to a cheap heuristic. That heuristic failed in plenty of cases before LLMs. Business email compromise campaigns against real estate firms, payroll teams, and law offices have long used short, clean, and minimal prose because the attacker doesn’t need to write well; they need to create urgency. The message “Please review attached invoice” has been enough to drain six figures from a company for years.
The more useful signal has always been mismatch. Does the sender normally use this tone? Does this request fit the workflow? Does the message arrive from a domain that is one character off from the real vendor? Is the reply-to address different from the visible sender? LLMs don’t erase those checks. They just make it more embarrassing when a team ignored them and congratulated itself for catching a missing apostrophe.
Detection has to move to identity, infrastructure, and process
If you’re still trying to solve phishing at the content layer, you’re already behind. The practical controls are boring and effective: enforce FIDO2/WebAuthn for high-risk users, kill SMS-based MFA where possible, and make external email banners impossible to miss. Google’s own phishing-resistant login guidance exists for a reason; passkeys and hardware-backed keys remove a lot of the replay and prompt-bombing nonsense that AI-assisted lures depend on.
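Enforcement starts with knowing who is out of compliance. A sketch of that inventory pass, assuming a hypothetical export from your identity provider (the user schema, role names, and method labels here are all placeholders):

```python
# Factors generally considered phishing-resistant; labels are illustrative.
PHISHING_RESISTANT = {"fido2", "webauthn", "passkey", "hardware_key"}
HIGH_RISK_ROLES = {"admin", "finance"}

def non_compliant_users(users) -> list[str]:
    """users: iterable of dicts with 'name', 'role', and 'mfa_methods'
    (a hypothetical export from your IdP; real data comes from its API).
    Returns high-risk accounts with no phishing-resistant factor enrolled."""
    return [
        u["name"] for u in users
        if u["role"] in HIGH_RISK_ROLES
        and not PHISHING_RESISTANT & set(u["mfa_methods"])
    ]
```

An admin enrolled only in SMS codes shows up on the list; a finance user with a passkey does not.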
Then stop pretending a message is trustworthy because it looks native. Check whether the domain was registered last week. Check whether the sending infrastructure has a history with your tenant. Check whether the login page is hosted on a compromised WordPress site in a country your vendor has never used. Products like Microsoft Defender for Office 365, Proofpoint, and Abnormal Security are useful here not because they “understand AI,” but because they correlate sender behavior, impersonation patterns, and tenant-specific anomalies faster than a tired analyst can.
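Domain age in particular is cheap to check once you have the registration record. A sketch, assuming you have already pulled the creation timestamp from WHOIS or RDAP (the format string is an assumption; registries vary):

```python
from datetime import datetime, timezone

def domain_age_days(created_at: str, fmt: str = "%Y-%m-%dT%H:%M:%SZ") -> int:
    """Age in days from an RDAP-style creation timestamp.
    The timestamp format is an assumption; registries differ."""
    created = datetime.strptime(created_at, fmt).replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - created).days

def is_suspiciously_young(created_at: str, threshold_days: int = 30) -> bool:
    """Flag domains registered within the last month, a strong phishing signal."""
    return domain_age_days(created_at) < threshold_days
```

A domain registered years ago passes; one stood up last week gets routed to a human before anyone clicks.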
The contrarian bit: user training still matters, just not the way vendors sell it
The standard advice says “train users to spot phishing.” Fine, but don’t confuse awareness slides with a control. The useful training now is procedural: teach people to treat any request to change payment details, reset MFA, or export data as a second-channel verification problem. That means calling a known number from the directory, not replying to the email thread. It means verifying a wire request in Slack or Teams only if the request is independently authenticated, because those channels are increasingly part of the same attack chain.
Also, stop overfitting on “AI-generated” tells. Plenty of real phishing still looks sloppy because the attacker is lazy, not because the model couldn’t do better. The danger is assuming that polished prose equals legitimacy. It doesn’t. A cleanly written message from a compromised Microsoft 365 tenant with a valid thread history is far more dangerous than a broken-English blast from a throwaway domain.
LLMs made the first layer of phishing boring; the second layer is where the damage happens
The email is now just the opener. The actual theft happens when the target lands on a convincing login page, approves a malicious OAuth app, or hands over a one-time code to a live operator in a callback scam. That’s why attackers keep using Microsoft 365, Okta, and Google Workspace as the center of gravity: once they get a session token or an OAuth grant, the content of the original message stops mattering.
That should change what you monitor. Alert on new inbox rules, suspicious forwarding, impossible travel, and consent grants to unfamiliar apps. Watch for domain lookalikes that differ by a single character. Watch for vendor impersonation that uses real invoice numbers or real project names pulled from breached mailboxes. And if your SOC is still triaging phishing by reading the prose and deciding whether it “sounds off,” that’s not detection. That’s literary criticism.
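In Microsoft 365, for example, several of those signals surface as audit-log operations. A minimal triage pass might look like the sketch below; the event schema is heavily simplified relative to real audit records, and the app allow-list is a placeholder you would maintain yourself:

```python
KNOWN_APP_IDS = {"well-known-app-id"}  # placeholder allow-list of sanctioned apps

def triage(events) -> list[tuple]:
    """events: iterable of simplified audit-log records (dicts).
    Real Microsoft 365 records carry many more fields; this sketch keys off
    operation names such as 'New-InboxRule' and 'Consent to application.'"""
    alerts = []
    for e in events:
        op = e.get("Operation", "")
        if op in ("New-InboxRule", "Set-InboxRule"):
            params = e.get("Parameters", {})
            # Forwarding, redirecting, or silently deleting mail are classic
            # post-compromise moves to hide replies from the real mailbox owner.
            if any(params.get(k) for k in ("ForwardTo", "RedirectTo", "DeleteMessage")):
                alerts.append((e.get("UserId"), "inbox rule forwards or deletes mail"))
        elif op == "Consent to application.":
            if e.get("AppId") not in KNOWN_APP_IDS:
                alerts.append((e.get("UserId"), "consent grant to unfamiliar app"))
    return alerts
```

A new rule that quietly forwards the CFO's mail fires an alert; a rule that files newsletters into a folder does not.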
The Bottom Line
Kill the assumption that bad grammar is a useful phishing indicator. Center your detection and response on identity, domain age, OAuth consent, inbox-rule changes, and verified out-of-band callbacks for payment or MFA changes.
Force FIDO2/WebAuthn for admins and finance users, disable legacy MFA paths, and alert on new forwarding rules or suspicious app consents in Microsoft 365, Google Workspace, and Okta. If the email is polished but the sender domain is new, the reply-to is off, or the request changes money or credentials, treat it as hostile until independently verified.