
LLMs Have Made Phishing Indistinguishable From Legitimate Email

For years, typos and awkward phrasing were the telltale signs of phishing. Large language models just eliminated that detection signal entirely.

Security awareness training has spent two decades teaching people to spot bad grammar, odd phrasing, and spelling errors as signals of phishing email. That heuristic is now obsolete. In 2024, the Hong Kong police and multiple news outlets described a deepfake-enabled fraud case in which an employee at Arup was tricked into transferring roughly HK$200 million after a video call featuring AI-generated likenesses of senior executives. At the same time, vendors including Microsoft and Hoxhunt have documented AI-assisted phishing that is more fluent, more context-aware, and easier to personalize than the mass campaigns defenders trained users to recognize. Large language models now write more polished prose than many humans, and attackers can use them for pennies per thousand tokens.

The Old Detection Model Is Broken

The classic phishing email from a Nigerian prince was easy to spot. So were the second-generation attacks that improved the prose but still made cultural errors — odd formalities, mismatched register, subtle idiom mistakes. Human readers developed intuition for these signals. Security training codified that intuition into checklists.

LLMs bypass all of it. A model given a target's LinkedIn profile, recent company announcements, and a sample of their email correspondence can generate a message that matches the target's organizational context, references real recent events, uses the appropriate internal vocabulary, and is grammatically indistinguishable from a message from a trusted colleague.

This isn't theoretical. IBM X-Force has publicly discussed threat actors using generative AI to improve phishing and business email compromise workflows, and Proofpoint has documented the use of generative AI by threat actors including TA547 to produce more convincing lure content and vary message wording at scale.

Personalization at Industrial Scale

The traditional tradeoff in phishing was between scale and personalization. Mass phishing sent generic messages to millions of targets. Spear phishing sent highly targeted messages to a handful of high-value targets. AI eliminates this tradeoff.

With access to LinkedIn, public social media, company websites, and news archives, an automated pipeline can generate thousands of highly personalized phishing messages per hour. Each message can reference the target's role, their manager, recent company news, and current industry events. The incremental cost per personalized message approaches zero.

This democratizes a capability that was previously limited to nation-state threat actors. Criminal groups can now conduct spear-phishing campaigns at scale that previously required intelligence-agency resources. Campaigns attributed to groups such as TA547 and Storm-1811 have shown how quickly commodity operators adopt new tooling when it improves conversion rates, and generative AI lowers the skill barrier even further.

Voice and Video Join the Threat Landscape

Text is only the beginning. Attackers are combining LLMs with voice cloning and video synthesis to create multi-channel social engineering attacks.

In one well-documented category of attacks, a finance employee receives a video call appearing to show their CFO and other executives requesting an urgent wire transfer. The "executives" are AI-synthesized. The call is fake. The money transfer is real. The 2024 Arup case is the clearest public example: scammers reportedly used deepfake video conferencing to impersonate senior leadership and convince an employee to move funds.

The technical barrier for this category of attack is falling rapidly. Voice cloning tools such as ElevenLabs have demonstrated how little source audio is needed to produce convincing synthetic speech, and public audio from earnings calls, conference talks, webinars, and LinkedIn videos gives attackers abundant training material. The combination of realistic text, voice, and video removes virtually every signal that security training teaches employees to look for.

What Detection Looks Like Now

If content quality no longer signals phishing, detection must shift to other signals.

Behavioral and contextual signals. Does this request match the sender's normal patterns? Does it create artificial urgency? Does it ask for action outside normal channels? These meta-signals survive the transition to AI-generated content. Microsoft has repeatedly emphasized that modern business email compromise often succeeds without malware at all, relying instead on anomalous behavior, impersonation, and social engineering.
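These meta-signals can be encoded as simple rules. The sketch below scores a message on the three signals named above; the phrase list, weights, and function name are illustrative assumptions, not a production detector.

```python
# Crude rule-based scoring of behavioral/contextual phishing signals.
# Phrases, weights, and thresholds here are illustrative assumptions.

URGENCY_PHRASES = ("urgent", "immediately", "before end of day",
                   "wire", "confidential")

def score_request(body: str, sender: str, usual_senders: set[str],
                  asks_out_of_channel: bool) -> int:
    """Return a crude risk score; higher means more suspicious."""
    text = body.lower()
    score = 0
    # Artificial urgency is a classic social-engineering lever.
    score += sum(2 for phrase in URGENCY_PHRASES if phrase in text)
    # A sender outside the target's normal correspondence pattern.
    if sender not in usual_senders:
        score += 3
    # Requests to act outside normal channels ("just reply here").
    if asks_out_of_channel:
        score += 4
    return score

msg = "URGENT: please wire the funds immediately, keep this confidential."
print(score_request(msg, "cfo@look-alike.example",
                    {"cfo@corp.example"}, True))  # → 15
```

Note that none of these signals depend on content quality, which is exactly why they survive the shift to AI-generated text.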

Technical authentication. DMARC, DKIM, and SPF still work. An email that passes content quality filters but fails domain authentication should be treated with extreme suspicion. Organizations that haven't fully deployed DMARC enforcement should treat that as an urgent security priority. Google and Yahoo's 2024 bulk sender requirements pushed stronger authentication into the mainstream, but many organizations still have not moved to a reject policy.
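Receiving systems record the outcome of these checks in the Authentication-Results header (RFC 8601). A minimal sketch, using only the standard library, that flags any mechanism which did not pass; the header value is a made-up example and the regex is deliberately naive, not a full RFC 8601 parser:

```python
import re

# A DMARC enforcement record published in DNS looks like:
#   _dmarc.corp.example  TXT  "v=DMARC1; p=reject; rua=mailto:dmarc@corp.example"

def auth_failures(auth_results: str) -> list[str]:
    """Return the mechanisms (spf, dkim, dmarc) that did not pass.

    Naive regex over an RFC 8601 Authentication-Results value; a sketch,
    not a compliant parser.
    """
    failures = []
    for mech in ("spf", "dkim", "dmarc"):
        m = re.search(rf"\b{mech}=(\w+)", auth_results)
        if m is None or m.group(1) != "pass":
            failures.append(mech)
    return failures

header = ("mx.example.com; spf=pass smtp.mailfrom=corp.example; "
          "dkim=fail header.d=corp.example; dmarc=fail header.from=corp.example")
print(auth_failures(header))  # → ['dkim', 'dmarc']
```

A message that reads perfectly but lands in this failure list is precisely the case where content filters no longer help and authentication still does.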

Out-of-band verification. For any high-stakes request — wire transfers, credential changes, sensitive data access — establish a protocol for verification through a separate, pre-established channel. Call the person back on a known-good number. Don't reply to the same email thread.
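The core rule of the callback protocol can be made concrete: the verification channel must come from a pre-established internal source, never from the suspicious message itself. The directory contents and function name below are hypothetical:

```python
# Sketch of a callback-verification helper. Directory contents and names
# are hypothetical; the point is that the callback number comes from a
# pre-established source, never from the message (attackers control that).

KNOWN_GOOD_DIRECTORY = {
    "cfo@corp.example": "+1-555-0100",  # verified at onboarding
}

def callback_number(claimed_sender: str) -> str:
    """Return the verified number for a sender, or fail loudly.

    Deliberately takes no contact details from the message under
    verification.
    """
    try:
        return KNOWN_GOOD_DIRECTORY[claimed_sender]
    except KeyError:
        raise ValueError(f"No verified contact for {claimed_sender}; escalate")

print(callback_number("cfo@corp.example"))  # → +1-555-0100
```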

AI-assisted detection. Ironically, the same LLMs that power better phishing can power better detection. Email security vendors such as Microsoft, Proofpoint, and Abnormal Security are deploying models that analyze the full context of a message — not just content quality but sender behavior, timing, relationship history, and organizational context — to flag suspicious messages that content filters would pass.

What Organizations Should Do

Retire training that focuses on spotting typos. It creates false confidence. Update security awareness programs to focus on procedural controls — what to do when something feels off, regardless of how polished the message looks.

Implement and enforce DMARC at all domains. This is table stakes. Without DMARC at an enforcement policy (quarantine or reject), attackers can spoof email from your domains and many receivers will still deliver it.

Establish explicit high-stakes action protocols. Wire transfers over a threshold require a phone call. Credential changes require manager approval. Sensitive data requests require a ticket. Document these and enforce them consistently.

Assume voice and video can be faked. For any unexpected video call requesting sensitive action, verify through an out-of-band channel before complying.

The Bottom Line

The democratization of AI has benefited defenders as well as attackers, but in the domain of social engineering, attackers got the bigger upgrade. The human perceptual filters that security training relied on no longer work. The organizations that survive this transition will be the ones that shifted from "teach people to spot phishing" to "build processes that don't depend on people spotting phishing."

Key Takeaways

  • Stop teaching employees that grammar and spelling are reliable phishing indicators; train them to verify unusual requests, especially those involving money movement, credentials, or sensitive data.
  • Enforce SPF, DKIM, and DMARC across every owned domain, and move DMARC to quarantine or reject rather than leaving it at monitoring-only.
  • Require out-of-band verification for high-risk actions using known-good contact details, not the phone number, email thread, or video call supplied in the request.
  • Update incident response playbooks to include AI-enabled impersonation scenarios, including deepfake voice/video fraud and highly personalized business email compromise.
