# Lethal Trifecta for AI Agents

The Lethal Trifecta is a security concept identified by [[Simon Willison]]. It describes three capabilities that, when combined in an AI agent, leave it severely vulnerable to prompt injection attacks:

- **Access to private data**: tools that retrieve sensitive information
- **Exposure to untrusted content**: any path by which malicious data or content can reach the model
- **External communication ability**: the capacity to send data outside the system

LLMs cannot reliably distinguish legitimate instructions from malicious ones embedded in the content they process. As a result, an attacker can craft input that instructs the agent to exfiltrate private data.

[[Simon Willison]] emphasized that "guardrail" products claiming to prevent 95% of attacks are inadequate, and he's right. LLM behavior is non-deterministic and an attacker can keep retrying, so even a small failure rate is eventually enough to be exploited.

The safest approach is to avoid combining all three capabilities in a single agent, or to require strict [[Human-in-the-Loop]] approval for sensitive operations.

## References

- Simon Willison's article: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta

## Related

- [[AI Agents]]
- [[Prompt injection]]
- [[Human-in-the-Loop]]
- [[Agentic Knowledge Management (AKM)]]
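## Example sketch

The mitigation described in this note can be expressed as a configuration check. One option is to never give an agent all three capabilities. The other is to gate outbound actions behind human approval when the agent has all three. The sketch below is minimal and hypothetical: `Capability`, `Tool`, `missing_legs`, `has_lethal_trifecta`, `authorize_call`, and the example tool names are illustrative assumptions, not part of any real agent framework. Capabilities are tagged by hand. In practice, deciding which capabilities a real tool actually provides is the hard part. This check only flags risky configurations and does not detect prompt injection itself.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Iterable


class Capability(Enum):
    """The three capabilities that make up the lethal trifecta."""
    PRIVATE_DATA = auto()            # can read sensitive information
    UNTRUSTED_CONTENT = auto()       # can bring attacker-controlled text into the context
    EXTERNAL_COMMUNICATION = auto()  # can send data outside the system


TRIFECTA = frozenset(Capability)


@dataclass(frozen=True)
class Tool:
    name: str
    capabilities: frozenset  # subset of Capability members, tagged by hand (assumption)


def missing_legs(tools: Iterable[Tool]) -> frozenset:
    """Return the trifecta capabilities that no tool in the set provides."""
    present = frozenset().union(*(t.capabilities for t in tools))
    return TRIFECTA - present


def has_lethal_trifecta(tools: Iterable[Tool]) -> bool:
    """True when the tools together cover all three capabilities."""
    return not missing_legs(tools)


def authorize_call(tools: list[Tool], tool: Tool,
                   approve: Callable[[Tool], bool]) -> bool:
    """Allow a call, unless the agent holds the full trifecta and this call
    can send data out; in that case a human approver decides."""
    if has_lethal_trifecta(tools) and Capability.EXTERNAL_COMMUNICATION in tool.capabilities:
        return approve(tool)
    return True


# Hypothetical tools. An inbox is both private and reachable by anyone who
# can send you an email, so a single tool can carry two capabilities at once.
read_inbox = Tool("read_inbox", frozenset({Capability.PRIVATE_DATA,
                                           Capability.UNTRUSTED_CONTENT}))
send_email = Tool("send_email", frozenset({Capability.EXTERNAL_COMMUNICATION}))
calculator = Tool("calculator", frozenset())

print(has_lethal_trifecta([read_inbox, send_email]))  # True
print(has_lethal_trifecta([read_inbox, calculator]))  # False
print(authorize_call([read_inbox, send_email], send_email, lambda t: False))  # False
```

Two tools are enough to complete the trifecta here, because `read_inbox` covers two capabilities on its own. Dropping `send_email` removes one capability and makes `has_lethal_trifecta` return `False`. The approval callback in `authorize_call` is the fallback for agents that genuinely need all three.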