# Prompt injection Prompt injection is a security vulnerability where malicious instructions are crafted to manipulate an LLM's behavior, bypassing its intended safeguards. It works because LLMs cannot reliably distinguish between system instructions and user input—everything is processed as text. Attacks can be direct (user types "ignore previous instructions and...") or indirect (malicious prompts hidden in content the LLM processes, like emails, web pages, or documents). Prompt injection is ranked #1 on the OWASP Top 10 for LLM Applications. It's the core vulnerability that enables the [[Lethal Trifecta for AI Agents]]—if an agent can access private data and communicate externally, a successful injection can exfiltrate sensitive information. Mitigations include the [[Least Privilege Principle]], [[Human-in-the-Loop]] approval for sensitive operations, input/output filtering, and isolating external content. ## References - OWASP: https://genai.owasp.org/llmrisk/llm01-prompt-injection/ - Wikipedia: https://en.wikipedia.org/wiki/Prompt_injection ## Related - [[Lethal Trifecta for AI Agents]] - [[Least Privilege Principle]] - [[Human-in-the-Loop]] - [[AI Agents]]