# AI Safety

AI safety is the field concerned with ensuring AI systems behave as intended, don't cause harm, and remain under human control. It spans technical research (how to build safe systems), governance (how to regulate them), and practical engineering (how to deploy them responsibly).

The field addresses several interconnected problems:

- **[[AI Alignment]]**: making AI systems pursue the goals we actually want, not proxy objectives that look similar but diverge in edge cases
- **[[AI Hallucination]]**: models confidently producing false information
- **[[AI Sycophancy]]**: models telling users what they want to hear rather than what's true
- **[[Prompt injection]]**: adversarial inputs that hijack model behavior
- **[[Data Poisoning]]**: corrupting training data to compromise model behavior
- **[[AI Guardrails]]**: practical constraints that prevent harmful outputs or actions

In [[Agentic Engineering]], safety takes on additional urgency because agents can *act* autonomously. A hallucinating chatbot produces bad text; a hallucinating agent executes bad code, deletes files, or sends messages. The [[Agentic loops|agentic loop]] amplifies both capability and risk.

Regulatory frameworks such as the [[EU AI Act]] are beginning to codify safety requirements into law, particularly for high-risk AI applications.

## Related

- [[AI Alignment]]
- [[AI Guardrails]]
- [[AI Hallucination]]
- [[AI Sycophancy]]
- [[Prompt injection]]
- [[Data Poisoning]]
- [[Responsible AI]]
- [[EU AI Act]]
- [[Reinforcement Learning From Human Feedback (RLHF)]]
- [[Agentic Engineering]]
- [[Agentic loops]]
- [[AI Limitations]]
- [[Generative AI Risks]]