# Constitutional AI
An alignment technique developed by Anthropic where AI systems are trained to follow a set of principles (a "constitution") rather than relying solely on human feedback for every decision. The model critiques and revises its own outputs against these principles, reducing the need for human labeling.
Two phases: supervised (model generates, critiques, and revises responses) and RL (model trained on AI-generated feedback against the constitution). Used to train Claude.
Advantage over pure [[Reinforcement Learning From Human Feedback (RLHF)]]: more scalable, more transparent (principles are explicit), less dependent on individual human preferences.
## References
## Related
- [[AI Alignment]]
- [[AI Safety]]
- [[Reinforcement Learning From Human Feedback (RLHF)]]
- [[Responsible AI]]