Anthropic Updates Claude’s ‘Constitution’ in Response to Rising Ethical Concerns
Anthropic is publishing a new constitution for its AI model, Claude—detailing Anthropic’s vision for Claude’s values and behavior along with explaining the context in which Claude operates and the kind of entity the company would like Claude to be.
The constitution is a crucial part of the model training process, and its content directly shapes Claude’s behavior.
This has grown out of training techniques that have been used since 2023, according to Anthropic.
Claude itself also uses the constitution to construct many kinds of synthetic training data, including data that helps it learn and understand the constitution, conversations where the constitution might be relevant, responses that are in line with its values, and rankings of possible responses.
All of these can be used to train future versions of Claude to become the kind of entity the constitution describes.
“AI models including Claude need to understand why we want them to behave in certain ways, and we need to explain this to them rather than merely specify what we want them to do. If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalize—to apply broad principles rather than mechanically following specific rules,” Anthropic said.
A summary of the new constitution includes:
- Broadly safe: not undermining appropriate human mechanisms to oversee AI during the current phase of development;
- Broadly ethical: being honest, acting according to good values, and avoiding actions that are inappropriate, dangerous, or harmful;
- Compliant with Anthropic’s guidelines: acting in accordance with more specific guidelines from Anthropic where relevant;
- Genuinely helpful: benefiting the operators and users they interact with.
In cases of apparent conflict, Claude should generally prioritize these properties in the order in which they’re listed.
Most of the constitution is focused on giving more detailed explanations and guidance about these priorities.
The main sections are as follows:
- Helpfulness
- Anthropic’s guidelines
- Claude’s ethics
- Being broadly safe
- Claude’s nature
Claude’s constitution is a living document and a continuous work in progress.
While writing the constitution, Anthropic sought feedback from various external experts (as well as asking for input from prior iterations of Claude).
For more information, read the full constitution.