Anthropic has rewritten its 25,000-word “Constitution” for Claude. It is the manual for how its AI should behave

Anthropic has published a completely renewed version of the so-called “Claude Constitution”. Yes, an AI also needs a constitution, or at least a set of documents that explain with full transparency what direction the company has decided to take with its AI tool. It is a way to save us trouble in case it ever becomes self-aware.

The document in question consists of 80 pages and nearly 25,000 words, and essentially lays out the values Anthropic relies on to train its models and what it hopes to achieve with them. Alluding to Asimov, it would be something like a broader, more complex version of his Three Laws of Robotics.

Why it is important. Anthropic has long been trying to differentiate itself from OpenAI, Google, and xAI, positioning itself as the most ethical and safe alternative on the market. This Constitution is the centerpiece of its training method, called “Constitutional AI”, in which the model itself uses these principles to critique and correct its responses during training, instead of relying exclusively on human feedback. The document is not written for users or researchers: it is written for Claude.

It was time to update. The first version of the Constitution, published in 2023, was a list of principles drawn from sources such as the UN Universal Declaration of Human Rights or, as Fortune notes, Apple’s terms of service. Now, according to Anthropic, they have taken a completely different approach: “To be good actors in the world, AI models like Claude need to understand why we want them to behave in certain ways, rather than simply specifying what we want them to do,” the company says in its statement.

The new document is structured around four fundamental values, and the most interesting thing is that Claude must prioritize them in this order when they conflict:

  1. Be broadly safe: do not undermine human oversight mechanisms of AI during this critical phase of development.
  2. Be broadly ethical: act honestly, according to good values, avoiding inappropriate, dangerous, or harmful actions.
  3. Comply with Anthropic’s guidelines: follow specific company instructions when relevant.
  4. Be genuinely helpful: benefit the operators and users it interacts with.

Most of the document is devoted to developing these principles in more detail. In the helpfulness section, Anthropic describes Claude as “a brilliant friend who also possesses the knowledge of a doctor, lawyer and financial advisor.” But it also sets absolute limits, called “hard constraints,” that Claude must never cross: not provide significant assistance for bioweapon attacks, not create malware that can cause serious harm, not assist in attacks on critical infrastructure such as power grids or financial systems, and not help “kill or incapacitate the vast majority of humanity,” among others.

Consciousness. The most striking part of the document appears in the section titled “The Nature of Claude,” where Anthropic openly acknowledges its uncertainty about whether Claude could have “some kind of consciousness or moral status.” “We are concerned about Claude’s psychological safety, sense of identity, and well-being, both for Claude’s own sake and because these qualities may influence its integrity, judgment, and safety,” the company says.

The company claims to have an internal team dedicated to “model well-being” that examines whether advanced systems could be sentient.

Amanda Askell, the Anthropic philosopher who led the development of this new Constitution, told The Verge that the company doesn’t want to be “completely dismissive” about this issue, because “people wouldn’t take it seriously either if you just said ‘we’re not even open to this, we don’t investigate it, we don’t think about it.’”

The document also raises complex moral dilemmas for Claude. For example, it states that “just as a human soldier might refuse to shoot peaceful protesters, or an employee might refuse to violate antitrust law, Claude should refuse to assist with actions that concentrate power in illegitimate ways. This is true even if the request comes from Anthropic itself.”

And now what. Anthropic has published the entire Constitution under a Creative Commons CC0 1.0 license, meaning anyone can freely use it without asking permission. The company promises to keep an updated version on its website, considering it a “living document and a continuous work in progress.”

Cover image | Andrea De Santis and Anthropic

In Xataka | Company CEOs say AI is saving them a day of work a week. Employees say otherwise
