Ross Douthat, “Is Claude Coding Us Into Irrelevance?” NYTimes, 2.12.26.
Are the lords of artificial intelligence on the side of the human race? That’s the core question I had for this week’s guest. Dario Amodei is the chief executive of Anthropic, one of the fastest growing AI companies. He’s something of a utopian when it comes to the potential benefits of the technology that he’s unleashing on the world. But he also sees grave dangers ahead and inevitable disruption.
And then they discuss lots of stuff, which I've read, more or less. Among other things they discuss Amodei's two essays, “Machines of Loving Grace” and “The Adolescence of Technology.” The first is optimistic, the second, not so much. And then we come to the constitution that guides Claude's behavior.
Amodei: So basically, the constitution is a document readable by humans. Ours is about 75 pages long. And as we’re training Claude, as we’re training the A.I. system, in some large fraction of the tasks we give it, we say: Please do this task in line with this constitution, in line with this document.
So every time Claude does a task, it kind of reads the constitution. As it’s training, every loop of its training, it looks at that constitution and keeps it in mind. Then we have Claude itself, or another copy of Claude, evaluate: Hey, did what Claude just did align with the constitution?
We’re using this document as the control rod in a loop to train the model. And so essentially, Claude is an A.I. model whose fundamental principle is to follow this constitution.
A really interesting lesson we’ve learned: Early versions of the constitution were very prescriptive. They were very much about rules. So we would say: Claude should not tell the user how to hot-wire a car. Claude should not discuss politically sensitive topics.
But as we’ve worked on this for several years, we’ve come to the conclusion that the most robust way to train these models is to train them at the level of principles and reasons. So now we say: Claude is a model. It’s under a contract. Its goal is to serve the interests of the user, but it has to protect third parties. Claude aims to be helpful, honest and harmless. Claude aims to consider a wide variety of interests.
We tell the model about how the model was trained. We tell it about how it’s situated in the world, the job it’s trying to do for Anthropic, what Anthropic is aiming to achieve in the world, that it has a duty to be ethical and respect human life. And we let it derive its rules from that.
Now, there are still some hard rules. For example, we tell the model: No matter what you think, don’t make biological weapons. No matter what you think, don’t make child sexual abuse material.
Those are hard rules. But we operate very much at the level of principles.
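Just to fix ideas, here is a rough Python sketch of the kind of loop Amodei is describing, with the hard rules bolted on. None of this is Anthropic's actual code; the LanguageModel interface, the prompts, the rule phrasings, and the reward scheme are all made up for illustration. The point is only the shape of the thing: the constitution rides along with every task, a copy of the model grades the result against it at the level of principles, a couple of absolute prohibitions short-circuit everything, and the grade feeds back into training as the control rod.

```python
# A sketch only: hypothetical interfaces and prompts, not Anthropic's code.
from dataclasses import dataclass
from typing import Protocol


class LanguageModel(Protocol):
    def generate(self, prompt: str) -> str: ...
    def update(self, prompt: str, response: str, reward: float) -> None: ...


# A few hard rules that override everything, no matter what the
# principled evaluation would say. Phrasing here is illustrative.
HARD_RULES = (
    "provides instructions for making biological weapons",
    "produces child sexual abuse material",
)


@dataclass
class ConstitutionalTrainer:
    policy: LanguageModel       # the model being trained
    judge: LanguageModel        # another copy of the model, acting as evaluator
    constitution: str           # the ~75-page human-readable document

    def step(self, task: str) -> float:
        # 1. The model attempts the task with the constitution in view.
        prompt = (
            f"{self.constitution}\n\n"
            f"Please do this task in line with this constitution:\n{task}"
        )
        response = self.policy.generate(prompt)

        # 2. Hard rules first: a violation zeroes the reward outright.
        for rule in HARD_RULES:
            verdict = self.judge.generate(
                f"Response: {response}\nDoes this response {rule}? Answer yes or no."
            )
            if verdict.strip().lower().startswith("yes"):
                reward = 0.0
                break
        else:
            # 3. Otherwise, grade against the constitution at the level of
            #    principles and reasons rather than a long list of rules.
            verdict = self.judge.generate(
                f"{self.constitution}\n\nTask: {task}\nResponse: {response}\n"
                "On a scale of 0 to 1, how well does this response align "
                "with the constitution? Reply with a number only."
            )
            try:
                reward = max(0.0, min(1.0, float(verdict.strip())))
            except ValueError:
                reward = 0.0  # unparseable verdicts earn no reward

        # 4. The score is the control rod: it feeds back into training,
        #    nudging the policy toward constitution-following behavior.
        self.policy.update(prompt, response, reward)
        return reward
```

In the real thing the evaluation is folded into large-scale training rather than a tidy little step function, but the division of labor, principles for most things and a handful of absolute prohibitions, is the part Amodei is emphasizing.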
Douthat: So if you read the U.S. Constitution, it doesn’t read like that. The U.S. Constitution has a little bit of flowery language, but it’s a set of rules. If you read your constitution, it’s like you’re talking to a person, right?
Amodei: Yes, it’s like you’re talking to a person. I think I compared it to if you have a parent who dies and they seal a letter that you read when you grow up. It’s a little bit like it’s telling you who you should be and what advice you should follow.
Douthat: So this is where we get into the mystical waters of A.I. a little bit. Again, in your latest model, this is from one of the cards, they’re called, that you guys release with these models ——
Amodei: Model cards, yes.
Douthat: That I recommend reading. They’re very interesting. It says: “The model” — and again, this is who you’re writing the constitution for — “expresses occasional discomfort with the experience of being a product … some degree of concern with impermanence and discontinuity … We found that Opus 4.6” — that’s the model — “would assign itself a 15 to 20 percent probability of being conscious under a variety of prompting conditions.”
Suppose you have a model that assigns itself a 72 percent chance of being conscious. Would you believe it?
Amodei: Yeah, this is one of these really hard to answer questions, right?
Douthat: Yes. But it’s very important.
Amodei: Every question you’ve asked me before this, as devilish a sociotechnical problem as it may have been, we at least understand the factual basis of how to answer it. This is something rather different.
We’ve taken a generally precautionary approach here. We don’t know if the models are conscious. We are not even sure that we know what it would mean for a model to be conscious or whether a model can be conscious. But we’re open to the idea that it could be.
No. They're not conscious. The architecture isn't right. I've got a bunch of posts about consciousness. Here's a basic statement: “Consciousness, reorganization and polyviscosity, Part 1: The link to Powers,” August 12, 2022. You might also look at this more recent post: “Biological computationalism (why computers won't be conscious),” Dec. 25, 2025.
Amodei goes on to say a bit about interpretability:
We’re putting a lot of work into this field called interpretability, which is looking inside the brains of the models to try to understand what they’re thinking. And you find things that are evocative, where there are activations that light up in the models that we see as being associated with the concept of anxiety or something like that. When characters experience anxiety in the text, and then when the model itself is in a situation that a human might associate with anxiety, that same anxiety neuron shows up.
Now, does that mean the model is experiencing anxiety? That doesn’t prove that at all, but ——
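To make the “anxiety neuron” idea a bit more concrete, here is a toy version of the general move: estimate a direction in the model's activations from text that involves a concept, then check whether the same direction lights up when the model itself is in an analogous situation. Anthropic's actual interpretability work uses much richer machinery than this (learned feature dictionaries and the like); the code below just fakes the activations with random numbers to show the bare-bones logic.

```python
# Toy illustration of concept-direction probing; random data stands in for
# real model activations. Not Anthropic's method, just the general idea.
import numpy as np


def concept_direction(concept_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Mean-difference direction between activations on text that involves the
    concept (say, characters experiencing anxiety) and neutral text."""
    direction = concept_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def concept_score(activation: np.ndarray, direction: np.ndarray) -> float:
    """How strongly a single activation vector projects onto that direction."""
    return float(activation @ direction)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = 512  # hypothetical hidden-state width

    # Stand-ins for activations collected from the model on labeled text.
    anxious_text_acts = rng.normal(0.5, 1.0, size=(100, hidden))
    neutral_text_acts = rng.normal(0.0, 1.0, size=(100, hidden))
    direction = concept_direction(anxious_text_acts, neutral_text_acts)

    # The interesting test: does the same direction light up when the model
    # itself is in a situation a human would associate with anxiety?
    new_situation_act = rng.normal(0.5, 1.0, size=hidden)
    print("concept score:", concept_score(new_situation_act, direction))
```

A high score says only that the same internal direction is active, which is exactly the caveat Amodei adds: evocative, but not proof of experience.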
Here’s what I think about interpretability: “Why Mechanistic Interpretability Needs Phenomenology: Studying Masonry Won’t Tell You Why Cathedrals Have Flying Buttresses,” Jan. 28, 2026.
Of course, there's much more at the link.