
Anthropic Rolls Back AI Safety Pledge Amid Threats from the U.S. Department of Defense

After declaring itself the “Good AI Company,” Anthropic is now walking back its promise never to train an AI system unless it could guarantee in advance that its safety measures were adequate.

According to TIME, “for years, its leaders touted that promise—the central pillar of their Responsible Scaling Policy (RSP)—as evidence that they are a responsible company that would withstand market incentives to rush to develop a potentially dangerous technology.”

Now, instead of self-imposed guardrails constraining its development of AI models, Anthropic is adopting a nonbinding safety framework that it says can and will change.

“We felt that it wouldn't actually help anyone for us to stop training AI models,” Anthropic’s chief science officer Jared Kaplan told TIME in an exclusive interview. “We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”

The new version of the policy includes commitments to be more transparent about the safety risks of AI, including making additional disclosures about how Anthropic’s own models fare in safety testing. It commits to matching or surpassing the safety efforts of competitors, and it promises to “delay” Anthropic’s AI development if leaders both consider Anthropic to be the leader of the AI race and judge the risks of catastrophe to be significant.

Overall, the change to the RSP leaves Anthropic far less constrained by its own safety policies, which previously categorically barred it from training models above a certain level if appropriate safety measures weren’t already in place.

In a statement, the company said, “This third revision amplifies what worked about the previous RSP, commits us to more transparency about our plans and our risk considerations, and separates out our recommendations for the industry at large from what we can achieve as an individual company.”

However, experts say the change is concerning. The change to the RSP shows Anthropic “believes it needs to shift into triage mode with its safety plans, because methods to assess and mitigate risk are not keeping up with the pace of capabilities,” said Chris Painter, the director of policy at METR, a nonprofit focused on evaluating AI models for risky behavior. “This is more evidence that society is not prepared for the potential catastrophic risks posed by AI.”

The move also comes amid pressure from Defense Secretary Pete Hegseth, who gave Anthropic CEO Dario Amodei an ultimatum to roll back the company’s AI safeguards or risk losing a $200 million Pentagon contract.

CNN reported that “Anthropic has concerns over two issues that it isn’t willing to drop, according to a source familiar with the company’s meeting with Hegseth: AI-controlled weapons and mass domestic surveillance of American citizens.”

Anthropic believes AI is not reliable enough to operate weapons, and there are no laws or regulations yet that cover how AI could be used in mass surveillance, a source said.
