Anthropic just dropped its core AI safety promise, and that should worry you


Published Feb 25, 2026, 12:50 PM EST

Mahnoor is a News Writer at XDA who has been in the professional writing game since her sophomore year of high school. While pursuing a bachelor's degree in Computer Science, she has also earned bylines in esteemed publications like XDA's sister site MakeUseOf, as well as SlashGear, Laptop Mag, and Android Police.

Whether she's spending hours debugging code or staying up all night to watch a tech event, Mahnoor’s passion for technology is undeniable. She loves writing about all things tech, with a particular focus on iOS and macOS.

When Anthropic introduced Claude in March 2023, the key differentiator was trust and a safety-first approach no other AI lab had taken. In the announcement blog post, the company described Claude as "a next-generation AI assistant based on Anthropic's research into training helpful, honest, and harmless AI systems." In fact, the name Anthropic itself, derived from the Greek word for "human," was a statement of intent.

Rather than positioning itself as just another AI company racing to ship the most powerful model, Anthropic was meant to be the one that put guardrails first. That promise was formalized later in 2023 with the company's Responsible Scaling Policy, which committed Anthropic to something no competitor would match. This week, Anthropic revised that commitment and dropped the very pledge that set the company apart as the "trustworthy" AI lab.

Anthropic no longer promises to halt AI development

Safety goals remain, though


The Responsible Scaling Policy, or RSP, is a public document in which Anthropic lays out what it will and won't do as its AI models get more powerful. When Anthropic published the first version in September 2023, the central rule was straightforward: if Claude's capabilities ever outpaced the company's ability to guarantee safety, Anthropic would halt the training and deployment of new models entirely until it caught up.

Anthropic’s commitment to follow the ASL scheme thus implies that we commit to pause the scaling and/or delay the deployment of new models whenever our scaling ability outstrips our ability to comply with the safety procedures for the corresponding ASL.

On Tuesday, Anthropic published a rewritten version of the RSP that removed this strict, binding commitment to unconditionally halt AI development if safety measures cannot keep up with model capabilities.

In the new version of the policy, Anthropic introduces a "Frontier Safety Roadmap" that outlines its plans for risk mitigation. This framework is far more flexible than the original policy and doesn't include a hard trigger to stop development. Instead, it replaces the promise with public transparency: Anthropic will tell the world what the risks are and what it's doing about them, but the decision to keep going is ultimately its own.

These are not hard commitments but rather public goals against which we will openly grade our progress.

Anthropic will only slow down if clearly ahead

Safety delays depend on competition and risk evidence now

Anthropic argues that the overall risk from AI depends on the behavior of multiple developers: if one responsible developer pauses while others continue without strong mitigations, the result could be a less safe world in which the developers with the weakest protections set the pace.


Interestingly, while the company has dropped its unconditional pledge to pause, it hasn't abandoned the idea of delaying deployment entirely. Under the new policy, Anthropic claims it will delay AI development to ensure safety if it has a "significant lead" over competitors or if there is strong evidence that all competitors developing highly capable models have strong safety measures.

However, if competitors are advancing with weaker safeguards, Anthropic indicates that it will try to meet those performance standards but "will not necessarily delay AI development and deployment in this scenario." In other words, Anthropic will only consider slowing down when it's clearly ahead of the competition or the whole field is demonstrably playing it safe. If it's not in the lead, it keeps going.

The timing of all this is hard to ignore

On the same Tuesday that Anthropic published its rewritten RSP, Defense Secretary Pete Hegseth met with CEO Dario Amodei and delivered an ultimatum: roll back the company's AI safeguards for military use, or face serious consequences.

Those consequences would include ending Anthropic's $200 million Pentagon contract and requiring any company with military contracts to stop using Anthropic's tech entirely. Given that Anthropic was the last major AI lab with a hard safety commitment, there is now no major AI company with a binding promise to stop if things get dangerous.
