Anthropic just accused DeepSeek and other Chinese companies of “distilling” Claude

For months we have talked about the race between the United States and China to dominate artificial intelligence as if it were only a question of who trains the most powerful model or launches the next version first. But the contest is beginning to shift to another, more delicate arena: the rules of the game. When one laboratory accuses another of extracting capabilities from its system to accelerate its own development, the discussion goes beyond the technical. That is exactly what Anthropic has just done by denouncing "distillation" campaigns against its model Claude.

The complaint. In a text published this Monday, the company claims to have detected "industrial-scale campaigns" aimed at extracting Claude's capabilities. According to its account, the activities attributed to DeepSeek, Moonshot and MiniMax involved more than 16 million question-and-answer interactions, channeled through approximately 24,000 fraudulent accounts, in violation of its terms of service and regional access restrictions.

The race and the suspicion. The announcement by the firm led by Dario Amodei comes amid growing tension around the progress of Chinese AI. Remember that DeepSeek shook the Silicon Valley landscape a year ago with the launch of R1, a competitive model presented as having been developed at a fraction of the cost of American alternatives. The impact on the markets was immediate, and it revived the political debate in Washington about America's technological edge over China.

Distilling is not always cheating. Anthropic itself acknowledges that distillation is a common technique in the sector. It consists, in simple terms, of training a less capable model on the responses generated by a more powerful one, something large laboratories do to create smaller, cheaper versions of their own systems. The problem, according to the company, appears when the practice is used to "acquire powerful capabilities from other laboratories in a fraction of the time and at a fraction of the cost" that developing them independently would entail. In that case, distillation would cease to be an internal optimization and would become, again according to Anthropic, a way of free-riding on the work of others.
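To make the technique concrete, here is a minimal sketch of the data-collection side of distillation: pairing prompts with a stronger "teacher" model's answers and packaging them as fine-tuning examples for a smaller student. The prompts, responses and dataset shape below are all illustrative assumptions, not anything from Anthropic's report; in the campaigns it describes, the teacher's answers would come from querying Claude's API at scale.

```python
import json

# Hypothetical (prompt, teacher response) pairs. In real distillation
# these would be collected by querying the stronger model, often by
# the millions, as in the campaigns described in the article.
teacher_pairs = [
    {"prompt": "Explain photosynthesis in one sentence.",
     "response": "Plants turn light, water and CO2 into glucose and oxygen."},
    {"prompt": "What does 'idempotent' mean?",
     "response": "An operation that yields the same result no matter "
                 "how many times it is applied."},
]

def build_distillation_dataset(pairs):
    """Convert (prompt, teacher response) pairs into chat-style
    supervised fine-tuning examples for the student model."""
    return [
        {"messages": [
            {"role": "user", "content": p["prompt"]},
            {"role": "assistant", "content": p["response"]},
        ]}
        for p in pairs
    ]

dataset = build_distillation_dataset(teacher_pairs)
print(json.dumps(dataset[0], ensure_ascii=False))
```

The student is then fine-tuned on this dataset so that it imitates the teacher's outputs, which is why the technique is legitimate when a lab distills its own models and contentious when the teacher belongs to a competitor.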

Recognizable pattern. The three laboratories allegedly used fraudulent accounts and proxy services to access Claude at scale while trying to evade detection systems. The company describes the infrastructure involved, what it calls "hydra clusters": extensive networks of accounts that distributed traffic across its API and third-party cloud platforms, so that when one account was blocked, another took its place. Anthropic maintains that what distinguished these activities from normal use was not any isolated query, but the massive, coordinated repetition of requests aimed at extracting very specific capabilities from the model.
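The detection logic Anthropic hints at can be sketched with a toy heuristic. The code below is an assumption-laden illustration, not Anthropic's actual method: account IDs, prompt fingerprints and the threshold are invented. The point it demonstrates is the one in the paragraph above, that no single query looks abusive, but many distinct accounts issuing the same narrow family of prompts is a coordination signal.

```python
from collections import defaultdict

# Hypothetical request log as (account_id, prompt_fingerprint) pairs,
# where a fingerprint groups near-identical prompts. All values here
# are made up for illustration.
request_log = [
    ("acct_001", "agent-tool-use-v1"),
    ("acct_002", "agent-tool-use-v1"),
    ("acct_003", "agent-tool-use-v1"),
    ("acct_004", "agent-tool-use-v1"),
    ("acct_001", "weather-smalltalk"),
]

def find_coordinated_fingerprints(log, min_accounts=3):
    """Flag prompt fingerprints issued by suspiciously many distinct
    accounts: a rough 'hydra cluster' heuristic, since rotating blocked
    accounts leaves the same prompt family spread across many IDs."""
    accounts_by_fp = defaultdict(set)
    for account, fingerprint in log:
        accounts_by_fp[fingerprint].add(account)
    return {fp for fp, accounts in accounts_by_fp.items()
            if len(accounts) >= min_accounts}

print(find_coordinated_fingerprints(request_log))
# prints {'agent-tool-use-v1'}
```

Blocking individual accounts does not defeat this signal, which is consistent with Anthropic's claim that the pattern, rather than any one account, is what gave the campaigns away.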

Three campaigns. Although Anthropic presents the campaigns as part of the same dynamic, it draws relevant distinctions. DeepSeek allegedly focused its more than 150,000 queries on extracting reasoning capabilities and on generating safe alternatives to politically sensitive questions. Moonshot, with more than 3.4 million queries, allegedly targeted the development of agents capable of using tools and manipulating computing environments. MiniMax accounted for the largest volume, more than 13 million queries, and according to Anthropic's account it reacted within hours to the launch of a new Claude model, redirecting its traffic to try to extract capabilities from the most recent system.

A geopolitical issue. The company states that illicitly distilled models may lose safeguards designed to prevent state or non-state actors from using AI for purposes such as developing biological weapons or running disinformation campaigns. It also argues that distillation undermines export controls by allowing foreign laboratories to close the gap by other means, while acknowledging that executing these large-scale extractions requires access to advanced chips, which reinforces the logic of restricting their availability. And it warns that the risk would grow if these capabilities ended up integrated into military, intelligence or surveillance systems.

Images | Xataka with Nano Banana Pro

In Xataka | Seedance is the most jaw-dropping video generation we have seen. And it carries an uncomfortable message: it has surpassed Sora and Veo without NVIDIA chips
