Reddit has become the best source of human data. AI companies are preying on it

Everyone wants Reddit's data, and they want it because it is human data. That is the great value of a platform that has become the crown jewel for AI companies. They want to use that data to train their AI models, and Reddit is tired of them trying to do so without asking for permission … and without paying.

Reddit sues Anthropic. The social network, fed up with this type of behavior, has filed a lawsuit against Anthropic, the creator of Claude, alleging breach of contract and participation in “illicit and unfair commercial acts” by using the social media company's platform and data without authorization. In other words: for stealing data for its AI.

Blunt criticism. In the lawsuit, Reddit’s lawyers open forcefully: “Anthropic is a late-blooming artificial intelligence company that proclaims itself the white knight of the artificial intelligence industry. It is anything but that.” According to Reddit, Anthropic shows one public face that boasts of its respect for the law and of doing things legitimately, and another, private one “that ignores any rule that interferes with its attempts to fill its pockets even more.”

Human data treasure. Reddit has become a valuable source of human information. For anyone looking for raw answers, experiences and opinions, this is the platform that has ended up becoming an absolute reference. At Reddit they know it. Its chief legal officer, Ben Lee, told The Verge the following:

“Reddit’s humanity has a unique value in a world flattened by AI. Now more than ever, people seek authentic conversations between humans. Reddit houses almost 20 years of rich, human debates about practically every imaginable topic. These conversations do not occur anywhere else and are fundamental to training language models such as Claude.”

Reddit began to protect itself very early. Knowing that its “human data” was the great treasure with which to make money, Reddit soon began making moves to take advantage of that data. A few months after the launch of ChatGPT, it announced that it would charge for its API, as Elon Musk had done shortly before with X/Twitter. The controversial move was clearly aimed at protecting the platform from the birds of prey that AI companies had become. Then the lawsuits would begin.

If you want my data, pay. Reddit’s policy has been clear from the beginning, and some companies have gotten the message. Google was one of the first to reach an agreement with Reddit and paid the platform 60 million dollars to train its AI models with that data. OpenAI ended up doing the same, although the amount it paid to Reddit has never been revealed.

Anthropic disagrees. An email from Anthropic to CNBC states that “we disagree with Reddit’s claims and we will defend ourselves vigorously.” Interestingly, Anthropic itself has blocked access to its Claude models for Windsurf, the programming startup recently acquired by OpenAI. One of Anthropic's co-founders stated that “it would be odd for us to sell (the API of) Claude to OpenAI.” It is a reasonable argument, if a debatable one, but Anthropic does not seem to apply the same logic in the case of Reddit.

But it already has other pending lawsuits. That statement contrasts with two other lawsuits that Anthropic has received in the last two years. Last August, three authors sued it in a federal court in California for having “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books.” Earlier, in October 2023, Universal Music also sued it in Tennessee for a “systematic and widespread infringement of the copyright of its song lyrics.” The record giant lost that battle, however, which meant a disturbing victory for the tech companies.

Internet looting continues. It is another case of the outright looting that AI companies are carrying out on the Internet. None of them is blameless, although of course there are flagrant cases such as Perplexity or the recent scandal of Meta downloading copyrighted books to train its models. If there is data that can be used to improve the quality of these models, companies try to get it, and that is just what is happening with Reddit.

AI companies do not want copyright. This whole process is part of a worrying phenomenon: there is still no punishment for any of these companies despite their violating copyright. OpenAI already asked for carte blanche to operate in that field, and other companies joined that extraordinary proposal to eradicate copyright laws, at least for their AI models. The “fair use” argument remains their great shield against these lawsuits, but the reality is that the months go by and, we insist, there are still no consequences for this flagrant theft of internet content.

Image | Anthropic | Reddit, edited with ChatGPT

In Xataka | After 19 years, Reddit is finally a profitable company: it has achieved this with a peculiar strategy
