
Anthropic trained its AI on millions of copyrighted books. A judge has just ruled that this was acceptable (with a big asterisk)

Anthropic has just scored a very important win in the legal battle that the AI world has been waging with copyright holders for years. The ruling, favorable to Anthropic, could set a major precedent for the rest of the cases in which AI companies have been sued for training their models on copyrighted works. But be careful, because it was not a total victory.

Anthropic wins. In the lawsuit brought by three authors against Anthropic, the company was accused of downloading millions of copyrighted books, in addition to buying some of them in order to scan and digitize them. The objective: training its AI models. Judge William Alsup made clear in his ruling that “the use for training was a fair use.” Companies that develop AI models have always sheltered behind that concept of fair use to justify training their models on all kinds of works, including those protected by copyright.

Fair use. This legal doctrine holds that limited use of protected material is allowed without permission from the rights holder. Under copyright law, one of the ways judges determine whether such an activity is fair use is to examine whether the use was “transformative,” in other words, whether something new was created from those works. For Alsup, “the technology at issue was among the most transformative many of us will see in our lifetimes.”

A victory with a big asterisk. Although the judge found that the training process was fair use, he also ruled that the authors can take Anthropic to trial for pirating their works. The company argued that this was justified because it was “at least reasonably necessary to train LLMs.” For Alsup, the problem is precisely that, although Anthropic ended up buying some of the books, it built a huge library it never paid for:

“Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again). The authors argue that Anthropic should have paid for these pirated library copies. This order agrees.”

The Thomson Reuters precedent. A few months ago, Thomson Reuters won a lawsuit it filed in 2020 against a startup called Ross Intelligence. According to Thomson Reuters, the company had reproduced material from its legal research division, Westlaw. The judge rejected the defense’s arguments and ruled that fair use did not apply in that case. The ruling in the Anthropic case points in exactly the opposite direction and blesses that type of use… as long as companies buy the works they use to train their models. The AI company, by the way, had already achieved a small legal victory in an earlier case against Universal Music.

Anthropic downloaded books en masse. At trial it was revealed how Anthropic co-founder Ben Mann downloaded, in the winter of 2021, datasets such as the so-called Books3 and LibGen (Library Genesis), which are nothing more than gigantic compilations of books, many of them protected by copyright.

Meta is in the same boat. All companies developing AI models have trained them on all kinds of data, including copyrighted works, and they all face a similar situation. Meta, for example, downloaded 81.7 TB of copyrighted books via BitTorrent to train its AI models. That means Mark Zuckerberg’s company could end up meeting a fate similar to Anthropic’s, which now faces a new judicial process that is very dangerous for its finances.

A potential fine of billions of dollars. As Wired notes, the minimum statutory fine for this type of copyright infringement is $750 per book. Alsup indicated that Anthropic’s illegally downloaded library consists of at least seven million books, which means the company faces a potentially enormous fine. For now there is no date for that new trial.
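A quick back-of-the-envelope calculation, using only the two figures cited above (the $750 statutory minimum per work and the roughly seven million pirated books), shows the scale of that exposure. This is a sketch of the lower bound only; actual statutory damages per work can be far higher.

```python
# Lower-bound estimate of Anthropic's statutory exposure, based on the
# figures cited in the article: Wired's $750-per-work statutory minimum
# and the "at least seven million" pirated books mentioned by Judge Alsup.
MIN_DAMAGES_PER_WORK = 750        # USD, statutory minimum per infringed work
PIRATED_BOOKS = 7_000_000         # lower bound cited in the ruling

minimum_exposure = MIN_DAMAGES_PER_WORK * PIRATED_BOOKS
print(f"${minimum_exposure:,}")   # → $5,250,000,000
```

In other words, even at the legal minimum, the fine would start at around $5.25 billion.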

The endless battle between AI and copyright. This is the latest episode of a saga that will undoubtedly have many more chapters. Companies like Google, OpenAI and Perplexity have been equally voracious when training their models, devouring public (and not-so-public) data across the Internet. Copyright infringement lawsuits are piling up, and cases like Anthropic’s could set a troubling precedent for all of them if they did not buy the books they used to train their models.

Image | Emil Widlund

In Xataka | 5,000 “tokens” of my blog are being used to train an AI. I have not given my permission
