Anthropic trained its AI on millions of copyrighted books. To a judge, that seemed fine (with a big asterisk)

Anthropic has just scored a major legal victory in the long-running battle between the AI world and copyright holders. The ruling, favorable to Anthropic, could set an important precedent for the rest of the cases in which AI companies have been sued for training their models on copyrighted works. But be careful, because it was not a total victory.

Anthropic wins. In the lawsuit brought by three authors against Anthropic, the company was accused of downloading millions of copyrighted books, in addition to buying some of them to scan and digitize. The goal: training its AI models. Judge William Alsup made clear in his ruling that "the use for training was a fair use." Companies developing AI models have always sheltered behind the concept of fair use to justify training their models on all kinds of works, including those protected by copyright.

Fair use. This legal doctrine holds that limited use of protected material is allowed without needing permission from the rights holder. Under US copyright law, one of the ways judges determine whether such a use is fair is to examine whether it was "transformative", that is, whether something new was created from those works. For Alsup, the technology at issue is among the most transformative many of us will see in our lifetimes.

A victory with a big asterisk. Although the judge found that the training process was fair use, he also ruled that the authors could take Anthropic to trial for pirating their works.
The company argued that this was justified because it was "at least reasonably necessary to train LLMs." For Alsup, the problem is precisely that, although Anthropic ended up buying some of the books, it built a huge library it never paid for: "Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic should have paid for these pirated library copies. This order agrees."

The Thomson Reuters precedent. A few months ago Thomson Reuters won a lawsuit it had filed in 2020 against a startup called Ross Intelligence. According to Thomson Reuters, the company had reproduced material from its legal research division, Westlaw. The judge rejected the defense's arguments and ruled that fair use could not be applied in that case. The ruling in Anthropic's favor points in exactly the opposite direction and blesses that kind of use ... as long as companies buy the works they train their models on. The AI company, by the way, had already scored a small legal victory in an earlier case brought by Universal Music.

Anthropic downloaded books wholesale. The trial revealed how Anthropic co-founder Ben Mann downloaded, in the winter of 2021, datasets such as Books3 and LibGen (Library Genesis), which are nothing more than gigantic compilations of books, many of them protected by copyright.

Meta is in the same boat. All the companies developing AI models have trained them on all kinds of data, including copyrighted works, and they all face a similar situation. Meta, for example, downloaded 81.7 TB of copyrighted books via BitTorrent to train its AI models.
That means Mark Zuckerberg's company could end up suffering a fate similar to Anthropic's, which now faces a new judicial process that is very dangerous for its finances.

A potential fine of billions of dollars. As Wired points out, the minimum statutory fine for this type of copyright infringement is $750 per book. Alsup noted that Anthropic's illegally downloaded library contains at least seven million books, which means the company faces a potentially enormous fine. For now there is no date for that new trial.

The endless battle between AI and copyright. This is just the latest episode of a saga that will undoubtedly bring many more chapters. Companies like Google, OpenAI, and Perplexity have been equally voracious in training their models, hoovering up public (and not-so-public) data across the Internet. Copyright infringement lawsuits keep piling up, and cases like Anthropic's may set a worrying precedent for all of them if they did not buy the books they used to train their models.

Image | Emil Widlund

In Xataka | 5,000 "tokens" of my blog are being used to train an AI. I never gave my permission
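As a quick sanity check on that "billions of dollars" figure, the numbers cited in the article can simply be multiplied out. This is a back-of-the-envelope sketch, not an official damages calculation: the $750 minimum and the seven-million-book floor come from the article, while the $150,000 willful-infringement ceiling is the widely cited maximum under US statutory-damages law and is not mentioned in the article itself.

```python
# Rough estimate of Anthropic's statutory exposure, using the
# figures cited in the article (not an official calculation).
MIN_DAMAGES_PER_WORK = 750        # USD, statutory minimum per infringed work (per the article)
MAX_DAMAGES_PER_WORK = 150_000    # USD, statutory ceiling for willful infringement (not from the article)
PIRATED_BOOKS = 7_000_000         # lower bound cited by Judge Alsup

minimum_exposure = MIN_DAMAGES_PER_WORK * PIRATED_BOOKS
print(f"Minimum exposure: ${minimum_exposure:,}")  # Minimum exposure: $5,250,000,000
```

Even at the statutory minimum, the floor works out to $5.25 billion, which is why the article calls the upcoming trial "very dangerous" for Anthropic's finances.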

The AI industry can only sustain itself by violating copyright law. So it is trying to do away with it

Last Saturday Jack Dorsey, co-founder of Twitter (now X) and Square (now Block), posted a message on X with a blunt phrase: "delete all IP law." Elon Musk replied shortly afterward, joining in with an "I agree." The message has sparked a debate about intellectual property law, and it arrives at a particularly singular moment.

Down with copyright. Jack Dorsey's proposal is just the latest of the moves in that direction. Some companies and tech personalities in the United States are asking the country to discard its intellectual-property laws, something that would be fantastic for those who have trained AI models on works protected by intellectual property.

Lawsuits everywhere. The comments come, in fact, just as AI companies keep getting sued for copyright infringement. The origin of these legal cases is always the same: the companies stand accused of training their models on copyright-protected works and content.

"Fair use". Meta, which downloaded more than 80 TB of books, some of them protected by copyright and intellectual-property law, recently took part in a trial over an older lawsuit on this very subject. Its lawyers argued that the company did not violate copyright law and that it had made "fair use" of those books in order to develop its AI model, Llama.

OpenAI has already asked for carte blanche. The company led by Sam Altman is one of those most affected by these lawsuits. In a proposal published just a month ago, OpenAI asked that US copyright rules be loosened with the aim of "preserving American AI models' ability to learn from copyrighted material." For Altman, the training of AI models should be free of potential copyright-infringement lawsuits, and Dorsey and Musk are now expressing the same view.

Copyright bothers Google too. Google has also been accused of using copyright-protected content to train its AI models.
In a statement last March, the company asked for "balanced copyright rules" and explicitly named "fair use and text-and-data mining" exceptions to those laws.

The courts have barely weighed in. The truth is that lawsuits over copyright infringement by AI models began arriving shortly after the launch of ChatGPT, but so far there have been few judicial rulings. Those that exist, by the way, have been small victories for AI companies.

And on and on it goes. And the situation does nothing to rein in the legal mess in which the AI world finds itself. There have been no punishments or consequences for the companies, which at most have covered themselves by reaching agreements with some publishing groups or content platforms. Meanwhile, the implications of these violations are clear for artists of every discipline and for content creators, who watch their works being used without consent (and without compensation) for something they cannot control, while the world seems to look the other way.

In Xataka | 5,000 "tokens" of my blog are being used to train an AI. I never gave my permission

Millions of people are interested in ChatGPT again. The problem is that it got there by violating copyright

It had been a long time since social networks went this crazy over an artificial intelligence tool. There is usually some buzz when something draws more attention than expected, but what has happened with ChatGPT's image generation based on GPT-4o is on another level. Generative AI has achieved something it had not managed before: surprising the ordinary user. And it has done so while putting on display one of the biggest criticisms of this technology: copyright violation.

Content ©. In recent hours, social networks have been flooded with memes, images, and avatars edited by ChatGPT to look like Studio Ghibli drawings. The images are genuinely spectacular, credit where credit is due, but it is worth remembering that an AI knows how to generate an image of a horse because, among other things, it has been trained on millions and millions of images of horses.

The other way around. If an AI like GPT-4o is capable of converting or generating an image in the style of a specific author, it is because it knows what that author's style looks like. In other words, GPT-4o must have been trained on content related to, based on, or produced by the studio founded by Hayao Miyazaki. And what is that content like? Beautiful, emotional, and intimate, but neither free nor in the public domain. It is protected by copyright, an issue that has dogged ChatGPT and OpenAI since their beginnings.

It is no secret. Of course not. ChatGPT was trained on a huge amount of data obtained from the Internet: websites, books, social media posts, academic articles, and so on. Content that may be freely accessible, but is not rights-free for that reason. An image that is "on the Internet" is not simply "on the Internet"; it is hosted on a server that may belong to a company and may well be (and surely is) under copyright. The fact that you can view it and download it to your phone for free to use as wallpaper does not mean you can print it and sell it, or use it to illustrate the cover of your next novel.
"Living artists". OpenAI says it has taken a "conservative approach" to images that draw on other artists' work and has "added a refusal which triggers when a user attempts to generate an image in the style of a living artist." Like Miyazaki, for example. Faced with the flood of images generated in the style of the Japanese animator, a company spokesperson told Business Insider that OpenAI would block "generations in the style of individual living artists" but would allow "broader studio styles." In other words: Hayao Miyazaki's style, no; Studio Ghibli's style, yes. Which has its ironic side, because back in 2016, after seeing a demo of an AI-generated animation, Miyazaki said: "I would never wish to incorporate this technology into my work at all. I strongly feel that this is an insult to life itself."

My Neighbor Totoro | Image: Studio Ghibli

The style. It should be noted that no one can stop anyone from making works in the style of Miyazaki or Studio Ghibli. Style per se is not protected. Another matter entirely, and this is the crux of the issue, is using protected works to train an AI capable of replicating that style. That is the real problem. We could think of it like fan art: you can draw an illustration of Pikachu, print it, and hang it in your room, no problem. What you cannot do is sell that illustration.

OpenAI's headache. This access to and use of copyrighted content for commercial purposes has earned OpenAI more than one lawsuit, the most important being that of The New York Times. Getty also sued Stable Diffusion for having used its images to train models, Anthropic was sued by a group of authors for having used their books to train Claude, and Meta, it appears, downloaded 81.7 TB of copyrighted books to train its models.
The conclusion is clear, and we have covered it before: the price to be paid for having artificial intelligence is the looting of all the content on the Internet, however much AI companies lean on fair use and hide behind it. With generative artificial intelligence, the assumption seems to be that if something is on the Internet it is free, and the reality is that it is not always so. All the big AI companies have ignored copyright law and, for the moment, there have been no consequences. The debate, however, is far from over, and this will probably not be the last time it lands on the table.

Cover image | @MDURBAR

In Xataka | Generative AI has a huge problem with unlicensed training content. Adobe is trying to solve it

To win the AI race, OpenAI wants the US to forget about laws. Specifically, copyright laws

OpenAI is immersed in a series of lawsuits, all of them over the same thing: the alleged copyright violations it committed when training its AI models. Now it has come up with a singular idea to rid itself of all those problems.

No copyright for AI. In a proposal published by OpenAI, the company suggests that the US government consider a "copyright strategy that promotes the freedom to learn" and that preserves "the ability of American AI models to learn from copyrighted material." Or, put another way: that copyright law not be applied.

AI companies have done as they pleased. Lawsuits for copyright infringement against AI companies have become frequent. The companies developing these models have shown no shame in this regard, and the striking thing is that there have still been no consequences. What OpenAI is now asking is that there definitively be none, and that these firms can operate without legal worries.

China is breathing down our necks. The main argument for recommending something like this is to compete on better terms with China. The Asian giant has shown striking advances, and in fact OpenAI notes that "while America maintains its lead on AI today, DeepSeek shows that our lead is not wide and is narrowing."

Fair use. As usual, the excuse of "fair use" of copyrighted content appears. According to the proposal: "If the PRC's developers have unfettered access to data and American companies are left without fair use access, the race for AI is effectively over. America loses, as does the success of democratic AI. Ultimately, access to more data from the widest possible range of sources will ensure more access to more powerful innovations that deliver even more knowledge."

"AI Action Plan" in sight. In January, Trump revoked the executive order on AI that Biden had signed in October 2023. Shortly afterward he issued a new one and proposed an "Action Plan" to be ready within 180 days.
OpenAI's intention is for that plan to include such concessions. But the relationship with Trump is delicate. It is true that OpenAI is, together with SoftBank, the great backer of the Stargate project, an initiative Trump has boasted about a great deal. However, the relationship between Sam Altman's company and the current president is complex, especially since OpenAI is in a full-blown legal battle with Elon Musk, Trump's main adviser.

Carte blanche. OpenAI is seeking carte blanche to train its models on copyright-protected works. Not only that: it also wants its tools for helping modernize government agencies to be approved more quickly. Experts have been warning that too hasty an adoption of these tools could have dangerous consequences, for example in terms of potential leaks and information security. OpenAI launched ChatGPT Gov in January precisely so that government employees would have access to this type of service.

China's AI models should be banned. The proposal from Sam Altman's company goes further and calls for AI models from the People's Republic of China to be prohibited. According to OpenAI, models like DeepSeek are state-subsidized and state-controlled, and pose national security risks. Companies such as Microsoft, Perplexity, and Amazon host the DeepSeek model on their own infrastructure, where the data stays on US servers, so it seems difficult for the Chinese government to gain access to it.

Image | Flickr (TechCrunch)

In Xataka | 5,000 "tokens" of my blog are being used to train an AI. I never gave my permission

All the big AI companies have ignored copyright law. The amazing thing is that there are still no consequences

French publishers are fed up and have just sued Meta for copyright violation. They are not the first and will not be the last, but that is not the problem: the problem is that AI companies have used copyrighted content to train their models, and it is as if nothing had happened.

Everything stays the same. More than two years have passed since Getty sued Stable Diffusion, which it accused of stealing its photos to train its image-generation model. That was the first on a long list of lawsuits over exactly the same thing, but despite the time that has elapsed, there has been no news. It is as if what Stable Diffusion did, like the rest, ended up on the back burner for the courts.

Am I copying? Suspicion about this kind of behavior has been constant, and it existed even before ChatGPT launched in November 2022. Months earlier, in June, DALL-E was accused of drawing on creators' images while those authors received nothing in return. Microsoft, OpenAI, and GitHub were also sued a few weeks before ChatGPT's launch, this time because GitHub Copilot had been trained, without permission, on code from various developers. A California judge dismissed practically all of the plaintiffs' claims in July 2024.

Few rulings punish AI companies. For now, the rulings that have come down, like the ones just mentioned, hand apparent victory to the AI companies. It happened, for example, with a lawsuit against OpenAI that the company managed to win. Of course, that victory could prove costly in its other big pending lawsuit with The New York Times, which can argue that it suffered demonstrable harm.

Fair use? The New York Times case against OpenAI got underway in January 2025 and is undoubtedly one of the most important in this field. The company led by Sam Altman, which has used all the data it could get, shields itself behind the claim that it makes "fair use" of the content in order to train its models.
The funny thing is that on the one hand they say that, and on the other they have been striking multimillion-dollar agreements with platforms like Reddit and with media outlets and publishers like El País, precisely to license their content and head off new lawsuits.

Meta is on another level. The lengths companies are going to in order to obtain quality data to train their AI models are extraordinary. Perplexity jumped the Internet's barriers, but Meta's case is even more striking: we recently learned it had used more than 80 TB of books downloaded via BitTorrent to train its model. Many of them copyrighted, which has drawn plenty of criticism and the recent lawsuit from several French publishing groups.

There seems to be no punishment. But as we say, that historic theft of intellectual property seems to have been accepted: no rulings have punished those copyright violations so far, and it is as if, collectively, those violations had been overlooked because AI offers interesting advantages. But we are forgetting how those advantages were obtained ... or so it seems.

In Xataka | 5,000 "tokens" of my blog are being used to train an AI. I never gave my permission

AI companies have been skirting copyright for years. They have just suffered an unsettling legal defeat

Thomson Reuters has won the first major AI copyright case in the United States. This legal victory could end up being an important precedent in the open war between generative AI companies and human creators and content companies.

Before ChatGPT even existed. One of the curiosities of the case is that the lawsuit was filed in 2020, even before the revolution sparked by ChatGPT and other generative AI models. Back then, Thomson Reuters sued a startup called Ross Intelligence. According to Thomson Reuters, the company had reproduced material from its legal research division, Westlaw.

An inflexible judge. As Wired explains, the defense's arguments did not convince Judge Stephanos Bibas of the District Court of Delaware. In his ruling he stated that "none of Ross's possible defenses holds water. I reject them all."

Fair use? Nothing doing. AI companies normally shield themselves behind the doctrine of fair use. This legal doctrine holds that limited use of protected material is allowed without needing permission from the rights holder. As Wired explains, four factors are analyzed: the purpose of the use, the nature of the work (whether it is an essay, a poem, a private letter), the amount of material used, and how that use affects the market value of the original.

Careful what you copy for. Thomson Reuters prevailed on two of those factors, but the fourth was, for Judge Bibas, the most important, because Ross "meant to compete with Westlaw by developing a market substitute." In other words: they copied in order to compete with Thomson Reuters in the same market.

A precedent with a problem. Curiously, Ross Intelligence shut down in 2021, precisely because of the costs of the litigation. The exact opposite is true of the AI giants, which usually have far more economic resources to defend against this kind of lawsuit.
The legal precedent is undoubtedly relevant, but it may prove harder to wield if plaintiffs cannot shoulder the costs of litigation.

Careful, generative AI. The emergence of all kinds of generative models has unleashed a wave of copyright-infringement lawsuits. One of the most important cases is the one The New York Times is pursuing against OpenAI, but there are others, such as the one involving Microsoft over GitHub Copilot, the ones against Stable Diffusion and Midjourney, and the recent Meta scandal over the copyrighted books it used to train its AI models.

Fair use and competition. This ruling raises a significant legal obstacle for AI companies. First, because the fair-use argument may no longer work. And second, because when those copyright-protected works are used, the impact on the original works can be considerable.

Images | Wirestock | Solen Feyissa

In Xataka | OpenAI has used copyrighted content to train its models: now it faces a wave of lawsuits

Meta emails reveal it downloaded 81.7 TB of copyrighted books via BitTorrent to train its AI models

In the Kadrey v. Meta legal process, Mark Zuckerberg's company is accused of having used copyright-protected works to train its artificial intelligence models. A few weeks ago it was revealed that Zuckerberg had approved the use of pirated books, but now new and damning evidence of this looting has arrived.

Revealed emails. The case's "Appendix A" includes several internal Meta email messages about that data collection, starting in October 2022.

"Torrenting from a corporate laptop doesn't feel right". In April 2023, Nikolay Bashlykov, one of the engineers responsible for the data collection, joked about it (emojis included) and noted that the company would have to be careful about the IP address from which it downloaded the data.

Meta knew the risks. By September of that year, Bashlykov had stopped using emoticons and warned that using torrents would mean acting as a "seeder" so that others could also download the files, something that "might not be legally OK." These discussions are proof that Meta knew this kind of activity was illegal, according to the authors who have sued the company.

Covering its tracks. In an internal message, Meta researcher Frank Zhang described how the company avoided using its own servers to download this dataset, in order to avoid "the risk of anyone tracing back the seeder/downloader" of that data.

81.7 TB of data. As Ars Technica points out, the evidence shows that Meta downloaded at least 81.7 terabytes of data from various shadow libraries offering those copyrighted books. A new court filing indicates that at least 35.7 TB were downloaded from sites such as Z-Library or LibGen (which ended up shutting down last summer).

Meta wants the charges dismissed. Meta has filed a motion to dismiss those accusations, arguing there is no evidence that any book was downloaded by Meta employees via torrent or later redistributed by Meta.
At Xataka we have contacted the company, and we will update this story if we receive comments on the case.

Fuel for the fire of Internet looting. These revelations feed the debate about the questionable practices AI companies are using to train their models. We saw it with Google, and of course also with OpenAI, which used millions of texts to train ChatGPT, many of them copyrighted. Perplexity came under scrutiny after it was discovered that it flagrantly ignored the Internet's rules to bypass paywalls and feed its AI model.

Internet looting is being normalized. The amazing thing about all this is that, with every company skirting the rules and violating copyright, the looting of the Internet seems to be becoming normalized. There is barely time to be scandalized, and we almost treat it as a fait accompli so we can get on with our lives.

Is this really "fair use"? All these companies shield themselves behind the concept of "fair use." This doctrine, developed in Anglo-Saxon law, allows limited use of protected material without needing to ask permission. Copyright violations keep piling up in the world of generative AI, but they seem to stay in the background while these giants thrive.

In Xataka | 5,000 "tokens" of my blog are being used to train an AI. I never gave my permission
