French publishers are fed up, and they have just sued Meta for copyright infringement. They are not the first, nor will they be the last, but that is not the problem: the problem is that AI companies have used copyrighted content to train their models, and it is as if nothing had happened.
Everything remains the same. Years have passed since Getty sued Stable Diffusion, which it accused of stealing its photos to train its image-generation model. That was the first in a long list of lawsuits over exactly the same thing, but despite the time that has elapsed, there has been no news about it. It is as if what Stable Diffusion did, like the others, had been pushed into the background by the courts.
Copying? Suspicion about this type of behavior has been constant, and it was already there before ChatGPT launched in November 2022. Months earlier, in June, DALL-E was accused of being based on copyrighted images whose creators received nothing in return. Microsoft, OpenAI and GitHub were also sued a few weeks before the launch of ChatGPT, this time because GitHub Copilot had been trained on code from various developers who had not given their permission. A California judge dismissed practically all of the plaintiffs' claims in July 2024.
Few rulings punish AI companies. For now, the rulings handed down so far, such as the one mentioned above, give the apparent victory to AI companies. That happened, for example, with a lawsuit against OpenAI, which the company managed to win. Of course, this victory could prove costly in its other major pending lawsuit, the one brought by The New York Times, which can argue that it suffered demonstrable harm.
Fair use? The New York Times case against OpenAI began in January 2025 and is undoubtedly one of the most important in this area. The company led by Sam Altman, which has used all the data it could, defends itself by arguing that it makes "fair use" of the content in order to train its models. The funny thing is that on the one hand they say that, and on the other they have been signing deals worth millions with platforms like Reddit and with media outlets or publishers such as El País, precisely to license their content and avoid new lawsuits.
Meta is on another level. The lengths to which companies are going to obtain quality data for training their AI models are extraordinary. Perplexity jumped over the Internet's barriers, but Meta's case was even more striking: we recently learned that it had used more than 80 TB of books downloaded via BitTorrent to train its model. Many of them were copyrighted, which has drawn a great deal of criticism and the recent lawsuit from several French publishing groups.
There seems to be no punishment. But as we say, that historic theft of intellectual property seems to have been accepted: for the moment no ruling has punished those copyright violations, and it is as if they had been collectively ignored because AI offers interesting advantages. We are forgetting how those advantages were obtained... or so it seems.
In Xataka | 5,000 “tokens” of my blog are being used to train an AI. I have not given my permission