Reddit blocking the Internet Archive is a tragedy. It is also a way of stopping a voracious and implacable AI
The price of having AI is the looting of all the internet's content. Reddit knows this well, and it has just taken an extreme measure against that indiscriminate plundering: it no longer only blocks AI companies' scrapers from accessing its content directly. It now also blocks them when those companies try to get in through detours. The injured party? The Internet Archive.
What happened. Reddit, which has always been very proactive about protecting "its" content (which, by the way, is generated by users voluntarily and for free), has realized something: that content was being stolen. Not directly, though, but through earlier versions stored in that gigantic digital newspaper library called the Internet Archive.
The Wayback Machine loses access. The Wayback Machine is the Internet Archive's "time machine," and it gives access to old versions of any website. But to prevent further content theft, Reddit has barred the platform from indexing the vast majority of Reddit content. Only the Reddit.com home page can be indexed.
Reddit's argument. Tim Rathschmidt, a Reddit spokesperson, explained to The Verge that although the Internet Archive is a service aimed at the open web, they had discovered "cases in which artificial intelligence companies violate platform policies, including ours, and extract data from the Wayback Machine." He also said the following:
“Until they are able to defend their site and comply with platform policies (for example, respecting user privacy with regard to removing deleted content), we are limiting part of their access to Reddit data to protect Reddit users.”
If you want our content, pay. The spokesperson's message is reasonable, but it is incomplete, to say the least. Reddit has been going after this kind of looting by AI companies for some time. It has tried to block those responsible by technical means, and the goal then and now has been the same: that companies pay for its content.
It has achieved that through the agreements it has reached since this process began. The first thing it did was close its API, a disaster for the whole internet. It then reached an agreement with Google, which pays 60 million dollars a year for access to that content. The same happened with OpenAI, with which it sealed a pact whose financial details have not been disclosed but which gives the models that power ChatGPT access to Reddit's content.
My content is mine (more or less). Social platforms have been feeding on users' content for years. Until now their business model centered on advertising, but the arrival of AI has given them an interesting alternative: having AI companies pay for access to that content.
Users barely gain; Reddit and the social networks do. The platforms claim this content as their own, as Reddit did when it sued Anthropic in June, but it was actually created by their users, who without realizing it have become slaves to these social networks: they never stop producing content that others consume, and they do so without charging a euro.
These platforms are intermediaries that provide the infrastructure needed for this content to be available for free, but creators receive hardly anything in return. Only a few can make a living on YouTube, TikTok or Instagram, for example. Reddit does offer some cash remuneration for the "contributors" who create the most for the platform.
Cloudflare and content blocks. Content companies are beginning to act in a similar way, and in the last two years we have seen some publishing groups, Prisa among them, reach agreements with AI companies so that those companies can use their content.
You Shall Not Pass! However, some companies go further. Cloudflare is a clear example: it has created a system that lets the companies using its services block the "AI crawlers" that try to steal their content. If you are a Cloudflare customer, you can activate that block, avoiding the problem or at least making things much harder for AI companies that try to train their models on your data. Media outlets and platforms such as The Associated Press, Fortune, Time and Stack Overflow are among the companies already using the system.
Quid pro quo. This cat-and-mouse game is especially striking for the whole content creation sector, because AI companies use every shortcut they can to capture (and steal) that data, whether or not it is copyrighted. What Reddit proposes is a model in which creators are compensated when AI takes that data. Or, rather than creators, the platforms that serve as their meeting point and showcase. Media groups and audiovisual content producers have an interesting opportunity here in the face of the potential traffic collapse caused by solutions like Google's AI Overviews.