AI like ChatGPT is possible thanks to the indiscriminate use of online content. Cloudflare has just said that this is over

The big AIs we use every day, such as GPT, Gemini, Claude, Perplexity and company, exist and can do what they do thanks in large part to the content available on the Internet. Companies such as OpenAI, Google and Anthropic, to name a few, have crawled the web (and still crawl it in real time) in search of content that answers users' questions. And unless there are specific agreements, they do so without offering the creators of that content any compensation beyond a link.

It is a practice that has been questioned since the birth of this technology. Blog articles, Wikipedia, books, user-generated content, even personal data: the crawlers, those automated bots, leave nothing behind, and today Cloudflare has said that this is over. Starting today, Cloudflare will block AI scrapers by default, something that has more implications than it might seem.

Let's start at the beginning.

Web crawlers. This technology is not new; in fact, it is what makes web search, one of the foundations of the Internet, possible. You have surely heard of "the Google spider", the bot that crawls the entire web in search of content to index and offer to users. It is just one of the thousands and thousands of bots that exist and that generate 30% of all traffic worldwide.

This technology was key to shaping the Internet we know, and its relationship with content creators was symbiotic. The click economy was born: the creator produces content, Google indexes it, the user finds it through Google, Google earns revenue from search advertising, and the creator receives free traffic and earns income from advertising, affiliate links and so on. With AI, the story is quite different.

Data. AI models need information to feed on, to be trained and to be able to answer questions. To get it, the big companies we all know crawled the web, extracted all the content they could and used it to develop technologies such as ChatGPT.
What is the problem? That content could be protected by copyright, which led The New York Times to sue OpenAI for this very reason, and AI companies to sign agreements with media outlets to access their content.

Connected AIs. AI kept evolving and, as expected, ended up connecting to the Internet. It no longer only gave answers based on finite training data; it could also go online to look for the answer in media outlets, blogs and web pages in real time (or almost in real time). The user no longer had to click on a link. The AI searched, analyzed and generated the answer, causing traffic to media outlets and blogs to fall.

What brings this technology to life are the AI crawlers. They are the evolution of the bots that shaped the Internet we know. Among them are OpenAI's GPTBot, Meta's Meta-ExternalAgent, Anthropic's ClaudeBot and ByteDance's Bytespider. With them, the symbiotic relationship mentioned above begins to deteriorate, because the user no longer accesses the original content and does not click. Instead, the user consumes a derivative product generated by AI. The biggest example: the new AI-generated overviews that appear on Google every time you run a search.

Figure: Daily request volume of the main AI bots | Image: Cloudflare

Hit the brakes... or not, I'm just a .txt. How can this indiscriminate, uncompensated crawling be solved? The first proposal was to update the robots.txt file to tell bots that they may not extract a website's content. This file is one of the most widely used resources for managing bot activity, but it has a small problem: compliance is voluntary. AI companies can follow the instructions, or they can ignore them and extract the content anyway.
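As a rough illustration of how a robots.txt rule works, and of why it is only a request, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the `is_allowed` helper and the example.com URL are assumptions made up for this example; only the bot names come from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask two AI crawlers to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler runs a check like this before requesting a page.
    # Nothing in the file itself can stop a crawler that skips the check.
    for bot in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(bot, is_allowed(bot, "https://example.com/article"))
```

The check runs entirely on the crawler's side, which is why the file can only state a preference: enforcement needs something in front of the server that actually refuses the request.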
In addition, it is easy to touch something we should not and make our website disappear from Google: every website that wants to be on Google must allow Googlebot, its spider, to crawl it.

Cloudflare takes a stand. That brings us to Cloudflare's recent announcement. The platform (which half the Internet depends on) has announced that, starting today, blocking of AI crawlers will be active by default. To that end, Cloudflare offers direct management of robots.txt to avoid problems such as the one mentioned above. The key, of course, is that Cloudflare will take care of keeping the blocks up to date as the AI landscape changes. Although it is enabled by default, the feature is optional and can be turned off completely in the settings.

Time to pay. Cloudflare's other proposal is Pay Per Crawl. Since AI will continue to need access to website content, why not give creators the option to charge for that access? Pay Per Crawl, currently in beta, lets domain owners set a fixed price per request. If an AI crawler wants to extract content from that domain, it will have to pay for it. On paper, this tool has the potential to change the current landscape, but everything will depend on its reach, its adoption and the measures crawler operators take.

Cover image | Solen Feyissa

We knew that LaLiga's IP blocks were massive and indiscriminate. What we didn't know was to what extent

Over the last three months, many users and companies have complained about the indiscriminate IP blocks ordered by LaLiga. Analyses had already shown that these blocks affected thousands of websites, but their real scale is much greater.

2.7 million affected domains. Jaume Pons (@jaumepons), a systems administrator, has been monitoring how LaLiga's IP blocks have affected various domains and websites. His latest analysis, covering the most recent weekend matchday, reveals that a total of 2,699,517 .com, .net and .org domains were affected. He has also published lists of all the affected domains.

Thousands of .es domains. His parallel analysis over the weekend showed that, from Saturday afternoon to Sunday alone (including the Barça-Madrid match), the blocks affected 15,432 .es domains, which were once again inaccessible to Spanish Internet users.

This is DIGI data. We were able to talk to Jaume Pons, who explained that his analysis focuses on the blocks as applied by DIGI, the operator he is subscribed to. The situation may be different with other operators.

A massive analysis. The systems administrator explained his analysis process, which begins with a request to ICANN for the DNS zones of the domains it manages (.com, .net and .org, among the main ones). They contain about 300 million domains, but Pons then focuses on those that point to Cloudflare's DNS, which reduces the list to about 16-17 million.

Cross-referencing data. From there, he can crawl the list, resolving those DNS records to find out which IPs the domains point to. By making requests from a DIGI connection, it is possible to detect whether the typical warning message announcing the temporary block appears. Doing so confirms which domains on the original list were actually blocked by DIGI during the broadcasts of LaLiga matches.
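The process described above can be sketched roughly in Python. This is not Pons's actual tooling, which the article does not show: the zone-file line layout, the `.ns.cloudflare.com` suffix check, the `BLOCK_MARKER` text and the injected `fetch_body` function are all assumptions made for illustration.

```python
# Assumption: Cloudflare-hosted domains delegate to name servers under this suffix.
CLOUDFLARE_NS_SUFFIX = ".ns.cloudflare.com"
# Hypothetical text that identifies the operator's temporary-block warning page.
BLOCK_MARKER = "temporary block"


def cloudflare_domains(zone_lines):
    """Return the sorted domains that have at least one Cloudflare NS record.

    Assumed line layout: <name> <ttl> <class> <type> <rdata>
    """
    found = set()
    for line in zone_lines:
        fields = line.split()
        if len(fields) != 5 or fields[3].upper() != "NS":
            continue  # skip anything that is not a simple NS record
        name = fields[0].rstrip(".").lower()
        target = fields[4].rstrip(".").lower()
        if target.endswith(CLOUDFLARE_NS_SUFFIX):
            found.add(name)
    return sorted(found)


def blocked_domains(domains, fetch_body):
    """Return the domains whose page, as fetched by fetch_body, shows the block notice.

    fetch_body is a caller-supplied function (domain -> page text) that would
    have to run from a connection on the operator being measured.
    """
    blocked = []
    for domain in domains:
        try:
            body = fetch_body(domain)
        except OSError:
            continue  # unreachable for some other reason; not counted as blocked
        if BLOCK_MARKER in body.lower():
            blocked.append(domain)
    return blocked


if __name__ == "__main__":
    sample_zone = [
        "example.com. 172800 IN NS ada.ns.cloudflare.com.",
        "example.net. 172800 IN NS ns1.example-dns.net.",
        "example.org. 172800 IN NS bob.ns.cloudflare.com.",
    ]
    # Stand-in for real HTTP requests, so the sketch runs offline.
    fake_pages = {
        "example.com": "<h1>Temporary block</h1>",
        "example.org": "<h1>Welcome</h1>",
    }
    candidates = cloudflare_domains(sample_zone)
    print(candidates)
    print(blocked_domains(candidates, fake_pages.__getitem__))
```

Filtering on name servers first is what would shrink a list of hundreds of millions of domains to a size that can be probed one by one; the probing step only yields meaningful results when it runs from inside the network of the operator applying the block.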
The whole analysis takes about two hours and makes it possible to grasp the enormous scale of the problem.

Heavily visited websites are inaccessible. Pons explains that the BuiltWith platform compiles a monthly list of the top one million domains by traffic. This weekend, LaLiga's IP blocks affected 21,914 of the domains on that list. Among them are many e-commerce stores that may well have suffered economic and reputational losses.

Among those affected, Steam and X (Twitter). Users have also had problems with platforms such as Steam and X (formerly Twitter). Messages appeared both on that social network and on Reddit reporting that some games and even saved games had disappeared from the Steam Deck.

Incalculable economic and reputational damage. The harm to users and companies continues with these blocks. We had already talked to several of those affected, and one of them said the blocks had cut their revenue from 70,000 euros a month to about 40,000. A list of affected parties collects these companies' loss estimates, and those who have signed up so far claim to have lost more than a quarter of a million euros because of the blocks.

The season is ending. The blocks will foreseeably stop when LaLiga ends in late May, but until then users will keep suffering access problems with hundreds of thousands of websites. The question, of course, is whether the blocks will resume when next season starts.

Damage to third parties continues. It does not matter whether the affected domains have more or less traffic: the damage to third parties exists, and in every case we are talking about legitimate domains and websites. For now the courts have not acted; in fact, they rejected the request to annul the ruling that allows LaLiga to keep ordering these blocks.
Image | LaLiga

How to file a complaint if you are affected by LaLiga's indiscriminate blocks

Let's explain how to file a complaint if you are affected by LaLiga's blocks, the ones the football body is carrying out indiscriminately in Spain. On match days, the practice blocks access to services such as Cloudflare, causing hundreds of perfectly legal websites to stop working.

Unfortunately, as a user there is not much you can do against these excessive and unfair measures imposed by LaLiga. We have told you about some ways to get around the block, but there is little else you can do.

However, you can report it to the European Commission, since the block violates basic principles of users' rights on the Internet. These indiscriminate blocks of services that have nothing to do with the broadcasts being pursued violate rights such as net neutrality, equitable access to digital services, and freedom of information and of enterprise. And you can report that through the channels Europe provides.

This complaint will not solve anything by itself, but if enough people file well-founded complaints, the European Commission will open an investigation and ask the Government of Spain for explanations. And you do not need a lawyer to report it; you just fill out a questionnaire.

How to report it to the European Commission

The first thing to do is go to the European Commission's complaints website, where you can send a complaint to the institution. To do so, go to EC.Europa.eu/law/application-eu-law/report-braach/es/check-your-criteria and click the Submit a complaint button.

When you choose Submit a complaint, you will be shown a form with several options to choose from. In it, mark that the complaint concerns a national authority in the EU, and that it is about an incorrect application of EU law.

At the end, choose the option Report alleged infringements of competition rules by clicking the link offered. This takes you to a page that explains how to proceed.
First, go to the page indicated there, at the bottom of which there is a fairly long document that is also available in Spanish. On this page, scroll down and look for the section headed Annex, which explains how to proceed and lists all the information you must provide to file the complaint.

Then send an email to the address Comp-Market-information@ec.europa.eu including everything the annex asks for. You will need to give your personal details and name the national authority involved, from the relevant ministry to the Internet provider.

You must also include a detailed description of the facts, explaining how the mass blocking of IP addresses affects the legitimate pages you use. And you must cite the rules you believe this practice violates: Regulation 2022/612 and the articles on net neutrality and non-discriminatory Internet access.

Finally, attach screenshots, affected IP addresses or any communications you have had with your provider (if you have none, first ask your Internet operator why it is blocking the pages, then attach the reply). Once you have reviewed everything, send the complaint.
