AI solves equations and writes code, but it still stumbles over PDFs: the explanation reveals its limits

It has probably happened to you. You upload a PDF to an AI chatbot hoping it will summarize a report, extract a table, or find a specific piece of information in seconds. Sometimes it succeeds. Other times the result is disconcerting: mixed-up columns, footnotes embedded in the middle of the text, tables collapsed into an illegible block, or answers that do not faithfully reflect what the document says. The paradox is evident: systems that already show clear advances in mathematics and programming keep stumbling over something as everyday as a PDF. And this is more than a one-off failure.

A change of mindset. Although for us a PDF is a document with well-defined paragraphs, titles, and tables, for the system that processes it the situation can be very different. PDF is, first and foremost, a way to describe visually how a page should be rendered. When a chatbot such as Gemini or ChatGPT tries to work with one, it does not always get an ordered structure, but rather a set of graphical instructions that it must first reconstruct before it can respond coherently. That difference is easier to understand when we look at how a PDF “saves” information.

How it actually organizes information. Unlike a web page, where the content follows a logical order defined in the code, a PDF can store text as independent fragments placed at specific positions on the page. Often the file retains coordinates and placement instructions, but not necessarily explicit relationships between one sentence and the next. This means that the order in which the text “appears” when extracted does not always match the order in which we read it. If the document includes multiple columns, tables, or overlapping elements, the system must work out how they fit together. And that deduction is not always trivial.
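To make the reading-order problem concrete, here is a minimal Python sketch. It assumes a hypothetical two-column page whose text fragments have already been extracted together with their coordinates. The `Fragment` class, the `reading_order` function, and the sample coordinates are illustrative inventions, not part of any PDF library. The heuristic shown (assign each fragment to a column by its horizontal position, then sort from top to bottom) is only one simple approach, and it would break on tables, headings that span columns, or overlapping elements.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of text plus where it is drawn on the page (hypothetical model)."""
    text: str
    x: float  # left edge, in points
    y: float  # vertical position; PDF user space has its origin at the bottom-left


def reading_order(fragments, page_width, columns=2):
    """Order fragments column by column, top to bottom within each column."""
    column_width = page_width / columns

    def key(fragment):
        # Clamp so a fragment on the right edge still lands in the last column.
        column = min(int(fragment.x // column_width), columns - 1)
        # Larger y means higher on the page, so negate it to sort top to bottom.
        return (column, -fragment.y, fragment.x)

    return sorted(fragments, key=key)


# Fragments in the order they happen to be stored in the (hypothetical) file:
# the producer wrote the page row by row, alternating between columns.
stored = [
    Fragment("left column, line 1", 50, 700),
    Fragment("right column, line 1", 320, 700),
    Fragment("left column, line 2", 50, 680),
    Fragment("right column, line 2", 320, 680),
]

naive = " / ".join(f.text for f in stored)
ordered = " / ".join(f.text for f in reading_order(stored, page_width=600))

print("Stored order: ", naive)
print("Reading order:", ordered)
```

The stored order interleaves the two columns, which is the kind of scrambled output described above. The reordered version reads the left column in full before the right one, but only because the sketch was told in advance that the page has two equal columns. A real system has to infer that layout from the page itself.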
What happens with HTML. On a web page, the content is organized in an explicit hierarchy: there are tags that indicate what is a title, what is a paragraph, what is a table, and how those elements relate to each other. This structure is part of the file itself and makes it easier for other systems to read, index, and process it. In a PDF, as we have seen, that semantic layer may not exist or may not be clearly defined. In practice, then, extracting information from a website tends to be a more predictable process, while doing so from a PDF is more complicated.

So what about OCR? It is the first solution that comes to mind. If the problem is that the text is poorly structured, or even “drawn” as an image, optical character recognition should convert it into something machine-readable. And in part it does. OCR has been used for decades to turn images of words into text, but converting an image to text is not the same as reconstructing the logic of the document. When a page mixes varied elements, the system can recognize every word without knowing exactly how they fit together. The failure then lies not in reading the characters but in organizing the information.

Why don’t we abandon PDF? The answer is more pragmatic than technological. As reported by The Verge, citing the head of the PDF Association, the format became established precisely because it allows a document to look the same today as it will in ten or twenty years, regardless of the device or software used to open it.
A web page can change depending on the browser, and an editable spreadsheet can be modified or overwritten, but a PDF maintains its appearance and visual integrity. That stability is precisely what lawyers, engineers, public administrations, and any organization that must keep reliable records need. The challenge is not to replace the format, but to learn to interpret it better.

Images | Xataka with Nano Banana

This article was originally published in Xataka by Javier Marquez.

In 2010, a student from Barcelona was looking for an easy way to edit PDFs. Sixteen years later, his site is one of the most visited websites on the internet

From a form to a receipt to an invoice, PDF is the quintessential format for sharing documents, whether you send them from a Windows computer to an iPhone or to an Android tablet. It doesn’t matter: you will see the original layout no matter what. It is a different story, my friend, when you have to modify a PDF.

Marco Grossi also got into trouble with a PDF. I, who am getting on in years, had to find workarounds to avoid paying for an Adobe Acrobat license (back then it was not a subscription, and the price was not exactly cheap) just to edit a PDF: from printing and scanning to wasting time rebuilding the document in a word processor. In that first decade of the 2000s I was a student struggling with documents, and so was Marco Grossi. Back in 2010, this Barcelonan, who studied Multimedia and Photography as well as programming, faced a task as mundane as copying and pasting a PDF, and it was not easy. As he told La Vanguardia: “I’m a programmer, and I’m good with computers, so it took me about 15 minutes to figure it out.”

And then came iLovePDF. As the founder and CEO told El País, at that moment he discovered that there was a need: “I realized that it was very simple and that I could create it myself.” It was not the first tool of its kind (the ancient but reliable PDFSam had an off-putting interface), but it was the one that established itself as the go-to PDF software for ordinary users, and also for companies reluctant to pay, because it handles the basics quickly and well.

A meteoric rise. What started as a personal project that he combined with freelance web design became his full-time occupation in 2014. Until 2017 he worked alone from home, but that year he took a step forward: he rented an office and hired an old college classmate. Now the team has 43 people.
At that time, his website was already receiving between 200,000 and 300,000 daily visits from organic traffic. In 2025 Grossi said that the figure was around 150 million unique users per month. Ahrefs ranked the site 34th worldwide in 2024, above Amazon in India and just below Wikipedia in Russia.

Screenshot of iLovePDF from 2018, via Archive.today

Good, nice and free. From the beginning, its philosophy has been to be a free, accessible, high-quality, easy-to-use service. A quick visit to the website shows a mosaic of icons with clear labels such as “Join PDFs” and “Split PDFs”, and an agile, intuitive step-by-step process for producing good-quality documents, without limitations or watermarks. We are using iLovePDF in Spanish, but the website is translated into 25 languages so that language is not an obstacle. The same was already true in 2018, the date of the oldest capture saved on Archive.today.

They do not trade in user data either: Marco Grossi explains that, as a European company, they are governed by the GDPR and that all PDFs are deleted within two hours, without anyone being able to access them. He adds that they hold ISO 27001 certification.

How iLovePDF makes money. In the beginning they financed themselves through advertising, but according to the CEO that is very risky. So since 2014, in addition to the free options, they have offered subscription services, leaving advertising as a residual source of income. They are a small company, but they serve everyone who visits the website, and as we have seen, that is a lot of people. That is why the Barcelona native explains that “we only need a very small percentage of users who pay to finance us.” Between 80% and 90% of the income comes from premium subscriptions, which are aimed at companies. The rest comes from an advertising banner that I, after using the service for more years than I can remember, did not even remember was there.
Premium costs 5 euros per month and gives access to extras such as digital signatures and an ad-free experience, but it is entirely optional: the founder says that the free version is enough for 99.9% of users.

They are not for sale. Marco Grossi is no wolf of Wall Street: he admits that he never had an entrepreneurial spirit and that he does not even open acquisition offers, much like the VLC project, an attitude that has turned both platforms into saint-or-hero memes on social networks such as X/Twitter. Being a self-financed company allows Marco and his team to keep their philosophy and reject offers. Although the company’s trajectory has been meteoric for its 15 years of life, the CEO speaks of sustained business growth and says they will never hire 200 people in one year only to have to close. Staff turnover is very low and the team is solid, and they now want to replicate the model with iLoveIMG, their counterpart for images.
