
It has probably happened to you. You upload a PDF to an artificial intelligence chatbot hoping it will summarize a report, extract a table or find a specific piece of information in a matter of seconds. Sometimes it succeeds. But other times the result is disconcerting: mixed-up columns, footnotes embedded in the middle of the text, tables converted into an illegible block, or answers that do not faithfully reflect what the document says. The paradox is evident: systems that already demonstrate clear advances in mathematics and programming keep stumbling over something as everyday as a PDF. And there is more to it than a simple one-off failure.

A change of mindset. Although for us a PDF is a document with well-defined paragraphs, titles and tables, for the system that processes it the situation may be very different. PDF is, first and foremost, a way to visually describe how a page should be rendered. When a chatbot like Gemini or ChatGPT tries to work with one, it does not always get an ordered structure, but rather a set of graphical instructions that it must first reconstruct before it can respond coherently. That difference is easier to understand when we look at how a PDF “saves” information.
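To make the idea of “graphical instructions” concrete, here is a minimal sketch of what a page can look like from the inside. It assumes a toy, uncompressed content stream that uses only the standard PDF text operators `BT`/`ET` (begin and end a text object), `Tf` (select a font), `Td` (move to a position) and `Tj` (show a string). The sample stream and the `extract_fragments` helper are hypothetical, written for illustration; real files are usually compressed and use many more operators.

```python
import re

# A toy page content stream (an assumption for illustration only).
# Each line says: pick a font, move to (x, y), draw this string.
CONTENT_STREAM = """
BT /F1 12 Tf 72 720 Td (Quarterly report) Tj ET
BT /F1 10 Tf 320 700 Td (Revenue grew 4%.) Tj ET
BT /F1 10 Tf 72 700 Td (Costs were stable.) Tj ET
"""

# Matches "x y Td (text) Tj": a position followed by a string to draw.
PATTERN = re.compile(r"(\d+) (\d+) Td \((.*?)\) Tj")


def extract_fragments(stream):
    """Return (x, y, text) tuples in the order they appear in the stream."""
    return [(int(x), int(y), text) for x, y, text in PATTERN.findall(stream)]


fragments = extract_fragments(CONTENT_STREAM)
for x, y, text in fragments:
    print(f"x={x:>3} y={y:>3} {text}")
```

Nothing in this stream says that one string is a title or that two strings belong to the same paragraph. Here the right-hand fragment (x=320) is even stored before the left-hand one on the same line, which is perfectly valid for rendering.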

How it actually organizes information. Unlike a web page, where the content follows a logical order defined in the code, a PDF can store text as independent fragments placed at specific positions on the page. Often the file retains coordinates and placement instructions, but not necessarily explicit relationships between one sentence and the next. This means that the order in which the text “appears” when extracted does not always match the order in which we read it. If the document includes multiple columns, tables or overlapping elements, the system must work out how they fit together. And that deduction is not always trivial.
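The reading-order problem described above can be sketched with a few positioned fragments. This is a simplified illustration, not how any particular chatbot or extraction library works: the `Fragment` type, the sample page and the hand-picked `column_split` boundary are all assumptions. Real tools have to infer column boundaries from the layout itself.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float  # left edge, in points
    y: float  # baseline, measured from the bottom of the page as in PDF
    text: str


def naive_order(fragments):
    """Sort top-to-bottom, then left-to-right, ignoring columns."""
    return sorted(fragments, key=lambda f: (-f.y, f.x))


def column_order(fragments, column_split):
    """Read the whole left column first, then the right one.

    column_split is a hypothetical, hand-picked x boundary.
    """
    left = [f for f in fragments if f.x < column_split]
    right = [f for f in fragments if f.x >= column_split]
    return naive_order(left) + naive_order(right)


# A two-column page, stored in arbitrary order.
page = [
    Fragment(320, 700, "the second column"),
    Fragment(72, 700, "A two-column page"),
    Fragment(320, 686, "continues here."),
    Fragment(72, 686, "reads down first;"),
]

naive_text = " ".join(f.text for f in naive_order(page))
column_text = " ".join(f.text for f in column_order(page, column_split=300))

print(naive_text)
print(column_text)
```

The naive sort interleaves the two columns line by line, which is the “mixed columns” symptom. The column-aware version only works here because the boundary was supplied by hand.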


What happens with HTML. On a web page, the content is organized in an explicit hierarchy: there are tags that indicate what a title is, what a paragraph is, what a table is, and how those elements relate to each other. This structure is part of the file itself and makes it easier for other systems to read, index and process it. In a PDF, as we have seen, that semantic layer may not exist or may not be clearly defined. Therefore, in practice, extracting information from a website tends to be a more predictable process, while doing so from a PDF is more complicated.
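By contrast, the hierarchy that HTML carries can be read directly from the markup. The following minimal sketch uses Python's standard `html.parser` module; the `OutlineParser` class, the set of tracked tags and the sample markup are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect [tag, text] pairs for headings, paragraphs and table cells."""

    TRACKED = {"h1", "h2", "p", "td", "th"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None  # tracked tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._current = tag
            self.outline.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.outline[-1][1] += data


SAMPLE = (
    "<h1>Report</h1><p>Costs were stable.</p>"
    "<table><tr><th>Year</th><td>2024</td></tr></table>"
)

parser = OutlineParser()
parser.feed(SAMPLE)
outline = [(tag, text.strip()) for tag, text in parser.outline]
print(outline)
```

Each piece of text arrives already labeled with its role, so no guessing from coordinates is required.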

So what about OCR? It is the first solution that comes to mind. If the problem is that the text is not well structured, or is even “drawn” as an image, optical character recognition should convert it into something machine-readable. And in part it does. OCR has been used for decades to transform images of words into text, but converting an image to text is not the same as reconstructing the logic of the document. When a page mixes elements such as columns, tables and footnotes, the system can recognize each word without knowing exactly how they fit together. The result is a failure not in reading characters, but in organizing the information.


Why don’t we abandon PDF? The answer is more pragmatic than technological. As reported by The Verge, citing the head of the PDF Association, the format became established precisely because it allows a document to look the same today as it will in ten or twenty years, regardless of the device or software used to open it. A web page can change depending on the browser, and an editable file can be modified or overwritten, but a PDF maintains its appearance and visual integrity. That stability is precisely what lawyers, engineers, public administrations and any organization that must keep reliable records need. The challenge is not to replace the format, but to learn to interpret it better.

Images | Xataka with Nano Bana


The article “AI solves equations and writes code, but keeps stumbling over PDFs: the explanation shows its limits” was originally published in Xataka by Javier Marquez.
