Anthropic wanted to secretly scan and then destroy millions of books to train its AI. It didn't stay secret for long

A language model needs input if it is to be trained to be more accurate and effective. The issue is how that information is obtained and whether there is an ethical way to do it that is still profitable for the technology company in question. There is little doubt that companies' preferred option has been to use all the physical and digital content they can get without anyone's permission. And now there is evidence.

A judicial leak reveals that Anthropic invested tens of millions of dollars in acquiring and digitizing literary works without permission from their authors. According to The Washington Post, the project, internally called “Panama”, was part of a frenetic race among big technology companies to accumulate massive amounts of data to train their artificial intelligence models.

How it all started. The Panama project was launched by Anthropic in early 2024. According to internal documents revealed by The Washington Post, the goal was to “destructively scan every book in the world.” The documents also explicitly state that the company did not want anyone to know it was working on this.

In about a year, the company spent tens of millions of dollars buying millions of books, cutting their spines with hydraulic machines and scanning their pages to feed the AI models that power Claude, its star chatbot. According to the outlet, the books, once digitized, ended up being recycled.

Why it has come to light. The details of the project were revealed in a copyright-infringement lawsuit filed by authors against Anthropic. Although the company agreed to pay $1.5 billion to settle the case in August 2025, a district judge decided last week to make more than 4,000 pages of internal documents public, exposing the entire operation.

They are not the only ones. Court documents reveal that other technology companies such as Meta, Google and OpenAI also took part in this race to obtain massive amounts of data to train their models. According to the documents, an Anthropic co-founder theorized in January 2023 that training AI models with books could teach them “how to write well” instead of imitating “low-quality internet slang.”

On the other hand, an internal Meta email from 2024 described access to a digital library of books as “essential” to be competitive with rivals in the race to dominate AI. However, the documents revealed by the media also show how Meta employees expressed concern on several occasions about the legality of downloading millions of books without permission. An internal email from December 2023 indicates that the practice had been approved after being “escalated to MZ,” apparently referring to CEO Mark Zuckerberg.

According to court records to which the media has had access, the companies did not consider it “practical” to obtain direct permission from publishers and authors. Instead, they found ways to mass-acquire books without the writers’ knowledge, including downloading unauthorized copies from third-party sites.

Chat logs from April 2024 show an employee asking why they were using servers rented from Amazon to download torrents instead of Facebook’s own. The answer: “Avoid the risk of tracing” the activity back to the company.

Data torrent. The documents the Washington Post has had access to also show that Ben Mann, a co-founder of Anthropic, personally downloaded a collection of books from LibGen, a gigantic library of copyrighted content, over 11 days in June 2021. The outlet further revealed that, a year later, in July 2022, Mann celebrated the launch of the ‘Pirate Library Mirror’ website, which boasts a massive database of books and openly claims to violate copyright laws. “Just in time!!!” Mann wrote to other Anthropic employees, according to the outlet.

Anthropic stated in legal documents that it never trained a revenue-generating model using LibGen data, nor did it use Pirate Library Mirror to train any full model.

Anthropic’s legal solution. As the outlet points out in its article, faced with this legal risk, Anthropic changed its strategy. The company hired Tom Turvey, a Silicon Valley veteran who had helped create the Google Books project two decades earlier. Under his direction, Anthropic considered purchasing books from libraries or secondhand bookstores, including New York’s iconic Strand bookstore.

The company ultimately ended up buying millions of books and stacking them in a giant warehouse, often in batches of tens of thousands, according to court filings. The Washington Post also reports that the company worked with used-book sellers in the United Kingdom. A project proposal mentions that Anthropic sought to “convert between 500,000 and two million books in a six-month period.”

What the law says. Most legal cases against AI companies are still ongoing, but the outlet mentions two court rulings that found the use of books to train AI models without permission from the author or publisher may be legal under the “fair use” doctrine of copyright.

In June 2025, District Judge William Alsup determined that Anthropic had the right to use books to train AI models because it processes them in a “transformative” way. He compared the process to teachers “teaching schoolchildren to write well.” That same month, Judge Vince Chhabria ruled in the Meta case that the authors had not shown that the company’s AI models could harm sales of their books.

In the Anthropic case, the physical book scanning project was considered legal, but the judge determined that the company may have infringed copyright by downloading millions of books without authorization before launching Project Panama.

The final agreement. Instead of facing a trial, Anthropic agreed to pay $1.5 billion to publishers and authors without admitting guilt. According to the outlet, authors whose books were downloaded can claim their share of the settlement, estimated at about $3,000 per title.

Cover image | Emil Widlund and Anthropic

In Xataka | If AI is going to leave us without jobs, the United Kingdom is already seriously discussing the solution: a universal basic income
