Federal court says AI training on books is fair use, but sends Anthropic to trial over pirated copies

Skye Jacobs

What just happened? A federal court has delivered a split decision in a high-stakes copyright case that could reshape the future of artificial intelligence development. US District Judge William Alsup ruled that Anthropic's use of copyrighted books to train its Claude AI system qualifies as lawful "fair use" under copyright law, marking a significant victory for the AI industry.

However, the judge simultaneously ordered the company to face trial this December for allegedly building a "central library" containing over 7 million pirated books, a decision that maintains crucial safeguards for content creators.

This nuanced ruling establishes that while AI companies may learn from copyrighted human knowledge, they cannot build their foundations on materials that have been stolen. Judge Alsup determined that training AI systems on copyrighted materials transforms the original works into something fundamentally new, comparing the process to human learning. "Like any reader aspiring to be a writer, Anthropic's AI models trained upon works not to replicate them but to create something different," Alsup wrote in his decision. This transformative quality placed the training firmly within legal "fair use" boundaries.

Anthropic's defense centered on the allowance for transformative uses under copyright law, which advances creativity and scientific progress. The company argued that its AI training involved extracting uncopyrightable patterns and information from texts, not reproducing the works themselves. Technical documents revealed Anthropic purchased millions of physical books, removed bindings, and scanned pages to create training datasets – a process the judge deemed "particularly reasonable" since the original copies were destroyed after digitization.

However, the judge drew a sharp distinction between lawful training methods and the company's parallel practice of downloading pirated books from shadow libraries, such as Library Genesis. Alsup emphatically rejected Anthropic's claim that the source material was irrelevant to fair use analysis.

"This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites was reasonably necessary," the ruling stated, setting a critical precedent about the importance of acquisition methods.

The decision provides immediate relief to AI developers facing similar copyright lawsuits, including cases against OpenAI, Meta, and Microsoft. By validating the fair use argument for AI training, the ruling potentially avoids industry-wide requirements to license all training materials – a prospect that could have dramatically increased development costs.

Anthropic welcomed the fair use determination, stating it aligns with "copyright's purpose in enabling creativity and fostering scientific progress." Yet the company faces substantial financial exposure in the December trial, where statutory damages could reach $150,000 per infringed work. The authors' legal team declined to comment, while court documents show Anthropic internally questioned the legality of using pirate sites before shifting to purchasing books.

I feel like this is not final. A person spends years of very hard work and creativity writing a book.
People buy the book, and that pays the writer's salary.
An AI company uses the book to train its AI, and now it can sell the resulting product to every last person on earth.
The AI company is rich; the writer, more often than not, is not. If companies can repeatedly benefit from someone's hard work, they should share the profit with the writer.
It is a bypass – bypassing buying the book, letting people get at that information while paying the AI company instead of the writer.
I think this is not the end. It sounds extremely unfair. And let's set aside people like J. K. Rowling – most writers don't get multiple chances to write something good. Every AI company that gets paid for providing services based on content it "learned" should share with every writer whose work it used.
 
Most people in the US (and globally) want access to AI technology, including writers. So the judge's decision not to render it illegal is a good thing for everyone, especially since the writers suffered no true financial damages (and could never prove any).

Now, the focus will likely shift from the right to access AI to the right to access high-quality AI, framed by social goals like prosperity and improved well-being. This shift brings the debate over using unauthorized copies for training into focus. Does access to high-quality AI contribute more, both directly and indirectly, to the quantity and quality of education and assistance available to people than the guarantee of monetization for copyrighted materials does? If so, then even the use of unauthorized copies of copyrighted materials could be considered fair use in the new landscape created by AI.

In reality, after the initial release, almost all copyrighted works – if they sell at all – sell only for a few weeks, with some exceptions perhaps in software. Books, movies, songs, and games usually sell more in their first year than in the following decade, and their sales typically benefit from (rather than being damaged by) network and social effects. I note this because the uncertainty, and the maximalist but unrealistic financial expectations and assumptions hidden within the theoretical framework of copyright, often don't contribute to practical decision-making.
 