Court ruling: AI training can use legally acquired content – The Time Machine



A federal district court in California ruled that artificial intelligence companies’ use of purchased but copyrighted materials to train AI constitutes fair use, while sending to trial the question of appropriate damages for the use of pirated materials.

The ruling from the U.S. District Court for the Northern District of California could have a major impact on the relationship between the creators of media and AI, by requiring that AI training material be legally acquired. In most cases, that would involve payment for access.

Three authors sued AI company Anthropic over its acquisition and use of their copyrighted books for its database to, in the company’s words, “store everything forever.” According to the suit, Anthropic pirated over seven million books online for its central database and later purchased copyrighted books, often overlapping with the titles it had pirated, then tore off their bindings, scanned every page, stored the scans for additional future use, and discarded the now-unusable paper originals.

Notably, the authors’ infringement claim was largely that Anthropic’s use of their books constituted unauthorized reproduction.

The court ruled partly in favor of Anthropic, finding that its use of legally acquired but copyrighted creative works constituted fair use because the training created a new output that neither competed with nor reproduced the original content.

“The purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative,” ruled U.S. District Judge William Alsup. “Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but turn a hard corner and create something different.”

With regard to the reportedly pirated material, however, the court opted to proceed to trial. That could result in Anthropic paying major damages to the creators for what they describe as stolen copyrighted content.

“Pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use — and not a transformative one,” wrote Alsup. “Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it.”

“We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness),” wrote Alsup. “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages.”