
Anthropic to pay $1.5 billion over pirated books used to train Claude AI - roughly $3,000 per infringed book

In a settlement agreement, Anthropic has agreed to pay at least $1.5 billion to resolve a class-action lawsuit brought by authors who alleged the company used pirated books to train its language models.

Anthropic Settles Class-Action Lawsuit Over Use of Pirated Books in AI Training

In a landmark settlement, Anthropic, the company behind Claude AI, has agreed to pay at least $1.5 billion to resolve a class-action lawsuit over the use of pirated books in training its large language models. The suit, led by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, has produced the largest publicly disclosed AI copyright settlement to date.

The lawsuit accused Anthropic of downloading hundreds of thousands of copyrighted books from torrent-based piracy sources. Class members, including Swiss authors, stand to receive roughly $3,000 per infringed work; at that rate, the $1.5 billion fund implies approximately 500,000 covered books.

Judge William Alsup's ruling suggests that the statutory damages owed to rights holders could be reduced by Anthropic's subsequent purchase of legitimate copies of the books. However, Alsup made clear that buying a copy of a book the company earlier stole off the internet does not absolve it of liability for the theft.

Anthropic's settlement is not an admission of wrongdoing, but it sets a new benchmark for data liability in generative AI development. The company will be required to delete the infringing data, though there is no indication that the court will force it to delete or retrain its models.

The case does not challenge the broader legality of training AI on public or lawfully obtained content, a question still working its way through the courts, but it underscores the legal risk and potential financial cost of using pirated material.

If models trained on pirated data face lawsuits or forced retraining, developers may need to start over with clean, licensed datasets. Redoing those training runs could consume millions of GPU hours, potentially driving a surge in demand for high-performance compute as labs revalidate their models.
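
For a sense of scale, here is a rough back-of-envelope sketch using the common 6 x parameters x tokens approximation for training FLOPs. The model size, token count, and utilization figures below are illustrative assumptions, not numbers from the case or from any specific lab:

# Back-of-envelope estimate of the GPU hours needed to retrain a large
# language model from scratch. All inputs are illustrative assumptions.

PARAMS = 70e9          # model size: 70B parameters (assumed)
TOKENS = 10e12         # training data: 10T tokens (assumed)
PEAK_FLOPS = 989e12    # Nvidia H100 BF16 dense peak, FLOP/s
UTILIZATION = 0.40     # sustained fraction of peak (assumed)

# Common approximation: training cost ~ 6 * parameters * tokens FLOPs.
total_flops = 6 * PARAMS * TOKENS
gpu_seconds = total_flops / (PEAK_FLOPS * UTILIZATION)
gpu_hours = gpu_seconds / 3600

print(f"~{gpu_hours / 1e6:.1f} million H100-hours")  # prints ~2.9

Under these assumptions, a single from-scratch retraining run lands near three million H100-hours, which illustrates why forced retraining would be an expensive remedy.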

Nvidia's H100 and upcoming Blackwell GPUs, along with AMD's MI300X and suppliers of HBM3e memory, could benefit if courts force labs to scramble to retrain and revalidate their models.

OpenAI has also settled with publishers in a separate matter, though the details of those deals remain confidential.

How the ruling will bear on related cases involving models trained on pirated data remains to be seen.
