Meta Platforms Continued Llama AI Training Despite Legal Warnings Over Allegedly Pirated Books

Meta Platforms Faces Lawsuit Over AI Training

Meta Platforms, the parent company of Facebook and Instagram, faces legal challenges over allegations that it knowingly used thousands of pirated books to train its AI models. According to the allegations, the company went ahead even after its own lawyers warned of the legal risks.

A recent filing consolidates two lawsuits against Meta brought by comedian Sarah Silverman, Pulitzer Prize winner Michael Chabon, and other prominent authors. The authors allege that Meta used their works without permission to train its AI language model, Llama.

A potentially significant piece of evidence in the new filing is a set of chat logs in which a Meta-affiliated researcher discusses acquiring the dataset in a Discord server. In these logs, researcher Tim Dettmers describes his exchanges with Meta’s legal department, shedding light on concerns about the legality of using book files as training data.

In the chat logs, Dettmers recounts a back-and-forth with Meta’s legal department and indicates that, for legal reasons, the dataset could not be used in its current form. Meta’s lawyers had reportedly advised against using the data or publishing models trained on it. The discussion points to concerns about using books with active copyrights, with researchers arguing that training on the data should fall under fair use, a U.S. legal doctrine that allows certain unlicensed uses of copyrighted works.

The tech industry has faced a series of lawsuits this year from content creators who accuse companies of using copyright-protected works to build generative AI models. If the plaintiffs succeed, these cases could raise the cost of developing such data-intensive models by requiring AI companies to compensate artists and authors for the use of their works.

At the same time, new provisional rules in Europe are set to regulate artificial intelligence and may force companies to disclose the data used to train their models. Such disclosure could expose them to additional legal risk as the law around AI and copyright evolves.

Meta Llama Models and Market Dynamics

When Meta Platforms released the first version of its Llama language model, it acknowledged using datasets including “the Books3 section of ThePile,” which reportedly contains 196,640 books. The company did not, however, disclose the training data for its latest model, Llama 2, which was released for commercial use in the summer. Llama 2 is free for companies with fewer than 700 million monthly active users and is seen as a potential disruptor in the generative AI software market.
