Authors Sue NVIDIA for Training AI on Pirated Books


Starting last year, various rightsholders have filed lawsuits against companies that develop AI models.

The list of complainants includes record labels, book authors, visual artists, even the New York Times. These rightsholders all object to the presumed use of their work without proper compensation.

“Books3”

Many of the lawsuits filed by book authors come with a clear piracy angle. The cases allege that tech companies, including Meta, Microsoft, and OpenAI, used the controversial ‘Books3’ dataset to train their models.

Books3 was created in 2020 by AI researcher Shawn Presser, who scraped the library of ‘pirate’ site Bibliotik. The dataset was broadly shared online and added to other databases including ‘The Pile’, an AI training dataset compiled by EleutherAI.

After pushback from rightsholders and anti-piracy outfits, Books3 was taken offline over copyright concerns. However, for many of the companies that allegedly trained their AI models on it, there are still some legal repercussions to sort out.

Authors Sue NVIDIA for Copyright Infringement

On Friday, American authors Abdi Nazemian, Brian Keene, and Stewart O’Nan joined the barrage of legal action with a copyright infringement lawsuit against NVIDIA. The company, whose market cap exceeds $2 trillion, is mostly known for its GPUs and related software and services, but also has its own AI models.

In a concise class action complaint, filed at a California federal court, the authors allege that NVIDIA used the Books3 dataset to train its NeMo Megatron language models. The models are hosted on Hugging Face, where it is stated that they were trained on EleutherAI’s ‘The Pile’ dataset, which includes the pirated books.

Putting two and two together, the plaintiffs conclude that NVIDIA’s models were trained on pirated books, including theirs, without their permission.

“NVIDIA has admitted training its NeMo Megatron models on a copy of The Pile dataset. Therefore, NVIDIA necessarily also trained its NeMo Megatron models on a copy of Books3, because Books3 is part of The Pile,” the complaint reads.

“Certain books written by Plaintiffs are part of Books3 — including the Infringed Works — and thus NVIDIA necessarily trained its NeMo Megatron models on one or more copies of the Infringed Works, thereby directly infringing the copyrights of the Plaintiffs.”

Direct Infringement Damages

Relying on the same logic, the authors accuse the company of direct copyright infringement, noting that NVIDIA copied their books to use them for AI training purposes. Through the lawsuit, the rightsholders demand compensation in the form of actual or statutory damages.

The class action lawsuit includes three authors thus far, but more may be added to the case as it progresses. NVIDIA has yet to respond to the allegations but, in light of similar cases, it will likely oppose the claims and/or argue a fair-use defense.

Last month, OpenAI managed to ‘defeat’ several copyright infringement claims from book authors in a somewhat related “Books3” lawsuit. However, the California federal court didn’t review the direct copyright infringement claims in that case, which will be argued in detail at a later stage.

A copy of the class action complaint against NVIDIA, filed by the authors in a California federal court, is available here (pdf).

From: TF, for the latest news on copyright battles, piracy and more.
