Comedian Sarah Silverman and authors Richard Kadrey and Christopher Golden recently filed class-action lawsuits against Meta Platforms (parent company of Facebook) and ChatGPT maker OpenAI (backed by Microsoft Corp.), alleging that the companies used their copyrighted content without authorization to train artificial intelligence (AI) language models. These models, known as large language models, aim to replicate human conversation and automate tasks. The lawsuits, filed in San Francisco federal court, highlight the legal challenges faced by chatbot developers who rely on copyrighted material to create realistic responses.
Silverman, Kadrey, and Golden allege that Meta and OpenAI used the authors’ books without permission by copying them from illegal online “shadow libraries” that contain the texts of thousands of books. Specifically, the lawsuit against Meta cites the company’s own research paper, “LLaMA: Open and Efficient Foundation Language Models,” published in February 2023, which describes the company’s large language model and how it was trained. The paper outlines the training dataset, which the authors allege demonstrates that “the copyrighted materials were copied and ingested as part of training,” because “many of the plaintiffs’ books appear in the dataset that Meta admitted to using.” In the suit against OpenAI, the authors allege that summaries of the plaintiffs’ work generated by ChatGPT demonstrate that the bot was trained on their copyrighted content. A ChatGPT-generated summary of Silverman’s memoir “The Bedwetter” was attached as an exhibit to the complaint. The plaintiffs seek damages and injunctive relief on behalf of a nationwide class of copyright owners whose works were allegedly infringed.
One hurdle the plaintiffs may struggle to overcome is the 2nd Circuit’s decision in Authors Guild v. Google, a case that asked whether scanning and digitizing printed copyrighted books to build an online searchable database qualifies as fair use. Under the fair use doctrine of the U.S. copyright statute, it is permissible to use limited portions of a work, including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports. The court found that “Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use.” While not dispositive, the decision addresses the fair use defense in a context similar to the one at issue here, and the court may likewise find fair use by Meta and OpenAI.
These lawsuits highlight the potential legal consequences faced by developers who use copyrighted material without proper authorization. Attorneys for the plaintiffs assert that “much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works — including books written by plaintiffs — that were copied by OpenAI and Meta without consent, without credit, and without compensation.” The suits not only assert the plaintiffs’ rights in their copyrighted material but may also begin to define the appropriate boundaries for artificial intelligence and the role copyright law will play in the continued development of AI.