Courts Split on Fair Use in LLM Training with Copyrighted Works

Federal Courts Issue First Key Rulings on Fair Use Defense in Generative AI Copyright Claims

by: Nicholas A. Sarokhanian, Kaitlyn E. Stone, William M. Carlucci, Lyric D. Menges of Barnes & Thornburg LLP - Litigation Alert

Monday, June 30, 2025

The courts held that training large language models (LLMs) on copyrighted materials can be “transformative,” a central consideration in the fair use analysis.
However, the judges diverged on the legal significance of that finding, particularly when weighed against potential market harm to authors.
One court found fair use in training LLMs with legally acquired content, but not with pirated materials.
Another court granted summary judgment to the AI developer because of the plaintiffs’ evidentiary failings, but emphasized that potential market dilution could be a decisive factor.
Companies developing or deploying generative AI tools should carefully review these rulings to assess risk and refine litigation strategies.

Introduction

Until last week, no federal court handling copyright infringement claims against AI companies had decided whether “fair use,” the developers’ primary defense to liability, could apply in litigation involving generative AI.

Although earlier decisions addressed older, non-generative AI systems, two rulings issued last week by the U.S. District Court for the Northern District of California mark the first substantive summary judgment guidance on this question.

While the AI developers prevailed (at least in part) in both cases, the rulings — issued by different federal judges in that district — revealed a clear tension in how judges view whether copying copyrighted works to train LLMs constitutes fair use. Together, the decisions offer important insight into how, and to what extent, fair use may apply to generative AI training practices.

Background

AI developers across the country have faced copyright infringement claims stemming from both the input or training phase (e.g., feeding LLMs a corpus of data to read) and the output or generative phase (e.g., after an LLM is trained and a user prompts it to generate new content). In input-phase cases, plaintiffs typically allege that AI developers infringed copyrighted works included in their corpus of training data. In output-phase cases, the claim is that those same works are reproduced, or substantially copied, through the model’s generative responses.

AI developers have invoked the fair use defense against claims based on either or both of these theories. Courts evaluating fair use consider four factors:

Purpose and character of the use.
Nature of the copyrighted work.
Amount and substantiality of the portion used.
Effect upon the potential market for or value of the copyrighted work.

Cases involving both infringement theories have been steadily progressing through the court system. Many of these matters involve earlier iterations of LLMs, reflecting how quickly this technology continues to evolve.

By contrast, courts have already decided fair use questions involving non-generative AI technologies, but the judges in those cases carefully cautioned that their reasoning may not apply to cases involving generative AI tools.

As of last week, we now have substantive decisions on the fair use defense in generative AI cases, issued at the summary judgment stage, where courts determine whether claims are strong enough to proceed to trial.

Last Week’s Decisions

June 23, 2025: In a case where the plaintiffs (authors of copyrighted books) accused a major U.S.-based AI company of copyright infringement in the input (training) phase, Judge William Alsup of the Northern District of California partially granted summary judgment, dismissing some claims while allowing others to proceed to trial.

In that case, the plaintiffs challenged two practices the AI company used in developing its LLMs: (1) destructively scanning the pages of printed books and creating digital copies of them to be kept in the AI lab’s permanent, general-purpose “research library,” and (2) using those digital copies, plus pirated and stolen books found on the internet, to train the LLMs.

Judge Alsup held that the first use, digitizing lawfully purchased books for the “research library,” was fair use, stating that it was sufficiently transformative. He also found that the second use, using the digital copies to train LLMs to generate new text, was “exceedingly,” “spectacularly,” and “quintessentially” transformative, and therefore protected under fair use. He distinguished this use from a prior non-generative AI case, where the system simply replicated the function of the copyrighted work rather than generating new content.

However, Judge Alsup found there was no fair use in the AI lab’s inclusion of pirated and stolen copies of plaintiffs’ books — which were otherwise available for purchase, whether new or used — in its research library, or the subsequent step of training its LLMs on the pirated works.

Finally, Judge Alsup repeatedly emphasized that this case involved only the input or training phase, not the output or generative phase, which might yield a different fair use outcome.

In sum, claims related to the digitization of lawfully purchased books and the subsequent training of LLMs on those digital copies were dismissed, but the plaintiffs’ claims related to the pirated books will be permitted to proceed to a trial on the merits.

June 25, 2025: In a separate case brought by a group of authors against a different major U.S.-based AI company, Judge Vince Chhabria, also of the Northern District of California, granted summary judgment in favor of the AI company based on the plaintiffs’ litigation missteps. As Judge Chhabria put it, the plaintiffs “made the wrong arguments and failed to develop a record in support of the right one,” and, “in the grand scheme of things, the consequences of this ruling are limited.”

While Judge Chhabria agreed with Judge Alsup that training LLMs on copyrighted works is transformative, he disagreed about the legal significance of that finding, writing that Judge Alsup “blew off the most important factor in the fair use analysis,” which, to Judge Chhabria, is the “harm to the market for the copyrighted work” from the “copying of protected works, however transformative.”

In another interesting difference, Judge Chhabria rejected the plaintiffs’ output-related theory that the LLMs were trained to “emulate certain writers’ styles,” because “style is not copyrightable” and, in any event, the LLMs were trained not to “produce more than 50 words of any of the plaintiffs’ books” even in the face of “adversarial” prompts designed to provoke that outcome.

Judge Chhabria also approached the piracy question differently than Judge Alsup, writing that “plaintiffs are wrong that the fact” that the books were downloaded from “shadow libraries” gave them “an automatic win.” In his view, calling that downloading “piracy” begs the question, because the “whole point of fair use analysis is to determine whether a given act of copying was unlawful.”

Most significantly, Judge Chhabria identified three potential theories of market harm available to the plaintiffs:

Claiming LLMs will regurgitate the works (or substantially similar outputs), allowing users to access those works or substitutes for free.
Pointing to a market for licensing works for AI training and contending unauthorized copying harms or precludes the development of that market.
Arguing that, even if LLMs cannot regurgitate the works or generate substantially similar ones, they can generate works similar enough to compete with the originals “and thereby indirectly substitute for them,” which the court described as “the harm of market dilution.”

Judge Chhabria wrote that “the first two arguments fail” and that the “third argument is far more promising, but the plaintiffs’ presentation is so weak that it does not move the needle” enough to survive summary judgment.

While not necessary to his ruling, Judge Chhabria warned that the harm of market dilution caused by AI flooding the market with AI-generated books “would reduce the incentive for authors to create,” which is “the harm that copyright aims to prevent.” Judge Chhabria emphasized that generative AI poses a unique risk of market harm because it “can generate literally millions of secondary works, with a miniscule fraction of the time and creativity used to create the original works it was trained on,” and no other use “has anything near the potential to flood the market with competing works the way that LLM training does.”

He also wrote that because fair use “is meant to be a flexible doctrine that takes account of ‘significant changes in technology,’” courts “can’t stick their heads in the sand to an obvious way that a new technology might severely harm the incentive to create,” and that “it seems likely that market dilution will often cause plaintiffs to decisively win the fourth factor — and thus win the fair use question overall — in cases like this.”

Ultimately, Judge Chhabria rejected the plaintiffs’ “half-hearted argument” on market dilution, while recognizing that the “conclusion may be in significant tension with reality” because the result was “dictated by the choice the plaintiffs made to put forward two flawed theories of market harm while failing to present meaningful evidence on the effect of training LLMs” with “their books on the market for those books.”

Takeaways

These decisions will undoubtedly be relied upon by parties and courts involved in copyright litigation related to generative AI, regardless of the nature of the protected works (books, news articles, music, video, etc.). Neither decision can be taken as a one-size-fits-all rule for predicting the outcome of future copyright claims involving generative AI, but both give litigants guidance on the facts most critical to allege and develop in discovery.

Companies with potential copyright infringement exposure should carefully review these decisions to anticipate how courts may decide future cases involving generative AI.