Summary judgment was recently granted for defendants based on fair use in two copyright infringement actions challenging the training of large language models (LLMs), one against Meta relating to its Llama LLMs,[1] and the other against Anthropic relating to its Claude LLMs.[2] The decisions bode well for the continued development of the generative AI industry, and therefore for the semiconductor industry, which is building out the infrastructure and higher layers of the generative AI tech stack.
In both cases, authors challenged the unauthorized downloading of their copyrighted works and their copying and use for training LLMs, and in Anthropic’s case, also the creation of a general-purpose digital library. Neither case involved challenges to the LLMs’ outputs.
LLM Training
Training an LLM involves an enormous number of texts (including, for Claude and Llama, millions of books), which are copied in a multistep process. The process starts with each text being translated into short sequences of characters called “tokens” (whole words, word fragments, and punctuation), which are the units on which training is performed. Training then uses a statistical language model to learn patterns from these “tokenized” texts: the model predicts the next token in a sequence, given the context from the preceding tokens, and then repeats the process. Each prediction is compared to the original, and the model is adjusted accordingly so that next time it is more likely to predict correctly. The model represents tokens as “vectors” (lists of numbers), and what it learns about the relatedness of different words, grammar patterns, or story themes is stored in numerical parameters called “weights.” At a general level, the Anthropic court described training as using the authors’ works to “iteratively map statistical relationships between every text-fragment and every sequence of text-fragments so that a completed LLM could receive new text inputs and return new text outputs as if it were a human reading prompts and writing responses.”
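For readers who want a concrete picture of the predict, compare, and adjust loop described above, the following is a minimal sketch in Python. It is a toy, not a description of how Claude or Llama is actually trained: the whitespace tokenizer, the one-sentence corpus, the function names, and the learning rate are all assumptions chosen for illustration, and the model looks only at the single preceding token rather than the full context.

```python
import math

def tokenize(text):
    # Assumption: splitting on whitespace stands in for a real subword tokenizer.
    return text.lower().split()

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(tokens, epochs=200, lr=0.5):
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(vocab)
    # weights[i][j] is the adjustable score for token j following token i.
    weights = [[0.0] * n for _ in range(n)]
    pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for prev, nxt in pairs:
            probs = softmax(weights[prev])      # predict the next token
            total += -math.log(probs[nxt])      # compare with the original text
            for j in range(n):                  # adjust the weights
                target = 1.0 if j == nxt else 0.0
                weights[prev][j] -= lr * (probs[j] - target)
        losses.append(total / len(pairs))
    return vocab, index, weights, losses

def predict_next(word, vocab, index, weights):
    probs = softmax(weights[index[word]])
    return vocab[probs.index(max(probs))]

# Hypothetical one-sentence "training corpus".
corpus = "the cat sat on the mat and the cat slept on the mat"
tokens = tokenize(corpus)
vocab, index, weights, losses = train(tokens)

# "sat" is only ever followed by "on" in this corpus, so that is the prediction.
prediction = predict_next("sat", vocab, index, weights)
print(prediction, round(losses[0], 3), round(losses[-1], 3))
```

The average prediction error falls from the first pass to the last, which is the sense in which the model is "adjusted so that next time it is more likely to predict correctly." What the trained model retains is a table of numerical weights rather than a stored copy of the text, although a production LLM has billions of weights and a far richer notion of context.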
Copyright Law and Fair Use
The policy behind the 1976 Copyright Act is to promote the progress of science and the arts through encouraging authors to create new creative works. Section 106 of the 1976 Copyright Act grants a copyright holder exclusivity with respect to enumerated actions, such as reproduction, preparation of derivative works, and distribution of copies. It does not grant a monopoly over all uses of the copyrighted work. Section 107 of the Copyright Act provides the affirmative defense of “fair use” for acts otherwise infringing the exclusive rights of a copyright holder, the test for which includes the following four factors:
(1) The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) The nature of the copyrighted work;
(3) The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) The effect of the use upon the potential market for or value of the copyrighted work.
Fair use is applied holistically and has been described as an “equitable rule of reason.”[3] Courts have typically viewed the first and fourth factors as the most significant, with the fourth particularly important.
The Anthropic Decision
The materials used by Anthropic included millions of books downloaded from pirated sources, and millions of print books that Anthropic purchased and scanned into digital form with machine-readable text. Anthropic assembled these materials both to create a general research library for potential future use and to train Claude.
Judge Alsup bifurcated his analysis into the use of books for training the LLM and the use of books to build a central library. He held that both the use for training and the digitization of purchased books to build a central library were fair use, but the use of pirated books to build a central library was not. He made clear that summary judgment did not extend to future copies made from the central library that were not used for training LLMs.
With respect to the first factor, Judge Alsup held that the purpose and character of using the copyrighted works to train LLMs to generate new text was “quintessentially transformative.” The use was not simply to memorize and replicate the works it trained on, but “like a reader aspiring to be a writer” to learn from them and create something different. Accordingly, the first factor weighed in favor of fair use for the training copies.
With regard to copies used to build the central library, Judge Alsup bifurcated his analysis into the pirated copies and those Anthropic purchased in print and then digitally converted. He held that the latter group, which facilitated storage and searchability and did not result in new copies being shared with third parties, was transformative. On the other hand, Judge Alsup held that the use of the pirated works was “inherently, irredeemably infringing,” and their use to build a research library was not transformative. Judge Alsup distinguished other decisions, including where copies were unavailable for purchase or loan, copies were transformed into a significantly different form, or the defendant already possessed authorized copies.
Judge Alsup held that the second factor, the nature of the copyrighted work, weighed against fair use because the works at issue were expressive and thus entitled to greater protection under copyright law than factual works.
Judge Alsup held that the third factor — the amount and substantiality of the work used — involved an assessment of whether the amount of copyright-protected material was reasonable in relation to the purpose for copying it. The key to the analysis was not how much text was copied, but how much was made accessible to the public. With respect to training, Judge Alsup held that while the entire books were used, there was no allegation that the material was made available to the public as output. He found that the third factor favored fair use for training because of the large amount of data that Anthropic reasonably needed for training its LLMs. With respect to building a central library, Judge Alsup held that the third factor favored fair use for the purchased copies, but against fair use for the pirated copies, given that Anthropic had no right to hold them at all.
Judge Alsup held that the fourth factor, the effect of the use on the market for the copyrighted work, also favored fair use regarding training LLMs. He held that the fourth factor focuses on the extent to which the challenged use acts as an actual or potential market substitution for the copyrighted work. Judge Alsup noted that the authors conceded that the LLMs did not produce exact copies or infringing knockoffs of the authors’ works. Instead, the authors argued that the LLMs would “result in an explosion of works competing with their works.” Judge Alsup analogized the plaintiffs’ argument to a complaint that “training schoolchildren to write well” would also result in an explosion of competing works and held that this “is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition” (citing Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510, 1523-24 (9th Cir. 1992)). Judge Alsup also rejected the plaintiffs’ arguments that training LLMs would harm an emerging market for licensing work to train LLMs, holding that the Copyright Act does not entitle plaintiffs to exploit such a market that could develop.
Judge Alsup held that the fourth factor was neutral with respect to the purchased library copies that were converted to digital form and pointed against fair use for the pirated works, given that pirated copies “plainly displaced demand” for the plaintiffs’ books.
Judge Alsup, weighing all the factors, thus granted Anthropic’s motion for summary judgment on the issue of fair use with respect to the training copies and books legitimately purchased to build a digital library, but denied summary judgment for Anthropic on the pirated copies, reserving the decision for trial.
The Meta Decision
The Meta decision involved an action by 13 authors against Meta for downloading their works from so-called “shadow libraries” of pirated works and using them to train Meta’s LLM. A key difference between the two decisions was the primary weight that Judge Chhabria gave to the fourth factor and his views, expressed in a lengthy dictum, that in many cases, LLM conduct may fail the fair use test because LLMs often “dramatically undermine the market” for the materials on which they train. By way of example, Judge Chhabria speculated that an LLM capable of producing endless books about how to take care of a garden could greatly diminish the market for human-authored garden books. He indicated that Judge Alsup’s Anthropic decision was overly focused on the transformative nature of generative AI (the first factor in the fair use analysis), “while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on” (the fourth factor). Judge Chhabria, therefore, appeared to endorse a market dilution argument that, based on Sega, Judge Alsup flatly rejected. This theory was also recently supported by the U.S. Copyright Office in its May 2025 report “Copyright and Artificial Intelligence,” albeit acknowledging the “uncharted territory.” Judge Chhabria raised a number of questions that were implicated in a market dilution analysis, including whether Llama was capable of generating books, and if so, what type of books, what impact it would have on competition, and what the impact on the market for plaintiffs’ books would be where Llama could use their books for training versus being unable to use them.
Both judges rejected another argument concerning the fourth factor that the unauthorized training of LLMs harmed the market for licensing books for LLM training. Both courts held that this was not the type of market that the Copyright Act entitles the plaintiffs to exploit.
Regarding the first factor, Judge Chhabria also ultimately agreed that the LLMs’ use was transformative, which is key to finding that the first factor favors fair use. But Judge Chhabria took a different approach from Judge Alsup regarding whether the analysis should focus on LLM training as the sole “use.” Judge Chhabria rejected the plaintiffs’ attempt to bifurcate the analysis into Meta’s downloading of the books and use of the books for LLM training, stating that the downloading must be considered in light of the ultimate purpose of LLM training. Judge Alsup permitted a bifurcated analysis, albeit with respect to building a library, as opposed to simply downloading. Using this bifurcated approach, Judge Alsup held that the use of pirated works in the library weighed against fair use. Judge Chhabria, on the other hand, just considered the use of shadow libraries in connection with his unitary analysis and dismissed its significance. Judge Chhabria held that while it was relevant to the issue of bad faith, and could have been significant if Meta’s downloading had been a part of a peer-to-peer file sharing that had helped to perpetuate the shadow libraries, that was not the case here.
What Are the Implications for the Future Development of LLMs?
There is clear recognition of the significant transformative nature of LLMs, which is an important factor favoring fair use. One weak spot for future decisions is Judge Chhabria’s endorsement of a market dilution test. But this endorsement should be considered in light of the associated questions he raised. Importantly, this is an inquiry heavily dependent on the nature of the market. It is a safe guess (for now) that most users of LLMs are not writing novels, so the “explosion” of competing, LLM-generated novels may end up being more of a theoretical concern. But for other works, for instance, news articles, biographies, and other nonfiction that can be quickly produced en masse by LLMs, Judge Chhabria suggested that there may be market dilution concerns. Judge Chhabria’s dictum also applies outside of text-based works. For instance, an LLM training on a specific songwriter’s catalogue could produce works diluting the market for that artist’s music or any genre uniquely associated with that artist, disincentivizing the artist and potentially others to continue making music in that space. Appropriate guardrails could limit the exposure to market dilution claims, should the market dilution theory gain judicial traction.
Another takeaway from the decisions is that the use of pirated works in connection with training should be avoided. In Anthropic, the fact that the books were pirated weighed heavily against fair use. And in Meta, Judge Chhabria also left open the possibility that use of pirated works could be relevant to a fair use analysis.
A third takeaway is that it was important in both decisions that the LLMs could not reproduce more than very short passages from the training materials. So LLMs should continue including guardrails that prevent memorization and regurgitation of extensive passages of training materials. For instance, Judge Chhabria’s decision emphasized how Llama was configured to return no more than 50 words from any given training source.
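As a rough illustration of what an output-side guardrail of this kind could look like, the sketch below measures the longest run of consecutive words that a model output shares verbatim with a protected source and flags outputs that exceed a limit. It is purely hypothetical and does not describe how Llama, Claude, or any other product actually works: the function names, the word-level comparison, and the use of 50 words as the threshold (borrowed from the figure mentioned above) are all assumptions made for illustration.

```python
MAX_VERBATIM_WORDS = 50  # assumed limit, echoing the 50-word figure above

def longest_shared_run(output, source):
    """Length, in words, of the longest passage appearing verbatim in both texts."""
    out_words = output.lower().split()
    src_words = source.lower().split()
    best = 0
    # prev[j] holds the length of the shared run ending at the previous
    # output word and source word j (longest-common-substring table).
    prev = [0] * (len(src_words) + 1)
    for i in range(1, len(out_words) + 1):
        curr = [0] * (len(src_words) + 1)
        for j in range(1, len(src_words) + 1):
            if out_words[i - 1] == src_words[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def exceeds_limit(output, source, limit=MAX_VERBATIM_WORDS):
    # True when the output reproduces more than `limit` consecutive words.
    return longest_shared_run(output, source) > limit

# Hypothetical stand-in for a protected work: 100 distinct placeholder words.
source = " ".join(f"w{i}" for i in range(100))
short_quote = "intro " + " ".join(f"w{i}" for i in range(20)) + " outro"
long_quote = " ".join(f"w{i}" for i in range(10, 70))

print(exceeds_limit(short_quote, source))  # 20 shared words, under the limit
print(exceeds_limit(long_quote, source))   # 60 shared words, over the limit
```

A check like this catches only verbatim copying. It would not detect close paraphrase, which is one reason a word-count threshold alone would not settle the output questions that neither decision reached.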
A related point is that the cases did not involve outputs. Consequently, the decisions do not address the situation where an LLM produces an unauthorized replica of a copyrighted work, whether through a generative process or memorization.
As indicated above, the decisions do not provide a compelling reason to put the brakes on the generative AI industry, nor do markets seem to have viewed them that way. The continued growth will drive further demand for the semiconductor products needed to support that growth. Moreover, even if copyright infringement were found in a future case, the risk of secondary liability for chipmakers seems trivial given available defenses, such as those based on the existence of non-infringing uses.
[1] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025).
[2] Bartz v. Anthropic PBC, No. 3:24-cv-05417-WHA (N.D. Cal. June 23, 2025).
[3] Google LLC v. Oracle Am., Inc., 593 U.S. 1, 19 (2021).