Abstract
This article analyzes the decision by Judge William Alsup of the U.S. District Court for the Northern District of California in Bartz v. Anthropic, the first major ruling to address whether training AI models on copyrighted materials constitutes fair use. Building on my prior commentary about the copyright challenges posed by generative AI, this piece explores the court’s reasoning, its implications for AI developers and creators, and the broader policy questions the ruling leaves unresolved.
Background
In my last article, Copyright at a Crossroads: How Generative AI Challenges Traditional Doctrine, I explored the unsettled legal terrain surrounding generative AI and copyright law. As courts, policymakers, and creators grapple with how to apply long-standing principles to rapidly evolving technologies, the question of whether using copyrighted material to train AI models constitutes infringement remains a legal gray area. That is, until recently.
The court’s recent ruling in Bartz v. Anthropic marks the first meaningful judicial opinion to address the legality of training large language models (LLMs) on copyrighted materials. Although the decision is widely regarded as a significant victory for AI developers, it establishes a nuanced legal precedent with far-reaching implications that extend well beyond the courtroom.
The Core Dispute: AI Training vs. Copyright Protection
At the heart of Bartz v. Anthropic was the claim by Andrea Bartz, a novelist and journalist, that Anthropic’s Claude language model was trained on copies of her copyrighted works without permission. Bartz argued that this use constituted both direct copyright infringement and unlawful circumvention under the Digital Millennium Copyright Act (DMCA). Anthropic responded that its use of publicly available internet data, which may include copyrighted works, was transformative and protected by fair use—a defense that has long played a pivotal role in technological innovation.
This dispute mirrored many of the concerns I highlighted in my earlier piece: whether existing copyright law adequately protects creators when machine learning models copy or reference their work, and whether imposing liability on developers for training data inputs would stifle innovation in AI research.
The Court’s Decision: A Contextual Fair Use Analysis
The court ultimately sided with Anthropic, granting summary judgment in its favor on the ground that the ingestion of copyrighted material for the purpose of model training was a fair use under 17 U.S.C. § 107. More specifically, while acknowledging that copyrighted works were used, the court emphasized several key factors:
- Purpose and Character of the Use: The court agreed that Anthropic’s use was highly transformative. The works were not reproduced or distributed in any recognizable way, but rather abstracted into statistical representations that enabled language prediction capabilities.
- Nature of the Copyrighted Work: Though the works were creative, this factor was outweighed by the transformative nature of the use.
- Amount and Substantiality: The court recognized that entire works may have been ingested but emphasized that no output from the model replicated those works verbatim.
- Market Effect: The plaintiff failed to show any market harm, either to the original work or any derivative AI training licensing market (which remains largely speculative at this stage).
By grounding its ruling in a fact-specific fair use analysis, the court avoided setting a sweeping rule, while nevertheless signaling a tolerance for AI training practices that rely on publicly available content.
Why This Is a Win—for Now—for AI Developers
For the AI development community, the decision provides much-needed clarity—at least in the short term. By validating the use of internet-scale data for training under fair use, the court preserves the status quo that has underpinned virtually all major foundation models to date. The alternative—a regime where developers must clear rights for millions of individual data points—would be commercially and technologically unworkable for many startups.
Moreover, the ruling recognizes the fundamentally different nature of LLM training as compared to traditional copying or redistribution. This distinction is critical in reinforcing the idea that copyright law must evolve alongside the technologies it seeks to regulate.
A Measured Victory: Risks and Ambiguities Remain
Despite the victory lap being taken by some within the AI development community, the ruling does not close the book on the legal challenges ahead. The court was careful to note that its opinion was grounded in the specific facts of the case—and left the door open for different outcomes in future cases involving more direct reproduction, clearer market substitution, or stronger evidence of economic harm.
For creatives, the decision certainly feels like a setback. It reinforces a growing sense that copyright law is ill-equipped to offer meaningful protection in the age of AI. The underlying concern—that creative work is being absorbed into opaque systems without attribution, consent, or compensation—remains unresolved.
And for policymakers, the ruling underscores the need for legislative clarity. Courts can interpret doctrine, but they cannot create the kind of systematic licensing infrastructure or statutory exceptions that may ultimately be needed to balance innovation with creator rights. I discussed the emerging policy recommendations at greater length in my previous article.
Looking Ahead: A Shifting Legal Landscape
Bartz v. Anthropic may be the first, but it certainly won’t be the last major ruling to address AI and copyright. Similar cases—against OpenAI, Meta, and Stability AI—are winding through the courts and may yield diverging results depending on jurisdiction and factual nuance. Meanwhile, agencies like the U.S. Copyright Office continue to reevaluate the boundaries of authorship, infringement, and originality in the context of generative systems.
Copyright law truly stands at an inflection point. Bartz offers an initial blueprint for how courts might apply fair use to AI training, but it also invites more questions than it answers. What obligations do developers have to audit their training data? Can AI models be trained ethically on copyrighted materials if those materials are not reproduced? Should Congress intervene to create a statutory framework for AI training rights and licensing?
These are not questions the judiciary alone can resolve. But in Bartz, we see a legal system beginning to engage—cautiously, contextually, and with an eye toward both enabling innovation and respecting the rights of creators.