Copyright at a Crossroads: How Generative AI Challenges Traditional Doctrine
Monday, June 30, 2025

Abstract

As courts grapple with whether training AI on copyrighted material constitutes fair use, companies have adopted divergent compliance strategies. This piece analyzes recent litigation, industry practices, and legislative proposals aimed at balancing innovation with creator rights.

Background

The development of generative artificial intelligence (AI) systems requires access to large-scale datasets critical for enhancing model performance and mitigating algorithmic biases. Yet, the reliance on data has brought AI development into direct tension with copyright law, as many models are trained on copyrighted content without obtaining authorization from rights holders. In the absence of judicial clarity, companies have adopted divergent approaches to navigating this tension—ranging from rights-cleared datasets to permissive scraping practices—each with different implications for compliance, innovation, and liability.

This article explores the current legal uncertainties surrounding generative AI and copyright law, analyzes how companies are interpreting fair use at both the training and output stages, and surveys emerging policy responses aimed at closing the gap between technological capability and legal doctrine.

Output-Phase vs. Training-Phase Fair Use

Once a model produces a result, that output is typically assessed under traditional copyright principles. If a generated work is “substantially similar” to a protected original, the four-factor fair use test applies: the purpose and character of the use, the nature of the work, the amount and substantiality of the portion used, and the effect on the market. For instance, in The New York Times Co. v. OpenAI, the plaintiffs allege that ChatGPT occasionally reproduces Times content verbatim, raising the question of when an AI-generated output crosses the line into infringement. Courts may turn to precedents like Authors Guild v. Google, Inc., where the Second Circuit held that digitizing and indexing books for search was transformative. But whether a generative model’s output, particularly when it mimics style or structure, meets the same standard remains an open question.

More novel is the question whether using copyrighted works for training itself constitutes fair use. In its May 2025 report, the U.S. Copyright Office suggested that converting expressive works into nonhuman-readable data (such as statistical weights) may qualify as “transformative,” particularly in research contexts. The Office analogized model training to the Google Books decision, which permitted large-scale digitization for non-expressive, analytic uses. That position is contested, however: critics argue that training models on copyrighted material at scale, even if transformed, may exceed fair use when outputs echo identifiable elements of the original works. In Kadrey v. Meta Platforms, Inc., the court declined to treat a language model as a derivative work in the absence of specific outputs mirroring the plaintiffs’ books. Notably, the case remains at the motion-to-dismiss stage, and future rulings may further clarify these boundaries.

Divergent Industry Practices and Policy Proposals

Industry responses to legal uncertainty vary widely. One prominent image-generation platform places infringement liability on users via disclaimers in its terms of service and employs watermarking to support attribution. Other models are trained on the LAION dataset, a web-scraped repository filtered by open-license tags, but face criticism over provenance inaccuracies and limited enforceability of opt-outs.

These divergent practices highlight a broader legal concern: that AI systems may replicate a creator’s distinctive style without consent or compensation. In response, some stakeholders have supported legislation such as the Preventing Abuse of Digital Replicas Act (PADRA), which would grant artists a private right of action when AI is used to imitate their unique style for commercial gain. PADRA is narrowly tailored, requiring both “demonstrable intent” and a “commercial purpose.” The proposed legislation is designed to close the legal gap left by traditional copyright law in addressing stylistic appropriation. Beyond PADRA, other proposals with broad support include text-and-data mining exemptions for research, opt-out registries for creators, and collective licensing regimes modeled on those used in the music industry.

Conclusion

The intersection of generative AI and copyright law remains unsettled and rapidly evolving. Rights-cleared development models, featuring licensed datasets, proactive moderation, and provenance tools, offer one compliance-conscious approach, but they are not a cure-all. Courts will need to define how the fair use doctrine applies to both training and output, while legislators continue to explore targeted reforms like PADRA.

Ultimately, the legal framework for generative AI is unlikely to emerge from a single ruling or statute. Instead, it will be shaped by a mosaic of court decisions, negotiated licensing regimes, industry norms, and legislative innovation. The challenge ahead is not just to balance innovation and rights, but to build a system that does so predictably, fairly, and at scale.
