In early 2023, the US Copyright Office (CO) initiated an examination of copyright law and policy issues raised by artificial intelligence (AI), including the scope of copyright in AI-generated works and the use of copyrighted materials in AI training. Since then, the CO has issued the first two installments of a three-part report: part one on digital replicas, and part two on copyrightability.
On May 9, 2025, the CO released a pre-publication version of the third and final part of its report on Generative AI (GenAI) training. The report addresses stakeholder concerns and offers the CO’s interpretation of copyright’s fair use doctrine in the context of GenAI.
GenAI training involves using algorithms to train models on large datasets so that they can generate new content. Through this process, models learn patterns and structures from existing data and then create new text, images, audio, or other forms of content. The use of copyrighted materials to train GenAI models raises complex copyright issues, particularly under the “fair use” doctrine. The key question is whether using copyrighted works to train AI without explicit permission from the rights holders is a fair use, and therefore not an infringement, or whether such use violates copyright.
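For readers less familiar with the mechanics, the toy sketch below (our illustration, not drawn from the CO report) shows the basic flow in miniature: a program ingests a body of text, records statistical patterns in it, and then generates new text from those patterns. Production GenAI systems use neural networks trained on vastly larger corpora, but this same flow is what places copyrighted training material at the center of the fair use question.

```python
# Illustrative sketch only: a toy character-level bigram model that "learns
# patterns" from a small training corpus and then samples new text. This is a
# simplification; real GenAI training uses neural networks and far larger datasets.
import random
from collections import defaultdict


def train_bigram_model(corpus: str) -> dict[str, list[str]]:
    """Record, for each character, which characters follow it in the corpus."""
    transitions: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(corpus, corpus[1:]):
        transitions[current].append(nxt)
    return transitions


def generate(model: dict[str, list[str]], seed: str, length: int = 80) -> str:
    """Generate new text by repeatedly sampling an observed next character."""
    output = [seed]
    current = seed
    for _ in range(length):
        choices = model.get(current)
        if not choices:  # dead end: no observed successor in the training data
            break
        current = random.choice(choices)
        output.append(current)
    return "".join(output)


if __name__ == "__main__":
    # The "training data" here stands in for the copyrighted works at issue.
    training_text = "the quick brown fox jumps over the lazy dog. the dog sleeps."
    model = train_bigram_model(training_text)
    print(generate(model, seed="t"))
```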
The 107-page report provides a thorough technical and legal overview and takes a measured approach to the legal issues underlying fair use in GenAI training. The report emphasizes that each case is context-specific and requires a thorough evaluation of the four factors set out in Section 107 of the Copyright Act:
- The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes
- The nature of the copyrighted work
- The amount and substantiality of the portion used in relation to the copyrighted work as a whole
- The effect of the use upon the potential market for or value of the copyrighted work
With regard to the first factor, the report concludes that training a GenAI model on a large and diverse dataset “will often be transformative.” However, the fact that copyright-protected materials are used for AI model training is not, by itself, sufficient to establish fair use. The report states that transformativeness “is a matter of degree,” turning on the nature of the model and how it is deployed.
The report notes that training a model is most transformative where “the purpose is to deploy it for research, or in a closed system that constrains it to a non-substitutive task,” as opposed to instances where the AI output closely tracks the creative intent of the input (e.g., generating art, music, or writing in a similar style or substance to the original source materials).
Still under the first factor, as to the commercial nature of the use, the report notes that a GenAI model is often the product of efforts by multiple, distinct actors, some of which are commercial entities and some of which are not. As a result, it can be difficult to attribute the model to a particular actor and to determine definitively whether it is the product of a commercial or a noncommercial one.
Even if an entity is for-profit, that does not necessarily mean its use of the model will be considered “commercial.” The work of researchers developing a model for purposes of publishing an academic research paper, for example, would not be deemed commercial. Conversely, a nonprofit could very well develop a GenAI model and license it for commercial purposes.
With regard to the third factor (the amount of the copyrighted work used), the report acknowledges that machine learning processes often require ingestion of entire works and notes that the wholesale taking of entire works “ordinarily weighs against fair use.” However, in evaluating the use of entire works in GenAI models, the report offers two questions for analysis:
- Is there a transformative purpose?
- How much of the work is made publicly available?
Fair use is much more likely in instances where a GenAI model employs methods to prevent infringing outputs.
Finally, addressing the fourth factor (market harm), the report acknowledges that the analysis of fair use in GenAI training places the CO in “uncharted territory.” However, the CO suggests that the assessment of market harm should address broad market “effects” and not merely harm to the market for a specific copyrighted work. The report explains that the potential for AI-generated outputs to displace, dilute, and erode the markets for copyrighted works should be considered because such effects are likely to result in “fewer human-authored” works being sold. This reflects concerns raised by artists, musicians, authors, and publishers about declining demand for original works as AI-generated imitations proliferate. Where GenAI outputs compete with the works of original human creators or diminish their licensing opportunities – especially in fields such as illustration, voice acting, or journalism – the fourth factor is likely to weigh strongly against fair use.
Practice Note: Companies developing GenAI systems for text, image, music, or video generation should proceed cautiously when incorporating copyrighted material into training datasets. The CO report casts doubt on assumptions that current training practices are broadly protected under fair use. GenAI developers should consider initiatives such as proactively licensing the content used to train their models. As this fair use issue remains an evolving area of copyright law, companies should be prepared to adjust business models in response to judicial or legislative developments.
On May 10, 2025, the day after the report issued, the White House terminated Register of Copyrights Shira Perlmutter “effective immediately.” On May 12, 2025, the White House named Deputy Attorney General Todd Blanche, who represented Donald Trump during his 2024 criminal trial, as acting Register. The CO has raised questions about the appointment on the basis that only Congress has the power to remove the Register or appoint a new one.