Within the rapidly evolving artificial intelligence (“AI”) legal landscape (as explored in Proskauer’s “The Age of AI” Webinar series), there is an expectation that Congress may come together to draft some form of AI-related legislation. Much of the focus is on how generative AI (“GenAI”) has, in the last six months or so, already created new legal, societal, and ethical questions.
Intellectual property (“IP”) protection – and, in particular, copyright – has been a forefront issue. Given the boom in GenAI, some content owners and creators have lately begun to feel that AI developers have been free riding by training GenAI models on a vast swath of web content (some of it copyrighted content) without authorization, license or reasonable royalty. Regardless of whether certain GenAI tools’ use of web-based training data and the tools’ output to users could be deemed infringement (such legal questions do not have simple answers), it is evident that the rollout of GenAI has already begun to affect the vocations of creative professionals and the value of IP for content owners, as AI-created works (or hybrid works of human/AI creation) are already competing with human-created works in the marketplace. In fact, one of the issues in the Writers Guild of America strike currently affecting Hollywood concerns provisions that would govern the use of AI on projects.
On May 17, 2023, the House of Representatives Subcommittee on Courts, Intellectual Property, and the Internet held a hearing on the interoperability of AI and copyright law. There, most of the testifying witnesses agreed that Congress should consider enacting careful regulation in this area that balances innovation and creators’ rights in the context of copyright. The transformative potential of AI across industries was acknowledged by all, but the overall view was that AI should be used as a tool for human creativity rather than a replacement. In his opening remarks, Subcommittee Chair Representative Darrell Issa stated that one of the purposes of the hearing was to “address properly the concerns surrounding the unauthorized use of copyrighted material, while also recognizing that the potential for generative AI can only be achieved with massive amounts of data, far more than is available outside of copyright.” The Ranking Member of the Subcommittee, Representative Henry Johnson, expressed openness to finding middle-ground solutions to balance IP rights with innovation but stated one of the quandaries voiced by many copyright holders as to GenAI training methods: “I am hard-pressed to understand how a system that rests almost entirely on the works of others, and can be commercialized or used to develop commercial products, owes nothing, not even notice, to the owners of the works it uses to power its system.”
The discussion often centered on the witnesses’ many concerns regarding the negative effects of GenAI on creative professions. The unauthorized use of creators’ original works to train large language models (LLMs) and create AI-generated outputs, for which the creators are neither credited nor compensated, was a main concern. Additionally, creators expressed concern that such AI-generated outputs will compete with those creators’ own original works or create other risks to the artistic community. They recognized the challenges that exist in identifying who the creator is when AI and human artistry intersect and expressed concern over deepfakes and the creation of false or misleading content. In addition, the potential marginalization of human creativity as a result of the use of GenAI and the overall stifling of creativity due to the inherent limitations of such technology were also raised as concerns.
The witnesses were quick to highlight the fact that there is currently little-to-no guidance or regulation governing the use of copyrighted works to train GenAI models (or copyrighted works that are otherwise crawled by AI developers for training purposes). They recognized that, with the ever-changing technology landscape, the intersection between AI technology and IP rights is naturally complex. Despite this, some witnesses urged Congress to reexamine existing laws to see if they provide an adequate legal framework that safeguards IP, copyright and the rights of creators while promoting technological development. The witnesses offered some context on these issues, including:
- Christopher Callison-Burch, Ph.D., an Associate Professor of Computer and Information Science in the School of Engineering and Applied Sciences at the University of Pennsylvania, suggested, in written remarks, that developers have admitted that it may not be possible to seek the consent of all copyright holders when gathering data via web crawling. He did suggest that there are several technical mechanisms being designed to let copyright holders opt out (e.g., an industry standard protocol akin to robots.txt, which is already followed by some organizations that collect training data, such as Common Crawl and LAION, or else the creation of a Copyright Office registry of opted-out works). However, Professor Callison-Burch queried: Is it too late to honor the wishes of copyright holders whose materials have already been collected and allow them to opt out now?
- Professor Callison-Burch also noted that outputs of GenAI products may potentially be infringing, such as when: (1) an AI system “memorizes” a copyrighted work it was trained on and is prompted to produce content that mimics it (even if this is a rare occurrence); (2) images are generated using copyrighted or trademarked characters, aka the “Snoopy Problem,” which might be mitigated by a central registry of copyrighted or trademarked characters; and (3) outputs violate right of publicity laws, whether via an image, voice clone or mimicked musical production (though Professor Callison-Burch asked how to apportion liability for such a right of publicity violation: Would it fall to the AI developer or the user of the AI system?). As to training AI systems, the Professor, in written testimony, suggested that perhaps AI training on copyrighted works should be expressly deemed fair use, but creators should have a mechanism to opt out of having their data used for training purposes.
- Jeffrey Sedlik, President and CEO of the PLUS Coalition, a member of the Joint Committee on Ethics in AI and a professional photographer, noted that copyrighted images are already licensed via stock agencies (even to AI developers) and such an existing system could work for AI training. Sedlik even noted, in written testimony, that in the photography industry, owners sell licenses for “artist reference,” which involves the use of a photo by an artist to create a derivative work, and suggested that a similar licensing regime could work for the use of copyrighted images for AI text-to-image diffusion models.
As a result, many of the witnesses suggested that Congress and regulators consider areas of law beyond copyright, such as the right of publicity and trademark law, to effectively address these types of issues. Some witnesses also discussed their support of initiatives that advocate for the rights of creators in the age of AI and highlight the importance of permission, authorization, compensation and transparency in the use of copyrighted works (though at least one witness opined that existing copyright law was flexible enough to balance the need for AI innovation with the rights of creators).
A portion of the hearing revolved around testimony that creators should be compensated for the use of their copyrighted works in the training of AI technology or otherwise, and what that would look like in practice (as witness Ashley Irwin, President of the Society of Composers and Lyricists and an Emmy Award-winning music director, conductor, composer, arranger and producer, declared in written testimony that AI companies should adhere to the three “C’s”: Consent, Credit and Compensation). The witnesses were eager to express that most copyright owners would be willing to license their works if reasonable compensation could be agreed upon, whether for training purposes or in the form of a royalty when AI creations are sold. For example, Irwin remarked that creators would be willing to work with AI companies to “unlock new opportunities for collaboration,” while maintaining the vibrancy and economic opportunities within the creative community. The witnesses generally identified themselves and other human creators as the core of the creative process and remarked that, in an era where their original creations could be rapidly replicated by machines, financial support to sustain their livelihoods should be available. Accordingly, Irwin advocated for creators’ works to be licensed to AI applications and for certain laws to be amended to protect copyright holders (e.g., antitrust laws to allow for collective licensing to AI developers, DMCA CMI reform to clarify violations for AI-created works and an amendment of the Copyright Act to exclude AI-generated works from copyright protection). One witness, Dan Navarro, a Grammy-nominated songwriter, singer, recording artist and voice actor, testified that governments should not create new copyright or other legal immunities that allow AI developers to exploit creators without consent or payment.
Other testimony at the hearing expressed skepticism toward compulsory licenses, noting the challenges of determining the value of AI-generated works, supporting instead negotiations in a free-market context, and suggesting that, as the law stands now, the fair use doctrine, given its flexibility, is the best tool to balance innovation with creators’ rights.[1] In other submitted testimony, it was argued that, practically speaking, there is no feasible way to compensate every creator of copyrighted content that may have been used to train AI models, but, on the other hand, that calls for a statutory or collective licensing regime might be too rigid, too unwieldy, lead to the problem of orphan works and stifle innovation in the area. The potential use of technology to enhance copyright protection was a brief topic of discussion as well. Image recognition, fingerprinting techniques, and watermarks and tags were a few of the existing technologies mentioned that could help in identifying digital works, disclosing their origin and crediting proper copyright owners.
Of course, ongoing discussions will be necessary to find the appropriate solutions that protect creators’ rights while embracing the potential of AI. After all, this hearing was titled, “Part 1”, so perhaps there is a “Part 2” on the horizon. The aim is to strike a balance that fosters AI innovation yet also considers fair compensation, respects consent and safeguards against misuse of creative works.
Most, if not all, of the members of Congress expressed gratitude toward the witnesses who spoke at the hearing and emphasized the importance and timeliness of the discussion. One wonders if Congress can come together on some reasonable regulation that deals with the open IP issues inherent in GenAI development and operation. Common sense regulation may not be so easy, however. For example, in his opening remarks, Representative Johnson went on to state that even if Congress required AI systems to seek permission before using copyrighted works for training purposes, it “would only lead to more questions,” including:
- What sort of licensing system should be required?
- What would represent fair compensation for these works?
- What degree of transparency should be built into AI models? And how can we ensure proper credit is attributed to copyrighted works?
- How do we balance the need to protect innovation with the need to protect human content creators?
If Congress is not yet ready to enact regulations, laws and statutory remedies on these AI-related copyright issues, federal agencies, such as the Copyright Office, might issue guidance in the meantime to provide some additional legal clarity, or perhaps ongoing litigation will compel a market-based solution. Stay tuned to see what, if anything, results from this hearing.
[1] Subsequently, in response to this testimony, a former General Counsel of the Copyright Office, Jon Baumgarten, wrote a letter to the Subcommittee stating that he disagreed with one witness’s statement that “the training of AI models will generally fall within the established bounds of fair use,” finding such an assertion to be “over-generalized…and unduly conclusory” and contrary to the recent Supreme Court holding in the Warhol case that the question of fair use may not “be treated in isolation” and requires an analysis of the four fair use factors in each case. Baumgarten also suggested that Congress should not discount the potential of an AI- and data mining-related collective licensing regime that “can in different ways reasonably account for [fair use] in negotiation rates, adjusting projections of copying, defining scope of license, providing or accepting exceptions or otherwise.”
Jonathan Mollod also contributed to this article.