Highlights
- A proposed Senate bill would impose civil liability for the use of potentially copyrightable works to train large language models (LLMs).
- Potential civil penalties for violations include treble damages, punitive damages and attorney’s fees.
- The legislation establishes a private right of action for any individual whose copyrightable works are used in training LLMs.
To be effective, LLM-based artificial intelligence (AI) systems must ingest vast amounts of data in a process known as "training." An early step in training any LLM is providing a corpus of training materials that serves as the base of knowledge the LLM can draw from in response to user prompts. The LLM statistically maps relationships between relevant parts of the training data, algorithmically assigns weights to the relevance of the material, and generates outputs by synthesizing the available data.
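The statistical mapping described above can be illustrated with a deliberately simplified sketch. Real LLMs use neural networks with billions of parameters, but this toy bigram model (a hypothetical example, not any actual LLM's method) shows the basic idea: the training corpus is reduced to weighted statistical relationships, and output is generated from those weights rather than by copying the corpus directly.

```python
from collections import Counter, defaultdict

# Toy "training": count how often each word follows each other word
# in a tiny corpus, producing weighted statistical relationships.
corpus = "the model maps the data and the model weighs the data".split()

successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def most_likely_next(word):
    # "Generate" output by returning the statistically most frequent
    # follower of `word` in the training data.
    return successors[word].most_common(1)[0][0]

print(most_likely_next("the"))
```

The legal questions the bill raises arise at the first step of this process: whether the corpus itself may lawfully include covered data without consent.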
As discussed here, courts have begun addressing the propriety of using copyrighted works to train LLMs, and the U.S. Senate has now joined that debate. Senators Richard Blumenthal and Josh Hawley introduced the AI Accountability and Personal Data Protection Act on July 21 proposing to “establish a federal tort concerning the appropriation, use, collection, processing, sale, or other exploitation of individuals’ data without express prior consent.”
While the bill’s primary aim is data privacy, it also applies to material that “is generated by an individual and is protected by copyright, regardless of whether the copyright has been registered with the U.S. Copyright Office or any other registration authority.” This broad definition applies not only to registered copyrighted works, but also to any work that is capable of copyright protection.
This proposed legislation would impose civil liability on anyone who, "in or affecting interstate or foreign commerce," uses covered data, including for training an AI system or reproducing it as an AI-generated output, without the individual's express, prior consent.
The bill creates a private right of action against any person who engages in prohibited conduct or aids and abets such action. The aggrieved party would be entitled to “the greater of (i) actual damages; (ii) treble any profits from the appropriation, or other exploitation of the covered data; or (iii) $1,000.”
Punitive damages, injunctive relief and attorney’s fees are also available under this private right of action. Further, any contractual arbitration provision or waiver of joint action to redress the alleged misappropriation is expressly invalidated.
Takeaways
The newly proposed Senate bill could greatly affect what works can be used as training materials for LLMs. Not only would the bill prohibit the use of personal identifying information in training these models, but it would also impose liability for the use of any potentially copyrightable works.
The bill does not clarify whether it applies retroactively to AI systems already trained on covered data, leaving uncertainty about whether liability can be imposed unless such systems are retrained exclusively with permissible materials.
Given its broad scope and potential impact, this bill is likely to draw extensive debate as it progresses through Congress, and stakeholders should monitor developments closely.