In Andersen v. Stability AI Ltd., the court allowed the claim to proceed even though the plaintiff, who claims to hold copyrights in over 200 individual works, could not identify which specific works from her collection were used to train the AI model. The order was issued in response to the defendant’s motion to dismiss and therefore carries limited precedential weight, but if it stands and other courts adopt its reasoning, it could make it easier for copyright holders to bring infringement claims against AI developers.
Background
The lead plaintiff in the case, Sarah Andersen, is a full-time cartoonist and illustrator who owns copyright registrations for 16 collections of her works. She and her co-plaintiffs made some of their artwork available on the website DeviantArt, the self-described largest online social network for artists and art enthusiasts.
The allegations primarily revolve around the actions of Stability AI Ltd., which owns the text-to-image generative AI model Stable Diffusion. Like many generative AI models, the Stable Diffusion model was trained on a massive data set, allegedly over five billion images that were scraped from the internet, including images from DeviantArt belonging to the plaintiffs. The model studies the images and other content in its data set to produce new, original content in response to user prompts.
In addition to the unauthorized use of their copyrighted works in the AI data set, the complaint also alleges that Stability and other co-defendants violated the plaintiffs’ publicity rights by using their names without authorization to promote the generative AI products. For example, the complaint alleges that the defendants advertised the ability of their systems to generate artwork “in the style” of plaintiffs’ works and suggests that Stable Diffusion allowed users to create infringing works by referencing specific artists’ names in their prompts (e.g., “create a cartoon in the style of Sarah Andersen”).
The Court’s Order
Most notably, the court concluded that Andersen did not need to identify which of her specific works were included in the AI data set to plausibly claim copyright infringement. The court found her allegations plausible because the data set at issue is enormous (allegedly five billion images) and because a third-party website, www.haveibeentrained.com, purportedly confirmed that at least some of her works had been used for AI training. The outcome on this point is significant because it is often difficult for plaintiffs to know which specific works have been used for training. Here, the court concluded that such specific allegations are not necessary, at least when the data set at issue is very large and there is some plausible third-party evidence that the artists’ works were included.
Stability has not yet raised the affirmative defense that its use of third-party copyrighted works to train an AI model qualifies as “fair use” under the Copyright Act. It is likely to do so as the case progresses.
The court dismissed the plaintiffs’ publicity claims, concluding that the complaint did not allege enough facts to plausibly show that the defendants’ advertisements used the names of the three named plaintiffs specifically. It left the door open, however, by allowing the plaintiffs to amend and resubmit their claims.
Conclusion
The court’s ruling provides valuable insight for plaintiffs on how best to structure liability theories, adding yet another perspective to the rapidly evolving framework surrounding AI.