On the heels of our post on Anthropic's mixed ruling in its copyright case, we have witnessed another plot twist in the AI copyrighted-data-usage saga, this time with Meta scoring a significant victory of its own in federal court.
The Battle Lines: Authors vs. Meta's Llama
Several prominent public figures, including comedian Sarah Silverman and Pulitzer Prize winners Andrew Sean Greer and Junot Díaz, have challenged Meta's extensive AI plans, arguing that Meta's Llama AI (a name derived from "Large Language Model Meta AI") was trained on their works without authorization.
In discovery, Meta essentially conceded that point. Nonetheless, US District Judge Vince Chhabria just delivered a major victory for Meta, granting summary judgment in its favor. But this decision is different from the Anthropic outcome, with far-reaching implications for both content creators and AI developers.
Why Meta Won (And What the Authors Got Wrong)
A key inquiry in the fair-use analysis is whether the secondary work (here, Llama) is likely to substitute for the original works (the authors' books) in the marketplace and thereby undermine the incentive to create. The judge's reasoning centered on a critical oversight by the authors: they failed to establish market harm, a vital element of that inquiry. Consider this: if someone takes your work and creates something entirely different, does it cut into your book sales? That's the question the authors never answered. Judge Chhabria's point, in essence: you might have a case about AI-generated content saturating the market, but you never actually made that argument in court.
Meta's defense? Their AI isn't trying to be a book. It's not meant to be read like a novel. Instead, it's a "quintessentially transformative" product that learns from books to do something entirely different. It's like saying a chef who learns techniques from cookbooks isn't competing with those cookbooks—they're creating meals, not recipes.
A recent study on Llama suggests this isn't entirely accurate. That study revealed that Llama 3.1 "memorizes certain books, such as Harry Potter and 1984, almost completely." In the Harry Potter example, the study found that Llama 3.1 memorized 42 percent of the first Harry Potter book (Harry Potter and the Philosopher's Stone) so thoroughly that it can reproduce verbatim excerpts at least half the time.
The Piracy Problem Nobody's Talking (Enough) About
Here's the surprising part: evidence suggests that Meta may have knowingly downloaded millions of books from pirated sources such as the Books3 dataset and the "shadow libraries" Library Genesis and Z-Library. These are piracy networks, and internal emails reportedly revealed that Meta brass approved the use of "likely pirated" material.
You might think this would be where Llama's luck runs out, right? Surprisingly, the judge didn't dwell on this revelation as much as you'd expect. This judicial divergence raises broader questions about how courts should weigh the methods by which AI systems acquire their capabilities against their ultimate functions. When AI systems increasingly serve as intermediaries between humans and information, the legitimacy of their training processes may matter beyond traditional copyright analysis, affecting public trust in AI-mediated knowledge and decision-making.
A Tale of Two Rulings: Meta vs. Anthropic
Our earlier discussion of the Anthropic case is highly relevant here. While Meta emerged victorious, Anthropic faced a more complex result. US District Judge William Alsup concluded that training AI on copyrighted books is acceptable (describing it as "exceedingly transformative"), but that downloading those books from pirate sites is clearly prohibited. Anthropic remains at risk of a damages trial over its methods of collecting training data. As Judge Alsup put it, "You can't just bless yourself by saying I have a research purpose."
The comparison is notable: two federal judges, two AI companies, similar copyright claims, yet different emphasis and rulings. Judge Chhabria even questioned some of Judge Alsup's reasoning, highlighting ongoing debates on how to address AI and copyright issues. These differing approaches suggest courts are still developing frameworks for evaluating AI systems that will increasingly shape how people access and understand information. The question isn't just whether AI training constitutes fair use, but how the integrity of AI development processes affects their role as knowledge intermediaries.
What This Means for You
If You're in AI:
The good news? Training AI models with copyrighted material appears to have legal support as "transformative use." However, you must ensure you're obtaining that data legally. The era of "download first, ask questions later" is seemingly coming to an end.
If You're a Creator:
Stay hopeful. The Anthropic ruling signals that piracy-by-proxy by AI developers won't be protected, and Judge Chhabria's opinion reads like a roadmap for future plaintiffs. The key is demonstrating market harm: show how AI-generated content could flood your market and dilute your sales. Keep records of everything. Get involved with creator advocacy organizations. And display "NO AI TRAINING" notices on your work.
The Bottom Line
These rulings arrive as AI systems transition from experimental technologies to essential infrastructure for information processing and decision-making. We are witnessing the emergence of new legal frameworks for the AI era. Meta's victory doesn't mean AI companies have free rein to use any content they want. It means they need to be smarter about how they acquire data and how they utilize it.
The message is becoming clearer: AI can learn from the world's creative works, but it must adhere to the rules and establish its authority through legitimate means. Think of it like this — AI needs a library card rather than a pirate's map. As these systems become more powerful mediators of human knowledge and creativity, the integrity of their development processes becomes as important as their capabilities.
As these technologies transform how we create and consume content, one thing's certain: the legal battles are far from over. But at least now we're beginning to see where the boundaries might be.
Stay tuned for more updates on AI copyright law as this fascinating legal landscape continues to develop.
“Llama is not capable of generating enough text from the plaintiffs’ books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data.”

U.S. District Judge Vince Chhabria