Reddit Sues Anthropic Over Alleged Data Scraping and Privacy Viol

Samuel Cohen


Beyond Copyright: Reddit’s Lawsuit Against Anthropic

by: Samuel Cohen of Sheppard, Mullin, Richter & Hampton LLP - AI Law and Policy

Tuesday, June 17, 2025


On June 4, 2025, Reddit, Inc. (“Reddit”) filed suit against Anthropic, PBC (“Anthropic”) in the Superior Court of California, alleging that Anthropic scraped and commercially exploited Reddit user data—including deleted posts—without consent or compensation.[1] Unlike recent enforcement efforts that have centered on establishing copyright infringement liability, Reddit’s complaint brings five causes of action—breach of contract, unjust enrichment, trespass to chattels, tortious interference, and unfair competition—reflecting a strategic choice to deploy contractual and privacy-based claims to address Anthropic’s allegedly unauthorized scraping of Reddit data.[2]

Reddit alleges that Anthropic trained its AI models (e.g., Claude) on public Reddit posts and comments scraped between December 2021 and October 2024.[3] Public statements by Anthropic researchers identify Reddit subreddits, such as r/explainlikeimfive, r/changemyview, and r/WritingPrompts, as “good samples” for fine-tuning training inputs.[4]

According to the complaint, Reddit grants licensed AI partners conditional access to its archive only through a designated “Compliance API” which alerts licensees when content has been deleted by users.[5] AI partners are then contractually required under their licenses with Reddit to cease ongoing use of such material, thereby respecting users’ privacy rights.[6] Anthropic, however, allegedly refused to enter such an agreement yet nevertheless continued unauthorized access to the Compliance API, using the data for commercial purposes, in violation of Reddit’s license terms.[7] Despite Reddit’s technological controls, including robots.txt directives and IP rate limits, Anthropic’s bots are alleged to have bypassed these defenses, generating over 100,000 unauthorized API calls and imposing significant server-capacity costs on Reddit.[8] These documented costs allegedly quantify the tangible economic injury to Reddit’s infrastructure, forming the basis for its claims for trespass to chattels, breach of contract, and unfair competition.[9] At the heart of Reddit’s breach-of-contract claim is Anthropic’s alleged violation of key provisions in the Reddit User Agreement—specifically, the prohibition on “commercially exploit[ing]” Reddit content, the restriction on unauthorized scraping, and the improper access and use of Reddit’s Compliance API to continue using deleted or restricted content without permission.[10]
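As a rough illustration of the deletion obligation described above, the following Python sketch shows how a licensee might stop using content once it is told that users have deleted it. It is a sketch under stated assumptions, not a description of Reddit's actual Compliance API: the `LicensedCorpus` class, its methods, and the post IDs are all hypothetical, and the complaint does not describe how deletion alerts are formatted or delivered.

```python
from dataclasses import dataclass, field


@dataclass
class LicensedCorpus:
    """Hypothetical licensee-side store of licensed posts, keyed by post ID."""

    posts: dict[str, str] = field(default_factory=dict)
    purged_ids: set[str] = field(default_factory=set)

    def add(self, post_id: str, text: str) -> bool:
        """Store a post unless it was already flagged as deleted."""
        if post_id in self.purged_ids:
            return False
        self.posts[post_id] = text
        return True

    def apply_deletions(self, deleted_ids: list[str]) -> int:
        """Drop every post named in a deletion notice; return how many were removed."""
        removed = 0
        for post_id in deleted_ids:
            # Remember the ID so the post cannot be re-ingested later.
            self.purged_ids.add(post_id)
            if self.posts.pop(post_id, None) is not None:
                removed += 1
        return removed


# Hypothetical data: two stored posts, then a notice naming one of them
# plus an ID the licensee never held.
corpus = LicensedCorpus()
corpus.add("t3_abc", "first post")
corpus.add("t3_def", "second post")
removed = corpus.apply_deletions(["t3_abc", "t3_zzz"])
```

Recording purged IDs, rather than only deleting the stored text, is one way to keep deleted material from re-entering a dataset through a later import.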

Reddit’s strategy appears designed to highlight the consequences of using data without a license, while sidestepping unsettled copyright defenses in AI contexts.[11] According to Reddit’s complaint, without a license, Reddit cannot enforce deletion requests, monitor privacy compliance through its Compliance API, or restrict sensitive data (e.g., sexually explicit content) from being included in AI training sets—in contrast to the clear operational boundaries enforced with licensed partners.[12]

While Reddit did not include copyright claims in its complaint, Anthropic could still argue that Reddit’s non-copyright claims are preempted by the Copyright Act because they concern how Anthropic allegedly “used” and “reproduced” user-generated content, which closely aligns with the exclusive rights of reproduction and distribution under federal copyright law.[13] Under the copyright preemption doctrine, state-law claims are invalid if they rest on rights equivalent to those protected by copyright, meaning that breach-of-contract, unjust enrichment, and unfair-competition allegations tied to content use may fail.[14] Tortious interference, however, typically survives preemption because it addresses improper disruption of contractual or business relationships, not copying itself.[15]

For content creators, social platforms, and rightsholders, Reddit’s lawsuit illuminates a crucial reality: technical restrictions alone may not reliably prevent scraping, commercialization, or misuse of data. While tools like API gating, robots.txt, and rate-limiting are essential and recommended, determined actors may still evade them. As a result, platforms should complement technical controls with legally enforceable terms and conditions, formal licensing arrangements (including compliance obligations and takedown mechanisms), real-time monitoring of API access and usage, documentation of server impact to demonstrate tangible harm, and embedded privacy controls to respect user deletions and data rights. Moreover, having a clear escalation plan, up to and including litigation, ensures those protections are not just theoretical. As the legal framework for AI training continues to evolve, this case offers unique insight into the importance of proactive governance, technical diligence, and contract-backed enforcement mechanisms to preserve platform integrity and safeguard user trust.
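To make the monitoring and documentation recommendations above more concrete, the following Python sketch tallies, per user agent, requests for paths that a site's robots.txt disallowed for that agent. It uses the standard library's `urllib.robotparser`. The robots.txt rules, the bot names, and the access-log records are hypothetical placeholders, and a real deployment would read its own server logs. Because robots.txt is advisory only, a tally like this documents non-compliance and does not prevent it.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is barred entirely,
# and every other crawler is barred only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# Hypothetical access-log records as (user agent, requested path).
ACCESS_LOG = [
    ("ExampleBot", "/r/example/comments/1"),
    ("ExampleBot", "/r/example/comments/2"),
    ("OtherBot", "/r/example/comments/1"),
    ("OtherBot", "/private/settings"),
]


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def disallowed_hits(parser: RobotFileParser, access_log) -> Counter:
    """Count, per user agent, requests that robots.txt disallowed for that agent."""
    hits = Counter()
    for user_agent, path in access_log:
        if not parser.can_fetch(user_agent, path):
            hits[user_agent] += 1
    return hits


rules = load_rules(ROBOTS_TXT)
violations = disallowed_hits(rules, ACCESS_LOG)
```

A per-agent count of this kind is one possible input to the documentation of server impact discussed above; timestamps and response sizes could be added in the same way if the logs record them.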

FOOTNOTES

[1] Complaint, Reddit, Inc. v. Anthropic, PBC, No. CGC-25-625892 (Cal. Super. Ct. S.F. Cnty. June 4, 2025), https://redditinc.com/hubfs/Reddit%20Inc/Content/PDFs/Docket%20Stamped%20Complaint.pdf.

[2] See id.

[3] Complaint, supra note 1.

[4] Amanda Askell et al., A General Language Assistant as a Laboratory for Alignment, arXiv (Dec. 9, 2021), arXiv:2112.00861, at 35.

[5] Complaint, supra note 1.

[6] Id.

[7] Id.

[8] Id.

[9] See id.

[10] Complaint, supra note 1.

[11] Id.

[12] Id.

[13] See id.