AI Policy & Regulation

Reddit Takes Legal Aim at Anthropic Over AI Data Scraping

By Eli Grid

Posted on June 10, 2025

Photo by Brett Jordan

Reddit is drawing a legal line in the sand, and it’s pointed directly at Anthropic.

The platform has filed a lawsuit accusing the Claude AI maker of scraping user content without consent, raising serious questions about how AI companies gather training data. Could this case change how AI firms access web content going forward?

If Reddit wins, the ripple effects could be massive for tech giants training models on user-generated content.

What’s the News?

In a lawsuit filed in California state court, Reddit alleges that Anthropic made over 100,000 unauthorized requests to its servers to extract user content. The data was allegedly used to train Anthropic’s large language models, including Claude.

The complaint states that Anthropic ignored Reddit’s robots.txt file, a widely adopted convention that tells automated crawlers which parts of a site they should not access. Compliance is voluntary, so the file signals a site’s wishes rather than technically blocking scrapers. Reddit claims this wasn’t a simple oversight. Instead, it frames Anthropic’s actions as deliberate, stating the company “bypassed technical barriers and violated terms of service.”
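For readers unfamiliar with how robots.txt works in practice, here is a minimal sketch of the check a well-behaved crawler is expected to run before requesting a page. It uses Python's standard-library `urllib.robotparser`. The rules, the `ExampleAIBot` and `OtherBot` user-agent names, and the example.com URLs are hypothetical and chosen for illustration only; they are not Reddit's actual robots.txt or any real crawler's identity.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration; NOT Reddit's actual file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The hypothetical AI crawler is barred from the whole site.
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleAIBot", "https://example.com/r/some/thread"))  # False
    # Any other crawler may read public pages but not /private/.
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "OtherBot", "https://example.com/r/some/thread"))  # True
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "OtherBot", "https://example.com/private/page"))  # False
```

Nothing in this check stops a crawler that simply skips it, which is why disputes like this one tend to turn on terms of service and other legal claims rather than on the file itself.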

Notably, Reddit offers licensed data access to firms like OpenAI and Google. These agreements come with guardrails: privacy protections, takedown compliance, and usage limits. According to Reddit, Anthropic declined to pursue such a license and instead scraped the platform directly—dodging fees and disregarding user protections.

The platform also claims that Claude, Anthropic’s AI chatbot, has repeated Reddit content nearly verbatim in some responses. That includes deleted user posts, which Reddit argues should never have been stored or surfaced by an AI. This, it says, shows a clear absence of responsible data filtering.

A 2021 paper co-authored by Anthropic CEO Dario Amodei cited Reddit as a valuable training source for LLMs, further strengthening Reddit’s argument that the data scraping was intentional and strategic.

Anthropic has denied wrongdoing, saying it disagrees with the claims and plans to defend itself. But this isn’t the company’s first legal skirmish over its training data practices.

In August 2024, authors filed a class-action lawsuit against Anthropic for using copyrighted works without consent. And in October 2023, Universal Music Group led another lawsuit, accusing the company of reproducing copyrighted song lyrics.

However, unlike those cases, Reddit’s lawsuit isn’t about copyright infringement. Instead, it focuses on breach of contract and unfair business practices. That legal angle could become a new battleground for data-hosting platforms looking to regulate AI use of public content.

After the lawsuit went public, Reddit’s stock rose nearly 7%. Investors appear to be backing the company’s effort to assert control over its content ecosystem.

Why It Matters

This case marks a pivotal moment in the AI-vs-platform data tug-of-war. If Reddit succeeds, it could empower other platforms—like forums, social networks, and publishers—to restrict how AI companies harvest user content.

That shift would force AI developers to rethink their training pipelines. More firms may need to rely on licensed data or pay to access high-quality sources.

It also raises deeper ethical concerns. Should deleted or personal content be scraped for machine learning, even if it was once publicly visible? Reddit argues the answer is a firm no.

As AI models grow larger and more data-hungry, this lawsuit could reshape the rules of engagement between web platforms and AI developers.

💡 Expert Insight

“Unlike previous copyright lawsuits, Reddit’s approach—focused on breach of service and data ethics—may set new legal standards,” said Thomas Monteiro, senior analyst at Investing.com. “It could be a turning point in how AI firms justify their data collection methods.”

Anthropic, for its part, responded via spokesperson: “We disagree with Reddit’s allegations and intend to defend our position in court.”

GazeOn’s Take

This lawsuit isn’t just a legal drama—it’s a policy test.

AI companies have long leaned on the ambiguity of public content use. If the court rules in Reddit’s favor, AI training will likely become more expensive and regulated. This could spur more transparent licensing models and tighter data boundaries across the internet.

The case may also motivate lawmakers to revisit AI data usage laws, especially regarding user privacy and platform control.

💬 Reader Question

Should platforms like Reddit have the power to block AI from using public content? Or is that just slowing innovation? Let us know what you think.

About Author:

Eli Grid is a technology journalist covering the intersection of artificial intelligence, policy, and innovation. With a background in computational linguistics and over a decade of experience reporting on AI research and global tech strategy, Eli is known for his investigative features and clear, data-informed analysis. His reporting bridges the gap between technical breakthroughs and their real-world implications, bringing readers timely, insightful stories from the front lines of the AI revolution. Eli’s work has been featured in leading tech outlets and cited by academic and policy institutions worldwide.