Reddit has launched a lawsuit against artificial intelligence firm Anthropic, accusing the company of scraping its platform and using Reddit content without permission to train its Claude AI model.
The complaint, filed Wednesday in a U.S. federal court, alleges that Anthropic violated Reddit’s user agreement and continued to access Reddit’s servers more than 100,000 times after publicly claiming to have ceased such activity in July 2024.
Reddit is seeking damages, restitution, and a court order barring Anthropic from using any Reddit-derived data in its products and from licensing or profiting from any AI programs trained on Reddit content.
Decrypt has contacted Anthropic for a response to Reddit’s claims.
The social media giant claimed there were “two faces” to the AI company, which has tried to position itself as the responsible player in the AI industry.
“[There’s] the public face that attempts to ingratiate itself into the consumer’s consciousness with claims of righteousness and respect for boundaries and the law, and the private face that ignores any rules that interfere with its attempts to further line its pockets,” the lawsuit reads.
At the heart of the dispute is a broader controversy surrounding how large language models are trained. Since the debut of OpenAI’s ChatGPT, concerns have escalated over the use of both copyrighted and user-generated materials in AI development.
Ongoing issues
Several organizations have already filed lawsuits, including a high-profile case brought by The New York Times against OpenAI and Microsoft in 2023. Other plaintiffs include visual artists, authors, and record labels who argue their work was exploited without permission.
Anthropic is also facing another lawsuit regarding its alleged use of copyrighted song lyrics, as well as yet another from a group of authors who said the company used pirated versions of their books as training materials.
The tension has spilled into the cultural arena, with artists expressing outrage over AI-generated imitations of their styles.
Earlier this year, a craze for replicating the art style of the popular Japanese animation company Studio Ghibli sparked concerns about copyright violations and artists losing out to AI programs trained on their own work.
In a submission to the UK Parliament last year, OpenAI acknowledged using copyrighted content in training, arguing it would be "impossible" to develop leading AI systems without it. The company maintains that such practices are lawful.
A proposal last month in the UK to ease copyright law and allow the use of copyrighted materials for training LLMs has come under fire from prominent artists, including Elton John.
Despite its protestations about protecting its users, Reddit itself sees little wrong with using user content for LLM training, as long as Reddit is compensated.
It has struck its own licensing deals with firms like OpenAI, Google, Sprinklr, and Cision to allow access to its content for training purposes.
Edited by Sebastian Sinclair