AI News

The New York Times filed a lawsuit against tech giants OpenAI and Microsoft, alleging unauthorized use of its articles to train generative artificial intelligence (Gen AI) systems.

The lawsuit, filed in the Federal District Court in Manhattan and first reported by The Wall Street Journal, doesn’t specify an exact financial claim but seeks substantial damages in the “billions.”

The crux of The Times’ grievance is the alleged use of millions of its articles to develop AI chatbots that could compete with the newspaper, thereby infringing on The Times’ copyright.

“Times journalism is the work of thousands of journalists, whose employment costs hundreds of millions of dollars per year,” the Times said in its complaint. “Defendants have effectively avoided spending the billions of dollars that The Times invested in creating that work by taking it without permission or compensation.”

The complaint argues that Microsoft and OpenAI’s use of The Times’ content to create competing AI tools threatens the newspaper’s ability to deliver its services.

It notes the tools employed by the companies rely on large language models (LLMs), which were built by copying millions of copyrighted news articles, including opinion pieces, reviews and investigations.

The complaint says Times content was given particular emphasis when these models were built, indicating the value placed on such works. The Times suit argues that through platforms like Microsoft’s Bing Chat (now “Copilot”) and OpenAI’s ChatGPT, the companies aim to capitalize on The Times’ journalism without authorization or compensation.

The lawsuit also references the Constitution and the Copyright Act, which grant creators exclusive rights to their works. Copyright protection has historically empowered news gatherers, like The Times, to safeguard their substantial investments and efforts in producing original journalism, including millions of articles covered by registered copyrights.

The lawsuit also notes The Times protested the unauthorized use of its content in developing the companies’ models and tools and sought to reach a deal with the two companies in April.

“The Times’s goal during these negotiations was to ensure it received fair value for the use of its content, facilitate the continuation of a healthy news ecosystem, and help develop GenAI technology in a responsible way that benefits society and supports a well-informed public,” the company said in the complaint.

Tech firms building AI tools often invoke the legal doctrine of “fair use,” which permits limited use of copyrighted material without permission, but The Times contends the provision doesn’t apply because the AI tools can replicate substantial portions of its news articles.

“Using the valuable intellectual property of others in these ways without paying for it has been extremely lucrative for Defendants,” the complaint stated. “Microsoft’s deployment of Times-trained LLMs throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone.”

OpenAI, a company with an estimated valuation exceeding $80 billion, and Microsoft, a major investor with a $13 billion stake in OpenAI, have yet to respond officially to the lawsuit.

The dispute has broad implications, with the potential to reshape the legal boundaries surrounding AI technology and its use across the news and media industries. Crucial precedents could also be set regarding the safeguarding of intellectual property within the evolving landscape of AI technology, potentially reshaping how AI companies operate.

Joseph Thacker, principal AI engineer and security researcher at AppOmni, says the biggest effect is that the lawsuit could set a precedent for legal boundaries around using copyrighted materials in AI development and likely lead to stricter regulations and guidelines.

“This could lead to a shift in the tech industry, with companies becoming more cautious in their use of copyrighted materials, maybe even being more cautious of using OpenAI’s services,” he says. “It may also slow down AI advancements but could also force some innovation in data sourcing.”

John Bambenek, president at Bambenek Consulting, points out that AI requires vast amounts of data to work, so the rights of data creators become supremely important.

“OpenAI is looking at a potential valuation of $100 billion and not a single dollar of that is realized without source data to train on and there just isn’t much data in the open-source world to make models that are effective,” he says. 

From his perspective, what “fair use” ultimately means is a question of law, and he is not sure there is applicable precedent.

“That said, case law favoring some of the aggressive enforcement by the MPAA and RIAA suggests that if [the Times] convinces the courts this is a ‘form of copying’, they stand a good chance,” he says.

Bambenek adds that the problem is judges are not technologists, so they will struggle to understand the technology, but they do understand what a “free lunch” is.

“If OpenAI has the valuation they do, why shouldn’t they pay licensing to the content creators that are building their models?” he asks. 

Thacker adds that tech firms may struggle to prove their use of copyrighted materials for AI training falls under fair use if they are using something like Common Crawl, a large-scale web-crawling project that provides a free, open dataset of web pages and metadata to the public.

“AI model creators may need to demonstrate that their AI tools do not replicate substantial portions of the original content, which is pretty difficult without making them less useful,” he says.