Legal issues around generative AI technologies continue to take shape as OpenAI pushes back against the New York Times’ recently filed copyright infringement lawsuit while the European Commission (EC) suggests that Microsoft’s $10 billion investment in the company may require further investigation under the European Union’s merger rules.

OpenAI’s launch of the ChatGPT chatbot in late November 2022 touched off rapid innovation and adoption of generative AI tools, but it has also sparked debates about the data used to train large language models (LLMs) and about competition in the fast-developing market.

ChatGPT and similar models, such as Google’s Bard, train on massive amounts of publicly available data, including published – and copyrighted – material such as books, magazine articles and music. OpenAI has already been sued multiple times by artists and writers, including authors John Grisham and George RR Martin, who accuse the company of stealing their works, argue that such training infringes their copyrights and say they deserve fair compensation.

Other companies face similar lawsuits.

OpenAI: Lawsuit is ‘Without Merit’

The New York Times in December added its name to the growing list of plaintiffs, filing a lawsuit saying OpenAI and Microsoft violated its intellectual property rights by allowing ChatGPT to train on “millions” of its news articles. The news organization is pushing for the two companies to pay billions of dollars in damages.

In a blog post this week, OpenAI executives said the Times’ lawsuit “is without merit,” arguing that training AI models on publicly available material is fair use and adding that they “view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.”

The principle is supported by a broad range of academics, library associations, civil society groups, U.S. companies, and other groups and is permitted in other jurisdictions, including Israel, Japan, Singapore, and the EU, they wrote.

OpenAI added that it also provides publishers the ability to opt out of the process and prevent the AI company’s tools from accessing their sites, a process the Times adopted in August 2023, according to OpenAI.

Questions Over ‘Regurgitation’

The executives also noted that “regurgitation” – essentially memorization of published material – is a flaw in the generative AI training process that OpenAI is working to fix.

OpenAI also questioned how truthful The New York Times is being, saying executives felt good about the negotiations they were having with the news publisher through at least December 19 and that the lawsuit, filed eight days later, came as a surprise.

They also suggested the Times intentionally manipulated prompts to produce regurgitated text in ChatGPT outputs.

“Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” OpenAI executives wrote, adding that despite the lawsuit, they are hoping to strike up a partnership with the Times, similar to ones they said they have with the likes of the Associated Press and the American Journalism Project.

ChatGPT Needs Copyrighted Material

At the same time, OpenAI executives in a presentation last month to a select committee of the UK’s House of Lords acknowledged that the company’s LLMs use copyrighted material for training and added that the use is legally permitted.

They also argued that training generative AI LLMs couldn’t be done without it.

“Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials,” they wrote. “Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

The House of Lords is among a growing list of government entities taking a look at the issue of AI training sets and copyrights. The U.S. Federal Trade Commission in November 2023 said it has a role to play in the copyright issue and the U.S. Copyright Office is studying possible regulations and policies that will be needed as innovation around AI accelerates.

EC Turns an Eye to Microsoft and OpenAI

Meanwhile, EC officials said this week that they may review the relationship between OpenAI and Microsoft, the AI company’s top investor, which uses OpenAI technologies in its own products.

The possibility was mentioned in a note from the EU that included a request for comments from individuals and organizations about the level of competition in the AI and virtual world spaces and how laws can help keep these markets competitive.

EC members added that they are reviewing some agreements between large tech players and generative AI developers and providers and how such partnerships affect the market. That may include looking at Microsoft’s massive investment in OpenAI.

“We are inviting businesses and experts to tell us about any competition issues that they may perceive in these industries, whilst also closely monitoring AI partnerships to ensure they do not unduly distort market dynamics,” Margrethe Vestager, the EC’s executive vice president in charge of competition policy, said in a statement.