Another group of award-winning authors is suing OpenAI over its widely popular ChatGPT chatbot, claiming the company illegally used their copyrighted works to train its generative AI tool.
The lawsuit brought earlier this month by authors Michael Chabon, David Henry Hwang, Matthew Klam, Rachel Louise Snyder, and Ayelet Waldman is the latest legal challenge to OpenAI and ChatGPT, whose underlying large language model (LLM) is trained on a massive dataset drawn from the internet and other sources.
It follows a similar suit filed in July against OpenAI by actress, comedian and author Sarah Silverman and authors Christopher Golden and Richard Kadrey, as well as a second suit the three filed against Meta over what they say is the illegal use of copyrighted material in the dataset used to train its Llama AI model.
Legal action from other corners also includes a suit filed this month by a San Francisco law firm representing two unnamed engineers, who claim that OpenAI’s Microsoft-backed products, not only ChatGPT but also Dall-E (a text-to-image model) and Vall-E (a text-to-speech model), “use stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge.”
The lawsuits so far have been filed in federal court in the Northern District of California.
There also are reports that the New York Times is considering filing a lawsuit against OpenAI to protect the intellectual property stemming from its reporting and editing efforts.
The Battle Over Copyrights
The growing list of lawsuits and the larger concern over how products like ChatGPT, Llama, and Google’s Bard are trained could pose a significant challenge to generative AI tools, which organizations and individuals have rapidly adopted since ChatGPT’s release in late November 2022.
Microsoft has invested more than $10 billion in OpenAI in recent years and has championed the use of ChatGPT and other generative AI technologies, expanding their use throughout its entire product portfolio.
OpenAI has pushed back. In motions to dismiss the lawsuits led by Silverman, the company argued that the plaintiffs’ claims “misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
Rob Enderle, principal analyst with The Enderle Group, told Techstrong.ai that these lawsuits “could be the first serious effort to define the rights of AIs vs. humans, and rulings could impact both how AIs are trained and how authors are compensated by those that learn from them.”
“Given we largely learn from studying the work of others, similar but far slower and less diverse than AIs, the obvious defense is that the AI isn’t doing anything different than humans do, and because the AI pulls from a vastly larger pool, AIs are less likely to be connected to any single author than a human is,” Enderle said.
He also noted that humans already are pushing back against AI in other arenas, pointing out that the ongoing strikes by members of the Screen Actors Guild and the Writers Guild of America are driven in part by a desire to protect their jobs from encroachment by AI systems that would be trained on the work of the very people they replace.
“This could redefine protections against identity theft, copyright infringement, and plagiarism, with impacts that could extend well beyond AIs alone,” the analyst said.
The Use of Works in Generative AI
In the latest court filing, Chabon, Hwang, Klam, Snyder and Waldman argue that OpenAI used their works to train the LLM underlying ChatGPT, infringing their copyrights. OpenAI hasn’t named the works ChatGPT was trained on, but it has said its training data includes datasets known as Books1 and Books2. The plaintiffs say they know their works were included because ChatGPT, when prompted, can summarize and analyze the contents of those works or write text in the authors’ styles.
They point to numerous examples involving Chabon’s novel “The Amazing Adventures of Kavalier & Clay,” Hwang’s play “The Dance and the Railroad,” Klam’s novel “Who Is Rich?,” Snyder’s book “What We’ve Lost Is Nothing” and Waldman’s novel “Love and Other Impossible Pursuits.”
The authors argue that outputs based on these and other works are infringing derivative works, and that they “never authorized OpenAI [to] make copies of their written works, make derivative works, publicly display copies (or derivative works), or distribute copies (or derivative works) … OpenAI has and will continue to infringe” on the copyrights.
They also claim that OpenAI “knew at all relevant times that the datasets it used to train its GPT models contained copyrighted materials” and that its actions violated the terms for using the materials.
Permission and Compensation
Writers have been pushing back against tech companies’ use of copyrighted material to train their AI models. In July, The Authors Guild, a professional organization for writers in the United States, sent a letter signed by more than 10,000 authors and supporters to OpenAI, IBM, Microsoft, Meta, Google parent Alphabet and Stability AI, objecting to the use of copyrighted works to train LLMs without compensating the authors.
“Where AI companies like to say that their machines simply ‘read’ the texts that they are trained on, this is inaccurate anthropomorphizing,” the letter said. “Rather, they copy the texts into the software itself, and then they reproduce them again and again.”
The letter urges the tech companies to obtain permission before using the works in generative AI datasets and to pay writers both for the use of their works in generative AI programs and for the outputs those programs produce.