A group of authors recently filed separate lawsuits in federal court in San Francisco against OpenAI and Meta, accusing them of copyright infringement. The lawsuits claim that the companies used the authors' works without permission to train their Artificial Intelligence (AI) models.
Stand-up comedian Sarah Silverman and authors Christopher Golden and Richard Kadrey assert that OpenAI and Meta trained their respective AI language models on unlawfully obtained datasets containing the authors' works.
According to the complaints, ChatGPT and Meta's LLaMA (the ChatGPT competitor introduced by Mark Zuckerberg's Meta Platforms) refined their capabilities using "shadow library" websites such as Bibliotik, Library Genesis (LibGen), and Z-Library, among others. These websites are deemed illegal since the majority of the content uploaded to them is protected by the authors' intellectual property rights.
Included as exhibits (Download Silverman-openai-complaint-exhibits) in the lawsuit are examples demonstrating ChatGPT's responses when asked to summarize books authored by Silverman (The Bedwetter), Golden (Ararat), and Kadrey (Sandman Slim).
The suit (Download ClassAction_OpenAI_complaint) argues that, despite generating highly accurate summaries, ChatGPT's synopses of the titles fail to reproduce any of the copyright management information included in the Plaintiffs' published works.
As a result, the lawsuit contends that ChatGPT retains knowledge of specific works within the training dataset, allowing it to produce similar textual content. The authors' legal action against Meta (Download ClassAction_MetaComplaint) also highlights the alleged usage of illicit websites to train LLaMA.
One of the datasets used to train LLaMA is The Pile, assembled by the nonprofit AI research organization EleutherAI.
The lawsuit filed by Silverman, Golden, and Kadrey highlights a publication from EleutherAI entitled "The Pile: An 800GB Dataset of Diverse Text for Language Modeling" (Download ThePile_EleutherAI), which describes how one of its datasets, called Books3, was created using the contents of the Bibliotik private tracker, one of the illegal "shadow libraries" mentioned in the lawsuit.
According to the authors, they did not consent to their copyrighted books being used as training material for either of the AI models. On this basis, they bring six counts against OpenAI and Meta, including copyright infringement, negligence, unjust enrichment, and unfair competition.
While the lawsuit acknowledges that the damage caused "cannot be fully compensated or measured in money," the Plaintiffs are seeking statutory damages, restitution of profits, and more.
This case marks the first lawsuit against ChatGPT involving copyright infringement, and it raises questions about the boundaries of legality within the field of generative AI.
The legal representatives for the authors, Joseph Saveri and Matthew Butterick, are involved in multiple lawsuits related to authors and AI models, as indicated on their LLMlitigation website.
In 2022, they filed a lawsuit over GitHub Copilot (the OpenAI-powered tool that turns natural language into code; GitHub itself was acquired by Microsoft for $7.5 billion in 2018), alleging privacy violations, unjust enrichment, unfair competition, and fraud, among other claims.
They also previously filed a complaint challenging the AI image generator Stable Diffusion (Getty Images, the stock photo company, is pursuing separate legal action against it, alleging breach of copyright). In addition, they represent two more US authors, Mona Awad (author of Bunny and 13 Ways of Looking at a Fat Girl) and Paul Tremblay (The Cabin at the End of the World), in a separate class action against OpenAI; Awad and Tremblay likewise claim that ChatGPT was trained on their literary works without their consent (Download Awad_Tremblay-openai-complaint). Saveri and Butterick are also representing three artists - Sarah Andersen, Kelly McKernan, and Karla Ortiz - in a lawsuit against Stability AI, DeviantArt, and Midjourney for their use of the image generator Stable Diffusion.
Proving financial losses specifically caused by ChatGPT's use of copyrighted material may be challenging: ChatGPT could plausibly function in much the same way even without ingesting the books, as it is trained on a wide range of internet data, including users' discussions of those very books.
That said, OpenAI maintains an ambiguous position, with increasing secrecy regarding its training data (while also calling for regulation and warning that AI could pose existential risks, yet threatening to leave Europe over the rules the European Union may implement on AI). Previous documents related to early versions of ChatGPT mentioned an "internet-based books corpora" known as "Books2," which contained approximately 294,000 titles.
AI models are trained on vast datasets, and books are considered extremely valuable for training language models because of their well-edited, long-form prose. Books supply high-quality inputs that help these systems perform well, making them an ideal source of training data.
The above-mentioned lawsuits may be tricky for the courts involved: the outcome of the cases may depend on whether courts consider this use of copyrighted material "fair use" or unauthorized copying (and in countries that do not recognize a "fair use" defense, the outcomes could differ).
These lawsuits are not only a headache for OpenAI and other AI companies but also precursors of a coming wave of diverse legal actions.
In Europe the debate around the AI Act continues. Spain has assumed the rotating presidency of the EU Council of Ministers and has outlined its priorities for digital issues and reaching a political agreement on the AI Act. The Spanish presidency has circulated a document that outlines its position on key aspects of the Act, including the definition of AI, classification of high-risk applications, and impact assessment on fundamental rights.
Additionally, Belgium plans to advocate for the establishment of an agency with technical expertise in algorithms within the EU during its presidency next year, aiming to upgrade the European Centre for Algorithmic Transparency (ECAT) in Seville. Stakeholders, such as the Future of Life Institute and the German AI Association, have published their positions on the AI Act, offering recommendations and concerns. Some businesses, including Siemens and Heineken, have expressed criticism of the proposed AI Act, while Google is reportedly in talks with EU regulators to address concerns and develop tools related to AI.
So far, though, the debates have concentrated on copyright infringement by AI companies, not on the rights to the materials generated by AI tools. This is a point that may interest fashion houses and designers, but also AI artists.
Let's ponder a bit on this case: Dolce & Gabbana recently showcased their Alta Sartoria menswear collection in Puglia. Apart from featuring designs that looked like crossovers between the contents of your granny's or aunt's secret trunk of home linens and bedsheets, the assorted vestments found in a Vatican sacristy, altar boys' attire, altar linens and luxury loungewear (View this photo or View this photo, ideal designs if you want to recreate the runway show in Federico Fellini's film Roma), the catwalk closed with a series of bustiers inspired by sculptural and architectural elements.
Bustiers of this kind have actually been trending on the Instagram pages of AI digital artists à la @rickdick__, who in the past few months has developed a variety of men's bustiers, from sculptural designs inspired by Ming ceramics to lighter, lace-like pieces.
In a video, D&G's atelier shows how the sculptural bodices were created using 3D printing, an innovative technique that aligns perfectly with new technologies and could turn a fantasy design created with Midjourney into a real-life creation.
Usually AI artists are the ones blamed for generating, via their prompts, images collaged from or heavily inspired by other people's works. But in this case the question is: did D&G actually take inspiration from these images, or did AI remix D&G's breastplates from previous collections, such as the Alta Sartoria one showcased last year in Siracusa? (Mind you, in that case the bustiers were mainly bejewelled or sculptural but flat, View this photo; in this collection the bodices are instead sculptural and three-dimensional.)
The breastplates and chest pieces in this collection look indeed more similar to the ones produced by AI artists with Midjourney.
There is no way to prove that D&G lifted the idea from an Instagram page, but, if they did, who owns the copyright? If there is no protection for the prompt and no protection for the images it generates (a few months ago, the U.S. Copyright Office ("USCO") recognized only partial authorship for a comic book author who had used Midjourney for her graphic novel), anybody can take an idea generated by an AI artist, transform it into a garment and include it in their collection.
If this is the case, D&G are extremely clever: getting inspired by religion has saved them multiple copyright infringement cases (in most cases there are no specific copyrights on sacred images and symbols from the Catholic religion, nor on the garments donned by the statues of Our Lady of Sorrows that D&G keep on recreating).
If fashion designers start copying ideas from AI artists, whose work is not recognized as original since no formal laws protect either the prompt or the art it generates, they may not be subject to any copyright infringement claim. They could thus come up with "new" designs that they neither conceived themselves nor bothered to learn how to generate via Midjourney, Dall-E or Stable Diffusion, designs that may in turn bear resemblance to something previously created.
In a nutshell, those who copy the AI artists who may have generated a cool picture may not be guilty of any copyright infringement, while AI artists will still be seen as creating images assembled by a system from a database containing images dubiously acquired. That's food for thought for legal teams all over the world, but also for the legislative bodies currently working on AI laws.