Welcome to the second issue of the HLK Newsletter tracking the AI legal landscape. In this issue, we discuss the lawsuits brought by authors in the US and update you on discussions regarding AI at the UN and on the international AI Safety Summit to be held in the UK in November.

Generative AI in the Courts: The Authors

In this edition, we'll be summarising the various actions brought by authors in relation to the use of their works in Generative AI. There are plenty of them, and they may be just the tip of the iceberg: because these are class action lawsuits, success for the plaintiffs would open the floodgates to thousands of unknown others to whom the same facts apply.

Let's start with an action brought in the US in June this year against OpenAI by horror and fantasy writer Paul Tremblay and by Mona Awad, whose dark comic fiction led Margaret Atwood to refer to her as her 'literary heir apparent'. OpenAI, most famous for the ubiquitous ChatGPT and the generative image engine DALL·E, has positioned itself as a force for good in AI development, stating that it intends to ensure that artificial general intelligence benefits all of humanity while navigating the associated 'massive risks'.

You can imagine, then, that their feelings might have been a little hurt by the allegation that ChatGPT infringed copyright by including the authors' works in its training dataset "without consent, without credit, and without compensation". The authors point to OpenAI's own disclosure that it trained its model on "long stretches of contiguous text", and to its use of a controversial database, BookCorpus. BookCorpus was itself created from Smashwords, at heart a self-publishing portal, but the plaintiffs' suggestion appears to be that copyright works may have snuck their way in. There are further suggestions that ChatGPT may have been trained on "flagrantly illegal shadow libraries". The use of specific copyright works in the training data is demonstrated, say the plaintiffs, by the (mostly) accurate summaries of their books produced by ChatGPT.

This was swiftly followed in July by actions against Meta and against OpenAI, filed in the US by comedian and author Sarah Silverman along with authors Richard Kadrey and Christopher Golden. The plaintiffs suggest that Meta's AI, LLaMA, was also trained using these dubious shadow libraries, pointing in particular to the Books3 dataset (interestingly, now seemingly taken down due to reported copyright infringement), which they say contained the plaintiffs' books. They argue that this makes LLaMA itself a derivative work, and that this, together with the copying carried out as part of the training process and the text outputs that rely on information extracted from the training data (also characterised as potentially derivative works), amounts to copyright infringement.

Both Meta and OpenAI have filed motions to dismiss these actions, arguing that the inclusion of books in their training data is fair use, and therefore not infringement. Both reference Authors Guild v. Google, a 2015 case which concluded, amongst other things, that scanning whole books to provide a search function was fair use, as it was unlikely to act as a substitute for purchasing the books. It is clear that 'fair use' will be a key issue at any trial.

Then, in September, a further blow for OpenAI: the Authors Guild and a roster of big names in the publishing world, including Jonathan Franzen (whose magnum opus "The Corrections" covers more IP law than you might expect), Jodi Picoult and George R.R. Martin, filed an action alleging ChatGPT was able to "spit out derivative works: material that is based on, mimics, summarizes, or paraphrases" the authors' work. In what is probably the most histrionic of the filings (one has to feel some empathy for the lawyer tasked with drafting for that readership!), they include an entertaining use of ChatGPT to self-incriminate (see para 87 of the complaint) and reference the recent writers' strike. More pertinently, perhaps, they characterise summaries of books as derivative works, and point to the possibility of generating works "impersonating authors".

We will provide further updates in relation to these cases as they progress.

Legislation watch

AI was firmly on the agenda at the UN General Assembly held in New York from 19 to 26 September, with UN Secretary-General António Guterres taking a somewhat downbeat view in his address. He called for a unified approach, saying: "We are inching ever closer to a Great Fracture in economic and financial systems and trade relations; one that threatens a single, open internet; with diverging strategies on technology and artificial intelligence; and potentially clashing security frameworks." He went on to assert that AI posed an emerging threat to peace and required new governance frameworks. In relation to Generative AI, he cautioned that while it "holds much promise [...] it may also lead us across a Rubicon and into more danger than we can control."

On a more positive note, Secretary-General Guterres also announced the appointment of a High-Level Advisory Body on Artificial Intelligence which will provide recommendations by the end of this year.

At the same UN General Assembly, President Biden noted the "enormous potential and enormous peril" of AI (from around the 9 minute 15 second mark in this video), and said he has been working with leaders around the world – including competitors of the US – to ensure AI is safe and beneficial. On which note, while the UK government has at times been forced to defend its decision to invite China to at least part of its Bletchley Park summit, some commentators speculate that this may make it easier to bring China into the conversation than if the US were leading the charge at this point.

And on the subject of Bletchley Park, details of the draft agenda are emerging. We can hope for a communiqué on the risks of AI models, agreed by the participants, as well as an update on how companies such as OpenAI, Google and Microsoft are getting on with the AI safety measures they agreed with the White House back in July. We hope to know more soon.

AI application of the week

My neural network has no nose. How does it smell? Well, better than you... at least assuming it's Osmo's graph neural network, which "applies machine learning to quantify, digitise, and engineer scent", creating a map of scent that predicts "the scent of molecules that had never before been smelled". Just as dimensions like red, green and blue can be used to characterise colour, the smell map is built on dimensions including "fruity", "floral", "musky" and "meaty". Osmo hopes its technology can be used to find better (e.g. biodegradable) scent molecules, to reduce food waste by detecting the early stages of rot, and possibly to help diagnose certain medical conditions associated with smell. Other, more whimsical use cases involve recreating smells as a window to the past, given the power of scent to evoke memories.
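For the technically curious, here is the rough idea: a graph neural network treats a molecule as a graph (atoms as nodes, bonds as edges), passes information between neighbouring atoms, and then predicts an independent probability for each scent descriptor. The sketch below is a minimal illustration of that general technique only; the atom features, descriptor list and untrained random weights are our own assumptions, not Osmo's actual model.

    # Illustrative sketch only: a tiny message-passing graph neural
    # network mapping a molecular graph to multi-label scent
    # probabilities. All features, descriptors and weights are made up.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical odour descriptors (the "dimensions" of the scent map).
    DESCRIPTORS = ["fruity", "floral", "musky", "meaty"]

    # A molecule as a graph: per-atom feature vectors plus an
    # adjacency matrix. Here, 3 atoms with 4 made-up features each.
    atom_features = rng.normal(size=(3, 4))
    adjacency = np.array([[0, 1, 0],
                          [1, 0, 1],
                          [0, 1, 0]], dtype=float)  # bonds: 0-1 and 1-2

    # Randomly initialised weights (training is omitted for brevity).
    W_embed = rng.normal(size=(4, 8))
    W_update = rng.normal(size=(8, 8))
    W_readout = rng.normal(size=(8, len(DESCRIPTORS)))

    def relu(x):
        return np.maximum(x, 0.0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # One round of message passing: embed the atom features, let each
    # atom sum its neighbours' states, then update its own hidden state.
    hidden = relu(atom_features @ W_embed)
    messages = adjacency @ hidden
    hidden = relu(messages @ W_update)

    # Readout: pool atom states into one molecule vector, then predict
    # an independent probability per descriptor (multi-label output).
    molecule_vector = hidden.mean(axis=0)
    probabilities = sigmoid(molecule_vector @ W_readout)

    for name, p in zip(DESCRIPTORS, probabilities):
        print(f"{name}: {p:.2f}")

A real model would, of course, be trained on labelled odour data and run many rounds of message passing over much richer chemical features, but the shape of the computation is essentially the one above.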

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.