This was determined by an investigation by “The Atlantic” magazine. Texts by Stephen King, Zadie Smith and Haruki Murakami, among others, were used.
Works by writers such as Stephen King, Rachel Cusk, Zadie Smith, Margaret Atwood, Haruki Murakami and Elena Ferrante are among thousands of pirated titles used to train artificial intelligence (AI) systems, according to an article in the American magazine The Atlantic cited by the news agency Télam.
The publication details that more than 170,000 books have been incorporated into models run by companies such as Meta and Bloomberg, inadvertently becoming part of a huge dataset that companies use to build their AI tools.
According to the article, the Books3 library was used to train Meta's LLaMA, one of several large language models (the best known being OpenAI's ChatGPT) designed to generate content based on patterns identified in sample texts.
The dataset was also used to train Bloomberg's BloombergGPT and EleutherAI's GPT-J, and was "likely" used in other AI models, according to The Atlantic's analysis.
The titles grouped in Books3 are roughly one-third fiction and two-thirds non-fiction, most of them published within the last 20 years. Besides the writings of Smith, King, Cusk and Ferrante, the copyrighted works in the dataset include 33 books by Margaret Atwood, at least nine by Haruki Murakami, nine by bell hooks, seven by Jonathan Franzen, five by Jennifer Egan and five by David Grann. There are also books by George Saunders, Junot Díaz, Michael Pollan, Rebecca Solnit and Jon Krakauer, as well as 102 pulp novels by Church of Scientology founder L. Ron Hubbard and 90 books by pastor John MacArthur.
The volumes come from publishers large and small, including more than 30,000 published by Penguin Random House, 14,000 by HarperCollins, 7,000 by Macmillan, 1,800 by Oxford University Press and 600 by Verso.
The discovery follows a lawsuit filed in July by three writers, Sarah Silverman, Richard Kadrey and Christopher Golden, alleging that their copyrighted works "were copied and ingested as part of training" for Meta's LLaMA. The analysis confirmed that the works of the three plaintiffs are indeed part of Books3.