The New York Times files copyright lawsuit against OpenAI and Microsoft

midian182

What just happened? The ongoing controversy over potential copyright infringements related to large language models' training data has taken a significant turn. The New York Times has sued OpenAI and Microsoft for using millions of its articles to train their systems without permission or compensation.

It's no secret that LLMs use swaths of information from the internet as training data, but the NYT claims in its copyright infringement lawsuit that its content has been given "particular emphasis." The suit, filed in Manhattan federal court, claims that the companies "seek to free-ride on the Times's massive investment in its journalism by using it to build substitutive products without permission or payment."

The suit states that millions of the Times' copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more were used to train the chatbots, which now compete with the news outlet as a source of information.

The lawsuit also highlights output from Bing that misattributed content to the publication: a list of "the 15 most heart-healthy foods," twelve of which had not been mentioned in the Times story it cited. Another claim is that the chatbots reproduce verbatim excerpts from NYT articles, meaning the publication is losing readers and paying customers to the likes of ChatGPT.

The suit says the defendants should be held responsible for "billions of dollars in statutory and actual damages." It also requests that the companies destroy any chatbot models and training data that use copyrighted material from The Times. OpenAI believes its use of NYT content falls under "fair use" because it serves a new "transformative" purpose.

It was reported back in August that the Times had been in "tense negotiations" over a licensing deal with OpenAI and Microsoft that would allow the companies to legally train their GPT models on material published by the Times, something the newspaper had previously decided to prohibit. But the talks broke down, leading to the current lawsuit. OpenAI already has an agreement in place with Reuters to use its content for training purposes.

Data scraping has made numerous headlines this year. Elon Musk threatened to sue Microsoft in April over a claim that it was illegally using Twitter (as it still was then) data to train AI models. In July, more than 8,000 authors, including luminaries such as James Patterson, Margaret Atwood, and Jonathan Franzen, signed an open letter asking the leaders of the top six AI companies not to use their work for training models without first obtaining consent and offering compensation. Despite this plea, OpenAI has been sued by authors on several occasions for copyright infringement.

In a separate but similar case, artists filed a copyright lawsuit against the makers of the AI art generators Stable Diffusion and Midjourney in January.


 
My understanding is that it is fairly easy to make an app, or apps, that will scrape every piece of writing, book, and article available on the internet.
Wouldn't someone who did not care about copyright or any other laws have done it already, given the opportunity?
I do not doubt it.
China would be a perfect candidate to gather food for its AI without any consideration for international laws.
But if they have done it, or are in the process of doing so, what is the point of not doing the same?
 
So if one asks around on Reddit, and some stranger cites something he read in the NYT, that's OK. But if it's an artificial entity, that's bad?

It's a sketchy issue. Is free information only free for humans? Or is it because humans have targeted-ad profiles to pass as currency?
 
So if one asks around on Reddit, and some stranger cites something he read in the NYT, that's OK. But if it's an artificial entity, that's bad?

It's a sketchy issue. Is free information only free for humans? Or is it because humans have targeted-ad profiles to pass as currency?
The problem is that the person isn't making money off a cited NYT article. ChatGPT was not only trained on that content but is now making money by passing off info from NYT articles as its own work.
 
The problem is that the person isn't making money off a cited NYT article. ChatGPT was not only trained on that content but is now making money by passing off info from NYT articles as its own work.
How does it make money, though? Corporate customers might pay for the service, but do they use more specific datasets for their specific use cases…
 
For some reason after reading that title, the first thing that popped into my mind was a creepy episode from THE TWILIGHT ZONE involving a newspaper company.
 
AI developers need to make sure their models are not reproducing articles word for word; that would be a copyright violation. But publishers are hoping that AI will be a new stream of revenue for them, so they are doing what they can to make that happen. Training an AI on copyrighted content is no different than a human reading an article and using that information later. Perhaps the AI should cite sources for an article, but there is no copyright violation if you read an article and then write your own article about it. That is not a copyright violation if a human does it, so it should not be one if a computer does it.
 