The New York Times files copyright lawsuit against OpenAI and Microsoft

midian182

What just happened? The ongoing controversy over potential copyright infringements related to large language models' training data has taken a significant turn. The New York Times has sued OpenAI and Microsoft for using millions of its articles to train their systems without permission or compensation.

It's no secret that LLMs use swaths of information from the internet as training data, but the NYT claims in its copyright infringement lawsuit that its content has been given "particular emphasis." The suit, filed in Manhattan federal court, claims that the companies "seek to free-ride on the Times's massive investment in its journalism by using it to build substitutive products without permission or payment."

The suit states that millions of the Times' copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more were used to train the chatbots, which now compete with the news outlet as a source of information.

The lawsuit also highlights output from Bing that misattributed content to the publication: a list of "the 15 most heart-healthy foods," twelve of which had not been mentioned in the Times story it cited. Another claim is that the chatbots reproduce verbatim excerpts from NYT articles, meaning the publication is losing readers and paying customers to the likes of ChatGPT.

The suit says the defendants should be held responsible for "billions of dollars in statutory and actual damages." It also requests that the companies destroy any chatbot models and training data that use copyrighted material from The Times. OpenAI believes its use of NYT content falls under "fair use" because it serves a new "transformative" purpose.

It was reported back in August that the Times had been in "tense negotiations" over a licensing deal with OpenAI and Microsoft that would allow the companies to legally train their GPT models on material published by the Times, something the newspaper had previously decided to prohibit. But the talks broke down, leading to the current lawsuit. OpenAI already has an agreement in place with Reuters to use its content for training purposes.

Data scraping has made numerous headlines this year. Elon Musk threatened to sue Microsoft in April over a claim that it was illegally using Twitter (as it still was then) data to train AI models. In July, more than 8,000 authors, including luminaries such as James Patterson, Margaret Atwood, and Jonathan Franzen, signed an open letter asking the leaders of the top six AI companies not to use their work for training models without first obtaining consent and offering compensation. Despite this plea, OpenAI has been sued by authors on several occasions for copyright infringement.

In a separate but similar case, artists filed a copyright lawsuit against the makers of the AI art generators Stable Diffusion and Midjourney in January.


 
My understanding is that it is fairly easy to make an app, or apps, that will scrape every piece of writing, book, and article available on the internet.
Wouldn't someone who did not care about copyright or any other laws have done it already, given the opportunity?
I do not doubt it.
China would be a perfect candidate to gather food for its AI without any consideration for international laws.
But if they have done it, or are in the process of doing so, what is the point of not doing the same?
 
So if one asks around on Reddit, and some stranger cites something he read in the NYT, that's OK. But if it's an artificial entity, that's bad?

It's a sketchy issue. Is free information only free for humans? Or is it because humans have targeted-ad profiles to pass as currency?
 
So if one asks around on Reddit, and some stranger cites something he read in the NYT, that's OK. But if it's an artificial entity, that's bad?

It's a sketchy issue. Is free information only free for humans? Or is it because humans have targeted-ad profiles to pass as currency?
The problem is that the person isn't making money off a cited NYT article. ChatGPT was not only trained on that content but is now making money by passing off info from NYT articles as its own work.
 
The problem is that the person isn't making money off a cited NYT article. ChatGPT was not only trained on that content but is now making money by passing off info from NYT articles as its own work.
How does it make money, though? Corporate customers might pay for the service, but do they use more specific datasets for their specific use cases…
 
For some reason after reading that title, the first thing that popped into my mind was a creepy episode from THE TWILIGHT ZONE involving a newspaper company.
 
AI developers need to make sure their models are not reproducing articles word for word; that would be a copyright violation. But publishers are hoping that AI will be a new stream of revenue for them, so they are doing what they can to make that happen. Training an AI on copyrighted content is no different than a human reading an article and using that information later. Perhaps the AI should cite sources for an article, but there is no copyright violation if you read an article and then write your own article about it. That is not a copyright violation if a human does it, so it should not be one if a computer does it.
 