OpenAI accuses New York Times of paying someone to "hack" ChatGPT

midian182

WTF?! The New York Times' copyright lawsuit against OpenAI has taken an unexpected twist after the tech company accused the newspaper of hiring someone to "hack" ChatGPT and other products to generate misleading evidence supporting its claim. OpenAI's use of the term "hack" may be a stretch, though.

The NYT sued OpenAI and Microsoft in December for using millions of its articles to train their systems without permission or compensation. The suit states that millions of the Times' copyrighted news pieces, in-depth investigations, opinion features, reviews, how-to guides, and more were used to train the chatbots, which now compete with the news outlet as a source of information.

In a filing in Manhattan federal court on Monday, OpenAI alleged that the Times "paid someone to hack" its products to generate 100 examples of copyright infringement.

OpenAI claims that it took the Times tens of thousands of attempts to generate "highly anomalous results" and that it achieved this using "deceptive prompts that blatantly violate OpenAI's terms of use."

"They were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI's terms of use," OpenAI's lawyer wrote. "And even then, they had to feed the tool portions of the very articles they sought to elicit verbatim passages of, virtually all of which already appear on multiple public websites."

"Normal people do not use OpenAI's products in this way [...] In the ordinary course, one cannot use ChatGPT to serve up Times articles at will," OpenAI continued.

OpenAI does not name the "hired gun" who it claims the Times hired to manipulate ChatGPT's output, nor does it accuse the paper of actual hacking. This sounds more like standard prompt engineering, and the Times agrees.

"What OpenAI bizarrely mischaracterizes as 'hacking' is simply using OpenAI's products to look for evidence that they stole and reproduced The Times' copyrighted works. And that is exactly what we found. In fact, the scale of OpenAI's copying is much larger than the 100-plus examples set forth in the complaint," said Ian Crosby, Susman Godfrey partner and lead counsel for the publication. "In this filing, OpenAI doesn't dispute – nor can they – that they copied millions of The Times' works to build and power its commercial products without our permission."

The use of copyrighted work in the training of generative AIs has led to numerous lawsuits from authors, artists, and creators. OpenAI said in the filing it believes AI companies will win cases like these based on fair use. It notes that The Times "cannot prevent AI models from acquiring knowledge about facts."

It was reported back in August that the Times had been in "tense negotiations" over reaching a licensing deal with OpenAI and Microsoft that would allow the former to legally train its GPT model off of material published by the Times, something the newspaper previously decided to prohibit. But the talks broke down, leading to the current lawsuit. OpenAI already has an agreement in place with Reuters and Axel Springer to use their content for training purposes, and is said to be in talks with CNN, Fox Corp., and Time to secure licensing deals.


 
" OpenAI claims that it took the Times tens of thousands of attempts to generate "highly anomalous results" and that it achieved this using "deceptive prompts that blatantly violate OpenAI's terms of use." "

Well, they always argued that you can't blindly trust ChatGPT's outputs... So why is OpenAI worried now?

 
It's an interesting point: if OpenAI can show that the claimed copyright violations were made in bad faith, it would greatly weaken the argument against them. At a certain point the DMCA could potentially be invoked to call that illegal. But whether or not a judge/jury will agree with OpenAI's claims is hard to tell.
 
Frivolous lawsuit, specifically tailored to cover the fact that they are actually using copyrighted material under an obscure technicality, which by the way is also their own fault.

I wonder if they will use the same excuse when AI is used to enslave us: “it is irrelevant because they exploited a bug in our software”
 
Frivolous lawsuit, specifically tailored to cover the fact that they are actually using copyrighted material under an obscure technicality,
LOL, what? The fair use exclusion is neither obscure nor "a technicality". Your argument is like claiming that, if you read a NYT article, you're not allowed to discuss the factual details of the story with anyone.

I'll refrain from commenting for now on the irony of those who most support pirating copyrighted material cheering for the NYT here.
 
LOL, what? The fair use exclusion is neither obscure nor "a technicality". Your argument is like claiming that, if you read a NYT article, you're not allowed to discuss the factual details of the story with anyone.

I'll refrain from commenting for now on the irony of those who most support pirating copyrighted material cheering for the NYT here.
Fair use is not for making money on material you don’t have a licence for. As simple as that.
 
Fair use is not for making money on material you don’t have a licence for. As simple as that.
You cannot copyright information or ideas; only a particular expression of them. Training a model on copyrighted material is no different than training a writer, artist, or professional on that same material. It's not what goes in that counts-- it's what comes out.

Here, apparently, the NYT was able to spoof the system by feeding it their own copyrighted material, to get it to regurgitate it back at them. If that is indeed what happened, then there is no violation. Case closed. As simple as that.
 
You cannot copyright information or ideas; only a particular expression of them. Training a model on copyrighted material is no different than training a writer, artist, or professional on that same material. It's not what goes in that counts-- it's what comes out.

Here, apparently, the NYT was able to spoof the system by feeding it their own copyrighted material, to get it to regurgitate it back at them. If that is indeed what happened, then there is no violation. Case closed. As simple as that.
If you want training on copyrighted material you need to pay for it, one way or another. And if you use said material in a material of your own you need to have proper citations in place, otherwise it is plagiarism.

If you want access to NYT articles you need to pay a subscription. And even if you do, using any material in there in an “original” “generative” piece, still requires proper citations.

As simple as that.
 
If you want training on copyrighted material you need to pay for it, one way or another
Err, no. 99% of the stories you've read on this particular website -- as many others -- are the result of the author reading someone else's copyrighted story, and then paraphrasing it appropriately. Perfectly legal.

And if you use said material in a material of your own you need to have proper citations in place, otherwise it is plagiarism.
Heh, you're using terms you don't understand. Plagiarism relates to ethics in an academic setting; it has no legal meaning. The term you mean is infringement. Only individuals can commit plagiarism; they can plagiarize copyrighted or non-copyrighted material; and all the citations in the world won't add to or stop an infringement case. Organizations -- such as the NYT or OpenAI -- commit infringement.

Or in this case -- they don't.
 
Err, no. 99% of the stories you've read on this particular website -- as many others -- are the result of the author reading someone else's copyrighted story, and then paraphrasing it appropriately. Perfectly legal.


Heh, you're using terms you don't understand. Plagiarism relates to ethics in an academic setting; it has no legal meaning. The term you mean is infringement. Only individuals can commit plagiarism; they can plagiarize copyrighted or non-copyrighted material; and all the citations in the world won't add to or stop an infringement case. Organizations -- such as the NYT or OpenAI -- commit infringement.

Or in this case -- they don't.
Perfectly legal indeed for THIS site. OpenAI, however, is using copyrighted materials behind a paywall, including opinion pieces, how-to guides, news analysis and other stuff. NYT opted out, their opt-out was ignored, and OpenAI refuses to pay for licensing. NYT is entitled to sue and will most likely win. As simple as that.

As for my plagiarism reference you’re simply playing semantics, like a lawyer is. You knew the intent and ignored it.
 
Perfectly legal indeed for THIS site. OpenAI, however, is using copyrighted materials behind a paywall...
Heh, no. Copyrighted material "behind a paywall" has exactly the same status as any other copyrighted material. A NYT paywalled feature vs. a TechSpot free-to-read story: both receive equivalent legal protection. There really isn't any room for debate on this; are you next going to attempt to argue that water isn't wet?
 
Heh, no. Copyrighted material "behind a paywall" has exactly the same status as any other copyrighted material. A NYT paywalled feature vs. a TechSpot free-to-read story: both receive equivalent legal protection. There really isn't any room for debate on this; are you next going to attempt to argue that water isn't wet?
You have no idea what you are talking about, but you keep on talking nonetheless. The RIGHT to USE copyrighted material is what is under discussion here. OpenAI did NOT have it. Period.

You’re arguing in bad faith, for the sake of arguing. It is boring, really. Find someone else to argue with, please.
 
You have no idea what you are talking about, but you keep on talking nonetheless. The RIGHT to USE copyrighted material is what is under discussion here.
Next time read the story:

"...In a filing in Manhattan federal court on Monday, OpenAI alleged that the Times "paid someone to hack" its products to generate 100 examples of copyright infringement....."

If there was any doubt this is a case of simple copyright infringement, the NYT source article states:

"...The New York Times sued OpenAI and Microsoft for copyright infringement on Wednesday,...."


Now, what exactly does 'copyright infringement' mean?

"....The U.S. Copyright Office defines copyright infringement as such: "As a general matter, copyright infringement occurs when a copyrighted work is reproduced, distributed, performed, publicly displayed, or made into a derivative work without the permission of the copyright owner...."


The existence or absence of a paywall here is utterly irrelevant.
 
Next time read the story:

"...In a filing in Manhattan federal court on Monday, OpenAI alleged that the Times "paid someone to hack" its products to generate 100 examples of copyright infringement....."

If there was any doubt this is a case of simple copyright infringement, the NYT source article states:

"...The New York Times sued OpenAI and Microsoft for copyright infringement on Wednesday,...."


Now, what exactly does 'copyright infringement' mean?

"....The U.S. Copyright Office defines copyright infringement as such: "As a general matter, copyright infringement occurs when a copyrighted work is reproduced, distributed, performed, publicly displayed, or made into a derivative work without the permission of the copyright owner...."


The existence or absence of a paywall here is utterly irrelevant.
Indeed, thanks for the citation. The right of use is in question here. Open AI used NYT copyrighted work WITHOUT permission.

End of story and thanks for so eloquently making my point.
 
Indeed, thanks for the citation. The right of use is in question here. Open AI used NYT copyrighted work WITHOUT permission.

End of story and thanks for so eloquently making my point.
Copyright infringement would only apply if the chatbot regurgitated the articles verbatim when prompted, and that's what the NYT tried to achieve when they hired the "hacker".

You buy a subscription from the NYT to read the articles. If feeding them to ChatGPT were somehow illegal, then so would be reading them on a computer, since the computer has already processed the data coming from the website.
 
Copyright infringement would only apply if the chatbot regurgitated the articles verbatim when prompted, and that's what the NYT tried to achieve when they hired the "hacker".

You buy a subscription from the NYT to read the articles. If feeding them to ChatGPT were somehow illegal, then so would be reading them on a computer, since the computer has already processed the data coming from the website.
Open AI is creating derivative work using NYT materials without permission and despite the opting out. Derivative work is covered by copyright law as cited above. What NYT did thru the “hacking” is proving that Open AI ignored the opt out and still used NYT copyrighted material without an agreement. I’m not sure why this is so hard to understand.

Open AI doesn’t even deny it, they are essentially the thief crying thief.
 
Open AI is creating derivative work using NYT materials without permission and despite the opting out. ... What NYT did thru the “hacking” is proving that Open AI ignored the opt out
Incorrect on several grounds. If the NYT reports a piece of news and OpenAI reports the same event, it's not a "derivative work". You cannot copyright facts and ideas -- only a particular expression of them.

And Open AI has never denied they used NYT articles for training. What they deny is that this training constitutes a "reproduction or distribution" of the original text ... I.e. that the specific language used in those articles is stored within ChatGPT, and can be extracted at will.
 
Incorrect on several grounds. If the NYT reports a piece of news and OpenAI reports the same event, it's not a "derivative work". You cannot copyright facts and ideas -- only a particular expression of them.

And Open AI has never denied they used NYT articles for training. What they deny is that this training constitutes a "reproduction or distribution" of the original text ... I.e. that the specific language used in those articles is stored within ChatGPT, and can be extracted at will
Not one iota incorrect. If this were strictly about the news, then yes, OpenAI can get the news and report on it as anyone else and their grandma does. However, once anyone makes an analysis of that news, or writes an article about it or an opinion piece based on it, it becomes original work, protected by copyright and, unless specified otherwise, requires permission or a licence to be used.

NYT clearly opted out, so OpenAI should have NONE of NYT’s work. NYT is not suing because OpenAI knows there’s a war in Ukraine. It is suing because OpenAI has NYT opinion pieces about it. It is suing not because OpenAI can tell you how to make Coq au Vin, but because it has in its possession, without permission, Melissa Clark’s published and copyrighted recipe. And so on.

Open AI has been caught hand up to their elbow in NYT’s cookie jar. So they will pay, as they should.
 
Not one iota incorrect.
I'm far from a legal expert, but you clearly don't understand what "copyright" means. When you buy a copyrighted material, you buy the right to USE it. You can burn it, break it, make one backup of it, feed it to an algorithm etc, but you can't "copy" and distribute it. That's what copyright mostly entails.

You can argue that the AI makes derivative works of it inside its AI-brain or reproduces it at the end prompt, but "fair use" has a very wide application in the US. The AI absolutely transforms the articles it reads sufficiently to be protected.

What is happening here is that the NYT saw all that AI money changing hands and got too greedy. Instead of making a deal on their articles at a reasonable price, they fell flat on their ***. Now they can litigate all they want; "fair use" will cover OpenAI's case, as long as legislators don't catch up to the technology.
 
I'm far from a legal expert, but you clearly don't understand what "copyright" means. When you buy a copyrighted material, you buy the right to USE it. You can burn it, break it, make one backup of it, feed it to an algorithm etc, but you can't "copy" and distribute it. That's what copyright mostly entails.

You can argue that the AI makes derivative works of it inside its AI-brain or reproduces it at the end prompt, but "fair use" has a very wide application in the US. The AI absolutely transforms the articles it reads sufficiently to be protected.

What is happening here is that the NYT saw all that AI money changing hands and got too greedy. Instead of making a deal on their articles at a reasonable price, they fell flat on their ***. Now they can litigate all they want; "fair use" will cover OpenAI's case, as long as legislators don't catch up to the technology.
Except Open AI did not “buy” anything from NYT. No rights, nothing.

Also charging a subscription for derivative work originating from materials you have no permission for is not fair use.
 
If this were strictly about the news, then yes, OpenAI can get the news and report on it as anyone else and their grandma does. However, once anyone makes an analysis of that news, or writes an article about it or an opinion piece based on it, it becomes original work, protected by copyright
Several errors again. First, even news reports are protected by copyright -- not the facts themselves, but that particular expression of those facts, I.e. the actual words used.

Two, this doesn't change for "analysis" or "opinion" pieces. Just as you cannot copyright facts, you can't copyright opinions either. Only a particular expression of them.

Except Open AI did not “buy” anything from NYT. No rights, nothing.
Earlier you claimed all these NYT stories were "behind a paywall" -- now you say OpenAI didn't buy rights to that paywall? Do you believe they hacked through it?

The reality is quite different. In actuality, OpenAI never visited the NY Times site. They used the Common Crawl Dataset -- a "free, open repository of publicly available web crawl data", which is collected in an identical manner to how search engines like Google work. It includes a huge amount of copyrighted material, but is non-infringing under Fair Use doctrine (this has already been tested in court).

Again, to summarize: OpenAI has a legal right to train their algorithms on this data. What they do NOT have is a right to "reproduce or distribute" copyrighted material. Period. The only thing that matters is what comes OUT of ChatGPT. Not what went into it.
 
Several errors again. First, even news reports are protected by copyright -- not the facts themselves, but that particular expression of those facts, I.e. the actual words used.

Two, this doesn't change for "analysis" or "opinion" pieces. Just as you cannot copyright facts, you can't copyright opinions either. Only a particular expression of them.


Earlier you claimed all these NYT stories were "behind a paywall" -- now you say OpenAI didn't buy rights to that paywall? Do you believe they hacked through it?

The reality is quite different. In actuality, OpenAI never visited the NY Times site. They used the Common Crawl Dataset -- a "free, open repository of publicly available web crawl data", which is collected in an identical manner to how search engines like Google work. It includes a huge amount of copyrighted material, but is non-infringing under Fair Use doctrine (this has already been tested in court).

Again, to summarize: OpenAI has a legal right to train their algorithms on this data. What they do NOT have is a right to "reproduce or distribute" copyrighted material. Period. The only thing that matters is what comes OUT of ChatGPT. Not what went into it.
So again you’re agreeing with me saying that “expressions” are subject to copyright. Oh gee, I thought we established that already. No error on my part then, as you insist on saying, as if that would somehow make it true.
The simple fact of one WRITING their opinion down in whatever way tickles their fancy creates copyrighted work. Please don’t come back saying I’m wrong just to come around and say the same thing in a different way, as you keep on doing.

So every single opinion piece, news analysis, headphone group test, recipe or how-to created by NYT belongs to them. That is absolutely beyond dispute.

Second, NYT opted out of Open AI “crawling”. Again beyond dispute. NYT specifically did not want their copyrighted work to be used by Open AI. It is their right, that’s again beyond dispute.

Third, Fair Use doctrine sounds like this:
“107. Limitations on exclusive rights: Fair use
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.”

Your argument may have had a leg to stand on if OpenAI’s use of NYT materials was non-commercial. However, it is not.

So, to summarize: OpenAI’s use of NYT copyrighted material for commercial purposes does NOT constitute Fair Use. If OpenAI’s derivative work had been free or for academic/educational use only, then perhaps a judge would rule in their favour.

We will eventually see how the judge rules on the sort of “training for profit” OpenAI was engaged in. Chances are, judging by how fast OpenAI is rushing to get agreements from everyone, such as Reuters or CNN, one with NYT will be reached out of court.
 
So again you’re agreeing with me saying that “expressions” are subject to copyright. Oh gee, I thought we established that already. No error on my part then
Some of the errors on your part are:
(1) That paywalled material is "more" copyrighted than non-paywalled material.
(2) That OpenAI was directly accessing paywalled material.
(3) That opinion pieces are more copyrighted than news stories.
(4) That Fair Use is some sort of "obscure technicality".
(5) That a copyright owner can "opt out" of fair use exclusions.
(6) Confusing the "proper citations" that relate to plagiarism with an infringement claim.
(7) That training an algorithm on copyrighted material constitutes a derivative work.

The first six are absurd, contradictory, and/or easily disproven. As for the seventh, if I scan copyrighted stories from any source -- the NYTimes, Wikipedia, best-selling novels, etc -- to build a database of word usage counts or discover newly coined words or words which are changing in meaning -- that's not a derivative work. Period. Lexicographers have been doing this for centuries to write their dictionaries. If I scan the same material to extract factual data, or even the prevalence of certain opinions or ideas -- that's not a derivative work either.

Your argument may have had a leg to stand on if OpenAI’s use of NYT materials was non-commercial. However, it is not.
You're just digging your hole deeper here. You've misunderstood your own reference. Many commercial usages are non-infringing, and many non-commercial usages ARE infringing. Commercial status is merely one factor in a complex multi-prong infringement test.
 