The future of AI and journalism at stake: OpenAI battles news giants in copyright lawsuit

Skye Jacobs

What just happened? A coalition of news organizations led by The New York Times faced off against OpenAI in federal court on Tuesday, continuing a legal battle that could shape the future of AI and journalism. The hearing, centered on OpenAI's motion to dismiss, marks a critical juncture in a high-stakes copyright infringement case that asks a fundamental question: Can AI companies use copyrighted news articles to train their language models without consent or compensation?

The case has merged lawsuits from three publishers: The New York Times, The New York Daily News, and the Center for Investigative Reporting. The publishers argue that OpenAI's practices amount to copyright infringement on a massive scale, potentially threatening the future of journalism.

The publishers' legal team contends that OpenAI and its financial backer, Microsoft, have profited from journalistic work that was scanned, processed, and recreated without proper authorization or payment. Jennifer Maisel, a lawyer for The New York Times, drew a parallel to criminal investigations, stating in court, "We have to follow the data."

Ian Crosby, another attorney representing the Times, emphasized the substitutional nature of ChatGPT and Microsoft's Bing search engine, arguing that these AI-powered tools have become alternatives to the publishers' original work for some users. This point is crucial in establishing copyright infringement.

OpenAI's defense rests on the doctrine of fair use, a principle in US law that allows copyrighted material to be used for purposes such as education, research, or commentary. Joseph Gratz, representing OpenAI, argued that the company's AI models are not designed to regurgitate entire articles but rather to recognize patterns in data.

The hearing delved into the technical aspects of large language models, with OpenAI and Microsoft's legal team explaining to Judge Sidney Stein how ChatGPT processes and analyzes data. They described a system that breaks down text into "tokens" and learns to recognize patterns rather than simply retrieving and reproducing copyrighted content.
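
As a rough illustration of the tokenization step described to the court, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer; the sample sentence and encoding choice are illustrative assumptions, not details from the filings.

import tiktoken  # OpenAI's open-source tokenizer (pip install tiktoken)

# Load the byte-pair encoding used by GPT-4-era models.
enc = tiktoken.get_encoding("cl100k_base")

text = "A coalition of news organizations faced off against OpenAI in federal court."
tokens = enc.encode(text)

print(tokens)              # integer token IDs -- the units a model trains on
print(enc.decode(tokens))  # decoding round-trips back to the original string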

However, the publishers raised concerns about a feature called "retrieval augmented generation," which allows ChatGPT to incorporate up-to-date information from the web into its responses. Steven Lieberman, attorney for The New York Daily News, characterized this as "free riding," suggesting that readers might turn to AI-generated content instead of visiting publishers' websites.
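
In outline, retrieval-augmented generation works roughly as sketched below; the helper functions search_web and generate are hypothetical stand-ins for a search backend and a language model, not OpenAI's actual internals.

def search_web(query: str) -> list[str]:
    # Hypothetical stand-in: fetch current web snippets matching the query.
    return ["Example passage from a recent news article about the query..."]

def generate(prompt: str) -> str:
    # Hypothetical stand-in: call a language model on the augmented prompt.
    return "An answer grounded in the retrieved passages."

def rag_answer(question: str) -> str:
    # 1. Retrieve up-to-date documents the model never saw during training.
    snippets = search_web(question)
    # 2. Splice them into the prompt as context.
    prompt = "Context:\n" + "\n".join(snippets) + f"\n\nQuestion: {question}\nAnswer:"
    # 3. Generate a response conditioned on that retrieved text.
    return generate(prompt)

print(rag_answer("What did the court hearing cover?"))

The publishers' "free riding" objection targets the retrieval step: the passages pulled in can be their own current reporting.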

The stakes in this case are extraordinarily high. The New York Times is seeking billions of dollars in damages and calling for the destruction of ChatGPT's dataset. Such an outcome could be catastrophic for OpenAI, potentially forcing the company to rebuild its AI models using only authorized works. "If you're copying millions of works, you can see how that becomes a number that becomes potentially fatal for a company," Daniel Gervais, co-director of the intellectual property program at Vanderbilt University, told NPR.

The tech and publishing worlds now await Judge Stein's decision on whether to dismiss the case or allow it to proceed to trial.

Despite accusing artificial intelligence of infringing its copyrights and causing significant damages, The New York Times has demonstrated remarkable financial resilience. From 2020 to 2023 its annual revenue increased steadily: $2.075 billion in 2021, a staggering 16.33% jump over the previous year; $2.308 billion in 2022, an 11.25% increase; and $2.426 billion in 2023, a 5.1% rise over 2022. This undeniable financial success exposes its claims of substantial damages from AI infringement as nothing more than a pretext.
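
For what it's worth, the year-over-year rates quoted above can be checked directly from the cited figures. A quick sketch (the 2020 revenue isn't quoted, so the 16.33% claim can't be verified from this comment alone, and small rounding differences remain):

# Sanity check of the growth rates quoted above, using only the
# revenue figures given ($ billions).
revenue = {2021: 2.075, 2022: 2.308, 2023: 2.426}

years = sorted(revenue)
for prev, curr in zip(years, years[1:]):
    growth = (revenue[curr] - revenue[prev]) / revenue[prev] * 100
    print(f"{prev} -> {curr}: +{growth:.2f}%")
# Output: 2021 -> 2022: +11.23%  (quoted as 11.25%)
#         2022 -> 2023: +5.11%   (quoted as 5.1%)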

How can they simultaneously accuse AI of causing significant harm while enjoying such remarkable revenue growth? This clear inconsistency suggests that the accusations are not driven by genuine concern over copyright violations but are instead a strategic move to leverage legal action for financial gain, raising serious concerns about potential fraud.

For example, if a ball 🏐 bounces off a vehicle's windshield without breaking it, the vehicle's owner cannot pursue legal damages against the person responsible for the ball's trajectory, because there is no actual damage to the vehicle (the glass didn't break). If the owner tries to claim damages based on perceived rather than actual harm, such a claim could be considered fraudulent.
 
The only thing that makes AI agents even remotely useful is the high-quality, reliable data they've scraped from sites like Wikipedia, newspapers, books, etc. And ironically, AI-generated content is not even good enough as input for additional AI training.

99% of the web is utter garbage and porn. It's only because of these high-quality sources that ChatGPT exists.

So OpenAI needs to STFU and pay its fair share.
 
The closest prior legal case I can think of is Authors Guild v. Google, over Google Books. For those not familiar with it, Google began a project of mass-scanning assorted books, many still under copyright (and Google pulled a bunch of them out of libraries to scan, too), to create a very large searchable database of those books that also shows small snippets. The Authors Guild was not too happy with this and sued, but Google prevailed in court, and the project was ruled transformative enough to fall under fair use. I expect that precedent to be heavily cited during this case.
 
Bad news for the news organisations: Chump has signalled he'll remove all restrictions on AI, ushering in a bold era of AI surveillance, deeper fakes, copyright theft, massive job losses, and so on.
Even if they win in court, under Chump it won't matter.
 
For example, if a ball 🏐 bounces off a vehicle's windshield without breaking it, the vehicle's owner cannot pursue legal damages against the person responsible for the ball's trajectory, because there is no actual damage to the vehicle (the glass didn't break). If the owner tries to claim damages based on perceived rather than actual harm, such a claim could be considered fraudulent.

Yea, but what if the vehicle was being driven when the ball struck the glass, causing the driver to react in a manner that caused an accident in which someone was seriously injured or killed?? Who is responsible then? That is NOT and can NOT be a "perceived" reality, but ACTUAL reality...

It's the AI, stupid :D
 