AI will die "overnight" if copyright permission is enforced, says former Meta exec

Alfonso Maruccia

Editor's take: The UK Parliament is debating the Data (Use and Access) Bill, proposed legislation that would regulate access to user and customer data. The bill could have a dramatic impact on the IT sector, particularly on AI companies that aggressively collect vast amounts of human-generated data online to train their often unpredictable chatbots.

Former UK Deputy Prime Minister Nick Clegg says artificial intelligence companies shouldn't need to seek permission every time they use copyright-protected data. Speaking at a recent event to promote his book, "How to Save the Internet," Clegg – who previously served as a Meta executive – sided with the AI industry on the issue.

Forcing technology firms to comply with copyright laws – and notify rights holders when they use protected content to train artificial intelligence models – would kill the UK's AI industry "overnight," Clegg warned. The content is already publicly available, he argued, and AI systems need vast amounts of data to improve their reasoning.

Clegg argues that current copyright laws are incompatible with artificial intelligence, as requiring companies to obtain permission every time they train a model would render the entire technology unworkable. He believes artists and rights holders should be able to opt out of data scraping for AI training, but seeking individual confirmations isn't a viable solution.

"I think people should have clear, easy to use ways of saying, no, I don't. I want out of this," the former Meta VP said. "But I think expecting the industry, technologically or otherwise, to preemptively ask before they even start training - I just don't see. I'm afraid that just collides with the physics of the technology itself."

Clegg is focusing on the UK AI industry as politicians debate the new Data (Use and Access) Bill, which aims to regulate access to customer and company data. A coalition of artists and authors, led by film director Beeban Kidron, pushed to amend the bill to require AI companies to disclose the data they use to train their models, but Parliament rejected the proposal.

In a recent op-ed in The Guardian, Kidron accused the government of essentially approving a plan to facilitate mass cultural theft. She said UK authorities are allowing AI companies to use copyrighted works freely while opting out of such practices would be impossible without proper transparency.

The government can certainly "bully its way to victory" and pass the bill by majority vote, she wrote, but doing so would deal a catastrophic blow to Britain's creative industry. However, the fight isn't over: the draft will return to the House of Lords for a new vote on June 2.


 
We already have copyright laws, and procedures for alleged violations. And you know what exciting new tool could become really good at identifying copyright violations and helping copyright holders pursue their remedies? AI!

Meanwhile, every human who has ever produced anything learned from those who came before them. So to me the answer is not to stop AI from learning as much as it can from the world around it, but to ensure that the work it distributes to others is its own original work, not a copy of someone else's.
 
As far as I know, they already can't use any content they want. No company can use another company's copyrighted material or intellectual property without a license, and AI systems shouldn't be any different. AI companies will profit off AI trained on others' copyrighted materials, and those people should get paid if someone else is profiting off their work, even indirectly.
 
Copyright protects the expression of an idea, not the idea itself. It is a shallow legal construct, formed under pressure from a small group, potentially at the expense of the fundamental right of free access to information. It's time for reform.

Copyright should last for 10-15 years at most, and fair use should apply automatically as long as there is no direct commercial use. For example, you would pay for movies in theaters, but you could watch them for free at home under automatic fair use. The same would apply to software, music, and so on: personal use would be free.

Additionally, copyright should no longer grant exclusive adaptation rights. For instance, anyone should be able to make a sequel to any work without needing permission (like making a sequel to Star Wars). This would also resolve the friction with AI training, with no need for an opt-out.

On the other hand, all AI models (the weights) should be automatically placed in the public domain (they are not copyrightable, but this should be made clear), and AI-generated outputs should also be automatically placed in the public domain. Everyone can use something that is in the public domain, which is nice, right? Think about being able to play Mozart on the piano without any hassle.

You don't need de jure legal protection to commercialize something; you can rely on de facto protection by controlling the workflow. Isn't that a better approach, and one more compatible with the fundamental right of free access to information?

Movies would still make money in theaters, and music would generate revenue from concerts and live performances. Video games would earn as long as their digital locks don't break or they require an online connection. Business software would continue to generate revenue for a decade, and personal-use software for as long as it remains locked. Printed books would sell through their physical form, while digital books would remain monetizable as long as they stay locked in some format, though with OCR technology available, that is becoming difficult. In practice, under these rules there would be no way to monetize books for personal use, a small side effect. Everything else is close to how things already work in practice.
 
As far as I know, they already can't use any content they want. No company can use another company's copyrighted material or intellectual property without a license, and AI systems shouldn't be any different. AI companies will profit off AI trained on others' copyrighted materials, and those people should get paid if someone else is profiting off their work, even indirectly.
I'm not sure what you mean by the indirect part.

By way of example, most of what I learned about math came from copyrighted math textbooks that I paid for. So if your point is that AI companies should be buying the books, subscriptions, etc. they train on (those not in the public domain), then I agree.

But if your point is that, because you learned math from a copyrighted textbook, you should owe that book company a few pennies every time you do a math problem, then I strongly disagree. That's not how humans do it, and by the way, that book's author and publisher also learned a lot about math from copyrighted materials they read before writing their own. Do we imagine an endless chain of payments going back through time to the first caveman who drew a stick figure?

There's another issue: copyright was never intended to be a property right. It is explicitly a compromise designed so that the arts and sciences can flourish while also ensuring that their progress ultimately benefits all mankind. Today's news reporting about a specific event is an example of something that can have copyright protection, but descriptions of basic concepts of math, which are centuries old, have rightfully passed into the public domain and may not be owned or controlled by any private interest.
 
As far as I know, they already can't use any content they want. No company can use another company's copyrighted material or intellectual property without a license, and AI systems shouldn't be any different. AI companies will profit off AI trained on others' copyrighted materials, and those people should get paid if someone else is profiting off their work, even indirectly.

In the US that's not really true; plenty of uses of other companies' copyrighted material can fall under fair use. Take Google Books: the Authors Guild sued Google over the project, in which Google mass-scanned copyrighted books (sometimes even pulling them from libraries) to create a searchable database that also shows small snippets. The courts sided with Google, finding that the project qualified as fair use. I'm guessing this case will be heavily cited in US lawsuits about AI and fair use. Of course, this article is referencing the UK, which is a separate kettle of fish in regards to copyright law.

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.
 
Copyright permissions mean absolutely nothing in China. They'll keep training their models with everything they can scrape, while we're strangling ourselves with legal bloatware.

The other big winner besides China will be the herds of lawyers making billions of dollars off meaningless disputes.
Artists are screwed anyway: even if written permission were required to use any single piece of anything, people would simply use the Chinese models trained on everything.

We should be thinking about how to fairly compensate content authors, not about what to ban and what new regulations and permissions to invent.
 
But wait, Meta employees downloaded terabytes of pirated books to train their AI.
They hope to earn billions from using it eventually. Are they planning to share the money with the writers of those pirated books they used? By the way, many of those people are nowhere near as rich as Zuckerberg.
That is how they see AI evolution: theft, theft from every last person who posts their art or sells their books. They want to rob every last person who uses the internet and then just enjoy the profits.
They don't want to pay for the things they train their AI on, but they will definitely ask every person to pay for using their AI.
 
It's a clear double standard. A human downloads the content = piracy. A corporation downloads the content = fair use? How does that make sense? Videos on Netflix, for example, are publicly available; does downloading them and somehow bypassing the DRM to get around paying the monthly subscription equate to piracy?
 
Copyright permissions mean absolutely nothing in China. They'll keep training their models with everything they can scrape, while we're strangling ourselves with legal bloatware.

The other big winner besides China will be the herds of lawyers making billions of dollars off meaningless disputes.
Artists are screwed anyway: even if written permission were required to use any single piece of anything, people would simply use the Chinese models trained on everything.

We should be thinking about how to fairly compensate content authors, not about what to ban and what new regulations and permissions to invent.
That is the key point when discussing AI and copyright.
We need to cut the cord with copyright, but too many people have so much money tied up in it that it seems impossible unless a war starts.
 
It wouldn't die; the dominance would just shift from the West to countries that don't care about Western copyright, like China, as well as to open-source LLMs that are impossible to police because they can be run on any PC.

It would be utterly close-minded to hinder AI development in the West with copyright rules.
 