AI companies to music labels: scraping copyrighted tracks on the internet to train algorithms is "fair use"

midian182

Why it matters: The hot-button topic of AI companies scraping copyrighted content from the internet without compensation or credit is in the spotlight again. Music-generating AI startups Udio and Suno are battling copyright lawsuits filed by major record companies for this very reason. Their defense admits that they engaged in these practices but claims that doing so falls under fair use.

Suno and Udio were hit with separate copyright infringement lawsuits from music labels Universal Music Group, Warner Music Group, and Sony Music Group on June 24.

The start-ups, both of which have hired international law firm Latham & Watkins LLP, don't deny that their neural networks have been scraping copyrighted music from the internet. The companies' defense is that their actions are fair use under copyright law.

"It is no secret that the tens of millions of recordings that Suno's model was trained on presumably included recordings whose rights are owned by the Plaintiffs in this case." Suno acknowledged. Its excuse is that the training data includes "essentially all music files of reasonable quality that are accessible on the open internet," which will include millions of illegal copies of music tracks floating around the web.

"It is fair use under copyright law to make a copy of a protected work as part of a back-end technological process, invisible to the public, in the service of creating an ultimately non-infringing new product," Suno continued, suggesting that as the music it creates doesn't feature samples straight from the original songs, using the music for training purposes is legally allowed.

Both Suno and Udio emphasized that their AI models' neural networks do not store samples of copyrighted music as a library of pre-existing content, and that the models are not "outputting a collage of 'samples' stitched together from existing recordings" when prompted by users.

The companies also argue that they are creating new, original music, using their training data to define styles and genres, such as opera or jazz, which are not something that anyone owns.

"Our intellectual property laws have always been carefully calibrated to avoid allowing anyone to monopolize a form of artistic expression, whether a sonnet or a pop song. IP rights can attach to a particular recorded rendition of a song in one of those genres or styles. But not to the genre or style itself."

Suno says the lawsuit is merely a response from the labels wanting to shut it down as they don't want competition or threats to their market share.

The RIAA, which launched the lawsuit, had the expected response to the fair use argument. "Their industrial scale infringement does not qualify as 'fair use.' There's nothing fair about stealing an artist's life's work, extracting its core value, and repackaging it to compete directly with the originals," a spokesperson for the organization said. "Defendants had a ready lawful path to bring their products and tools to the market – obtain consent before using their work, as many of their competitors already have. That unfair competition is directly at issue in these cases."

Udio recently raised $10 million in seed funding from investors that include musicians will.i.am and Common. A month later, Suno raised $125 million.

In May, Sony sent a letter to more than 700 companies and organizations warning them that they are not authorized to use music data owned by the label to teach algorithms how to synthesize audio content. Elsewhere, Universal Music Group has asked streaming companies to block data scraping for AI training, while Warner Music Group Corp. CEO Robert Kyncl testified in a US Congress hearing about the need for strong legal protections to regulate the use of copyrighted content for AI training.

The AI companies' arguments are similar to those of Mustafa Suleyman, Microsoft's AI unit leader. In a July interview, Suleyman said that material published openly on the web essentially becomes "freeware" that anyone can copy and use as they please. That view appears to contradict the FAQ page from the US Copyright Office, which states that "your work is under copyright protection the moment it is created and fixed in a tangible form that it is perceptible either directly or with the aid of a machine or device."

Several AI companies have been accused of training their programs on copyrighted material. OpenAI is facing numerous lawsuits over these claims, while Perplexity was accused of ignoring the Robots Exclusion Protocol web standard to scrape areas of websites that operators do not want accessed by bots.

 
Well, the biggest prior case regarding a company mass-processing copyrighted works for its own use would be Authors Guild v. Google. Google had a project to mass-scan books (even pulling them out of libraries) in order to create a searchable database of said books that shows small snippets when you search through them. The Authors Guild sued to stop Google but failed; the courts ruled that the project met the requirements of fair use. I am sure that case will get cited a lot during these proceedings.
 
Still waiting for the day an AI model scrapes another AI model and the first AI model sues for copyright infringement.
 
There is a little bit of merit to the argument the AI companies are making. Fair-Use is something that no one can deny. However, Fair-Use is strictly for NON-COMMERCIAL purposes. AI companies are commercial entities, so under technicalities of ethics and law, they are violating copyright law. To what extent is yet to be determined as this is entirely new territory in tech and law.

It will be interesting how this all plays out, but at the end of the day, the AI companies are in the wrong.
 
Fair-Use is something that no one can deny. However, Fair-Use is strictly for NON-COMMERCIAL purposes.
This is a common -- but very incorrect -- belief. There are plenty of cases where fair use covers commercial use. Conversely, there are also many cases where non-commercial use is infringing and doesn't qualify. Whether the use is commercial is merely one prong of a multi-pronged legal test.
 
It's time for artists to find a real job. Copyright is not a fundamental right. IP is an exclusive right over the freedom of others that was instituted to give creators of art the competitive advantage to make money from the art they created, and with that money to survive without having to do laborious, dangerous, and difficult work like most others. We now have machines that produce art, and better-quality art. So it is now time to abolish this exclusive right against the freedom of all other people, since we no longer need people in this role (of art creators). Copyright should be abolished, and those who want to create art should do it as a hobby or accept that the rest of us will have free access to artworks. So everything in the public domain by default.
 
Hell, if that's fair use then I want to download all my music and movies for no charge on the day of release too.
 
Not one single sentence this AS (Artificial Stupid) typed here was a complete, proper sentence or even made any sense. That's the power of the fake "trillions of dollars in AI research, investment, and chip production" everyone's talking about.

It doesn't exist. There's no such thing and the TechSpot staff along with everyone else is just here to get paid to promote absolute vaporware. Nvidia, AMD, and Intel are far more complicit but everyone on the TS staff pushing it at all is on their payroll and an absolute dolt.
 
Still waiting for the day an AI model scrapes another AI model and the first AI model sues for copyright infringement.
I commented a week ago saying that the AI company with the least honor and respect toward data it does not own or have the right to use will have a chance to advance furthest.
AI scraping AI seems like a culmination of this behavior.
My understanding is that AI will always need new data to stay current.
And more and more companies that have huge databases of useful "food" will sell it themselves, for the right price.
 
There is a little bit of merit to the argument the AI companies are making. Fair-Use is something that no one can deny. However, Fair-Use is strictly for NON-COMMERCIAL purposes. AI companies are commercial entities, so under technicalities of ethics and law, they are violating copyright law. To what extent is yet to be determined as this is entirely new territory in tech and law.

It will be interesting how this all plays out, but at the end of the day, the AI companies are in the wrong.

Fair use can indeed apply to both commercial and non-commercial contexts. For example, companies like IMDb and Jaxsta build their content databases under the principles of fair use and then monetize this content by selling access to their databases. They rely on fair use to collect and catalog information about copyrighted works while generating revenue through subscriptions or other business models.
 
Hell, if that's fair use then I want to download all my music and movies for no charge on the day of release too.

It's not even remotely the same thing. Using AI to analyze and generate music is not equivalent to stealing music any more than learning to play a song on an instrument is. This distinction is why numerous copyright lawsuits on this issue have consistently failed.
 
It's not even remotely the same thing. Using AI to analyze and generate music is not equivalent to stealing music any more than learning to play a song on an instrument is. This distinction is why numerous copyright lawsuits on this issue have consistently failed.
Then maybe we can download pirated movies not for entertainment, but for analyzing the cinematographic techniques. Downloading pirated music to learn from it ... I'm not sure this qualifies as fair use.
 
This is like the biggest example of why 'I have nothing to hide' is not well-thought-out (and is even dangerous) reasoning.

People will find ways to exploit you if you put everything out in the open.

The original authors of music and art definitely need to be compensated, especially if the AI companies are profiting off their work. They wouldn't exist without having access to and exploiting existing content that artists make their living with.
 
Then maybe we can download pirated movies not for entertainment, but for analyzing the cinematographic techniques.
Why yes, you can -- if you're actually doing that. Visit any class on film history or cinematography and you'll see the professor freely and liberally showing footage from dozens, perhaps hundreds of copyrighted works.

If, however, you hope to merely claim to a federal judge, "yes, your honor, I was simply learning art techniques" and hope to make it stick -- well, good luck.

The original authors of music and art definitely need to be compensated, especially if the AI companies are profiting
You misunderstand the basis of IP law. It exists to benefit society, not individual holders. Schools and universities have been profiting for centuries by training students on copyrighted works. Art and music critics profit off copyrighted works by selling their own evaluations of them. This is all standard fair use policy.
 
Using AI to analyze and generate music is not equivalent to stealing music any more than learning to play a song on an instrument is.
That doesn't hold any water at all. AI is just a corporate entity. They're either using the copyrighted material or they aren't.
 
You misunderstand the basis of IP law. It exists to benefit society, not individual holders. Schools and universities have been profiting for centuries by training students on copyrighted works. Art and music critics profit off copyrighted works by selling their own evaluations of them. This is all standard fair use policy.
You may be right, but the publishers definitely don't agree. They're constantly getting public-good systems shut down, or at the very least getting material deleted from those systems, including from libraries. It's a never-ending battle that consumes vast sums of legal money, and fair use usually loses.
 
That doesn't hold any water at all. AI is just a corporate entity. They're either using the copyrighted material or they aren't.
Kadrey v. Meta Platforms: In this case, authors sued Meta (Facebook) for using their books to train its LLaMA models. The court dismissed claims that the models themselves or their outputs were infringing derivative works, emphasizing that the plaintiffs needed to show substantial similarity between the model outputs and the original works. The court's approach indicates that simply using copyrighted materials for training does not automatically result in infringement without a clear connection to the outputs (Perkins Coie).
 
Then maybe we can download pirated movies not for entertainment, but for analyzing the cinematographic techniques. Downloading pirated music to learn from it ... I'm not sure this qualifies as fair use.
Downloading anything is legal; uploading it is not.
 
Not very often and the situations where it's allowed are extremely limited. Non-profit purposes are much less restricted and restrictive.
Aiy yai yai. First you claimed fair use was "strictly for non-commercial" purposes. Now you claim (nearly as incorrectly) that such use is "extremely limited". This is false: commercial usage is simply one factor of a complex many-pronged legal test. In fact the most restrictive use of IP is non-commercial -- if you're listening to that copyrighted song or reading that copyrighted book for your own personal enjoyment, you're infringing.

Nor is "non-profit" the opposite of commercial. A non-profit entity can still use IP commercially, and a for-profit corporation can use it non-commercially.

Legally, the real touchstone to this case will boil down to two factors. One, that training AI models is transformative usage (the models themselves are not the songs used to train them), and two, that this usage does not substantially impact the market for the original works. Are people less likely to listen to Taylor Swift songs because a model was trained on her music?

The copyright holders will likely attack that latter point by noting the end goal of AI is to generate competing works. However, the problem with that argument is that it's no different than using the same copyrighted material in an educational setting, to train students who are being educated on these works for the specific purpose of eventually generating their own competing IP.
 