A hot potato: When it comes to tech companies training their AI models, it seems everything is fair game. Google, for example, uses some of the billions of videos on YouTube to train Gemini and Veo 3, and many creators are unaware that it's happening.

With more than 20 billion videos on the platform, YouTube is a treasure trove of data for AI companies to exploit – and many already have.

YouTube owner Google is also using the content to train its AI models, reports CNBC. The company later confirmed the practice, saying that it uses only a subset of videos and that it honors specific agreements with creators and media companies.

"We've always used YouTube content to make our products better, and this hasn't changed with the advent of AI," said a YouTube spokesperson in a statement.

YouTube acknowledged the need for safeguards in this area, saying it has invested in protections that let creators safeguard their image and likeness.

But many experts point out that most creators and companies don't know Google is training its models on their content. There's also no way for people to opt out of having their work used this way.

Given the size of YouTube's library, the report notes, even if just 1% of its videos are used for training, that would amount to 2.3 billion minutes of content, which experts say is more than 40 times the training data used by competing AI models.

The issue has become more pressing since Google announced Veo 3, a video model that can create strikingly realistic clips. As in many industries, the irony is that the content people create is being used to train an AI that could eventually replace them, or at least cut into their income in an already competitive market.

Some creators take a different view: they are using, or plan to use, Veo 3 to make content, even if the model was trained on their own original work.

Other companies have also used YouTube to train their AI models without creators' knowledge. It was reported last year that OpenAI had transcribed over a million hours of YouTube videos to train its LLMs. Nvidia also mined the platform, at one point scraping 80 years' worth of video every day; the company argued this was in "the spirit of copyright law." Anthropic, Apple, and Salesforce have likewise turned to YouTube for AI training data.

Google now lets creators opt out of having their videos used for training by third-party AI companies such as Amazon and Nvidia, but there's no option to stop Google itself from doing the same.

Image credit: Jordan González