Meta trained its AI assistant using your public Facebook and Instagram posts

Private posts and conversations were excluded

By Rob Thubron October 3, 2023, 5:18

A hot potato: Meta has announced that it used public posts from Facebook and Instagram to train parts of its new AI virtual assistant. The social media giant emphasized that it did not include users' private posts or messages shared with friends and family as part of its training data.

Speaking in an interview with Reuters at Meta's Connect conference last week, Nick Clegg, the company's president of global affairs, said, "We've tried to exclude datasets that have a heavy preponderance of personal information." The former UK Deputy Prime Minister added that the "vast majority" of the data used by Meta for training was publicly available.

Meta announced last Wednesday that it was introducing a beta version of Meta AI, an advanced conversational assistant available on WhatsApp, Messenger, and Instagram, and coming to Ray-Ban Meta smart glasses and Quest 3. Available only in the US, the assistant offers real-time information and generates photorealistic images from text prompts.

Meta AI is powered by its LLaMA 2 language model released in July, along with the Emu text-to-image model, both of which have been trained on public Facebook and Instagram posts.

Clegg said LinkedIn was an example of a website whose content Meta purposely did not use for data training due to privacy concerns.

One of the many controversial elements of generative AI remains the copyright status of the content that companies' LLMs are trained on. Artists have launched copyright lawsuits against Stable Diffusion and Midjourney this year, while authors including John Grisham and George R.R. Martin have sued OpenAI. Clegg said he expects a "fair amount of litigation" over the matter of "whether creative content is covered or not by existing fair use doctrine."

"We think it is, but I strongly suspect that's going to play out in litigation," Clegg said.

Meta isn't the only company using user content to train its AI. Elon Musk's xAI is doing the same thing with users' tweets, while Google's policy update in July confirmed that all posted user content will be used for AI training.

Last Wednesday also saw Meta boss Mark Zuckerberg announce a variety of AI-based chatbots featuring the likenesses of celebrities and influencers, including Tom Brady, Mr. Beast, Paris Hilton, Kendall Jenner, and Snoop Dogg. Meta said it will launch 28 of the bots, which are also powered by LLaMA 2. The event wasn't exactly a resounding success.
