Google's policy update confirms that all your posted content will be utilized for AI training

If it's public, Google will scrape it

By Rob Thubron, July 4, 2023, 6:46

A hot potato: If you were in any doubt that the content you post publicly online will be fed to AI models, take a look at Google's updated privacy policy. The document now explicitly states that the company reserves the right to collect and analyze pretty much anything people share on the web to train its AI systems.

Google's update over the weekend introduced new wording to its privacy policy. It previously stated that people's data would be used to train "language" models, mentioning only Google Translate. The updated version changes this to "AI models," specifically mentioning Bard and Cloud AI alongside Translate.

One of the many contentious issues with generative AI systems such as ChatGPT and Bard is the way they scrape and use data. The information might be publicly available, but that doesn't remove the plagiarism and privacy concerns, not to mention the possibility of the AI misinterpreting what was said or offering up outdated answers. Even Google has warned employees to be cautious when using chatbots like its own Bard, as they can make undesired code suggestions.

There's also a question of whether this sort of data scraping is even legal. ChatGPT creator OpenAI is facing lawsuits over accusations that it collected personal information from internet users illegally and used the data to create its products.

OpenAI is also dealing with a lawsuit over copyright infringement and privacy violations relating to claims that it used copyrighted books without permission to train its AI systems. The company allegedly copied text from these titles unlawfully by not obtaining consent from the copyright holders and not giving them credit or compensation.

To address extreme levels of data scraping & system manipulation, we've applied the following temporary limits:

- Verified accounts are limited to reading 6000 posts/day
- Unverified accounts to 600 posts/day
- New unverified accounts to 300/day
– Elon Musk (@elonmusk) July 1, 2023

Data scraping seems to be an especially vexing subject for Elon Musk. Over the weekend, Twitter temporarily limited the number of tweets accounts could read per day, ostensibly to address "extreme levels" of data scraping and "system manipulation" on the platform, though not everyone agrees this was the reason for the limitation.

Reddit has also faced a slew of troubles since turning off free access to its APIs to stop data harvesting. The move resulted in over 8,000 subreddits going dark in protest and some switching to NSFW.
