Smart TV apps are quietly scraping web data for AI training

Alfonso Maruccia

Scraping Bubble: Companies specializing in scraping or otherwise harvesting publicly available content to train AI models are becoming increasingly common. In particular, some firms are targeting smart TV applications and similar platforms, attempting to leverage users' internet connectivity in exchange for low-cost incentives such as reduced advertising or free streaming access.

Bright Data operates a global proxy network designed to collect publicly available web content, and users are voluntarily joining the network to save a few dollars on their TV viewing experience. According to a recent report, code associated with Bright Data has appeared in certain smart TV applications. When questioned about this practice, some developers have declined to comment or have removed the proxy integration.

The company describes its platform as a way to transform web scraping into a structured "data delivery" system. Marketing materials claim that its Bright SDK technology enables "100 percent" user monetization, promising global reach while preserving the original user experience.

The Bright SDK can be embedded into smart TV applications, and users are typically asked to consent before joining the proxy network. Once activated, the connection may be used to route web traffic through the user's residential internet connection. The downloaded data is then sent to Bright Data's servers to be sold to AI companies for model training and LLM development workloads.

In a webinar shared with industry participants years ago, Bright Data Chief Production Officer Ariel Shulman stated that the SDK does not directly track users. The code is said to operate anonymously in the background, while web crawling activity can be difficult to monitor because it utilizes distributed residential IP connections.

Bright Data has claimed that its proxy network includes roughly 150 million crawling clients, a figure that reportedly encompasses smart TV applications as well as software running on PCs and mobile devices.

Bright Data spokesperson Jennifer Burns stated that participation in the network is "consensual," adding that users can opt out at any time through a simple two-step process.

The company says the Bright SDK is designed to initiate web crawling only when local computing and network resources are not significantly affected. However, users generally have limited visibility into how much background data is being transmitted while the SDK is active during television viewing or web browsing.

The report compares Bright Data's business model to IPIDEA, a massive, China-based proxy network dismantled by Google earlier this month. While critics argue that distributed proxy networks can be abused for malicious purposes, Bright Data maintains that its platform is intended for legitimate data access and research applications.

Nevertheless, platform providers appear to be tightening restrictions on background SDK activity. Google has reportedly begun prohibiting apps from running persistent background SDK processes, while Amazon has taken steps to block applications that rely on third-party proxy mechanisms such as Bright SDK integrations.

The company continues to maintain partnerships with smart television ecosystems based on Tizen OS and webOS, where reports suggest that hundreds of applications may be incorporating proxy-based web data collection functionality.


 
Don't connect your TV to the internet.
Don't trust anything you didn't build yourself.

We lost the battle on phones. Enjoy this before profit demands they ship every television with 4g/5g/Starlink for telemetry only, because as soon as they work out the numbers, they will.

Regulations are captured. You are the product. Don't be dumb and lazy about it or you will win the appropriate life prizes. No one who represents you cares about you. Care about yourself. Understand that you now live in a surveillance economy where predatory business practices are not only tolerated, but rewarded. This is not fringe.

This is reality.
 
The consent part is doing a lot of heavy lifting here. A two-step opt-out buried in a streaming app settings menu doesn’t exactly scream informed participation, especially when most people just wanted to save a few bucks on ads.

Also wild that Google and Amazon are cracking down not because it’s creepy, but because persistent background SDKs mess with their platform control.
 
Plz, could you tell the same story once again to my brother and nieces? They cry for cartoonz on youtube all the time, and don’t even know the dangers they put themselves in!
Want scary? Read up on TV ACR tech. Every second or half second, a screenshot is sent to the TV manufacturer. Not connected to Wi-Fi? That's fine, it will find any open or IoT network in range.


My 2010 Samsung TV has no apps, no internet, it's just a TV. Long may it live.
I do not watch any air or cable tv. My tv is connected to an isolated vlan, and I watch stuff I want from a mini pc and local network.
 
So the deal is: you let a company use your home internet connection as an anonymous web scraping node for AI companies, and in return you get... slightly fewer ads on a free app you already have. The value exchange here is absolutely insane and I guarantee 99% of people tapping "I agree" think they're just accepting cookie preferences.
 
What are they going to do when it is simply worthless to share anything on the internet due to scraping?
The work that millions of people can still generate money off will all go to AI.

AI is eating other people's food that they grow for themselves. If they are not the ones who benefit from growing it, then what is the point growing it at all?
 
Pi-hole can only block DNS requests, but apps, including Google's, now use hardcoded DNS servers, so Pi-hole doesn't see that traffic at all.
Would that not be what pfsense then comes into play for? Blocking/redirecting DNS requests for one, and (trying to) finely tune access control?
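
To make the hardcoded-resolver point concrete, here is a rough Python sketch of the check a firewall rule would effectively perform: flag any DNS traffic that goes somewhere other than the local resolver. Everything in it is an assumption for illustration (the `Flow` record, the `bypassing_flows` helper, and the sample addresses are hypothetical, not taken from pfSense or Pi-hole), and it only looks at ports 53 and 853, so DNS-over-HTTPS on port 443 would slip past it.

```python
from dataclasses import dataclass

# Ports assumed to carry DNS: 53 (plain DNS) and 853 (DNS-over-TLS).
DNS_PORTS = {53, 853}


@dataclass(frozen=True)
class Flow:
    """Hypothetical record of one outbound connection seen at the router."""
    src: str
    dst: str
    dst_port: int


def bypassing_flows(flows, local_resolver):
    """Return DNS flows that go to anything other than the local resolver."""
    return [
        f for f in flows
        if f.dst_port in DNS_PORTS and f.dst != local_resolver
    ]


# Sample data, made up for this example.
flows = [
    Flow("192.168.1.50", "192.168.1.2", 53),     # TV asking the local resolver
    Flow("192.168.1.50", "8.8.8.8", 53),         # app with a hardcoded resolver
    Flow("192.168.1.50", "93.184.216.34", 443),  # ordinary HTTPS, not checked
]

suspicious = bypassing_flows(flows, local_resolver="192.168.1.2")
for f in suspicious:
    print(f"{f.src} -> {f.dst}:{f.dst_port} bypasses the local resolver")
```

In practice the flows would come from router logs or a packet capture rather than a hand-written list; the point is only that the filtering has to happen at the network layer, where a DNS sinkhole alone has no visibility.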
 
Can pfsense be run on a router or Raspberry Pi?
No, pfsense runs on amd64 compatible hardware. Considering the expectation of speed, throughput and latency (lack thereof), it is not the kind of net appliance OS you want to be emulating with a translation layer.

Amd64 native Hypervisors are fine though (bare metal even better), which can also host pihole in a VM just fine, or just have two physically separate devices.
 
Would that not be what pfsense then comes into play for? Blocking/redirecting DNS requests for one, and (trying to) finely tune access control?
It can block DNS requests, but then the device will report "no internet connection", so Android TV or any other smart TV will be useless.
 