Scraping Bubble: Companies specializing in scraping or otherwise harvesting publicly available content to train AI models are becoming increasingly common. In particular, some firms are targeting smart TV applications and similar platforms, attempting to leverage users' internet connectivity in exchange for low-cost incentives such as reduced advertising or free streaming access.
Bright Data operates a global proxy network designed to collect publicly available web content, and users are voluntarily joining the network to save a few dollars on their TV viewing experience. According to a recent report, code associated with Bright Data has appeared in certain smart TV applications. When questioned about this practice, some developers have declined to comment or have removed the proxy integration.
The company describes its platform as a way to transform web scraping into a structured "data delivery" system. Marketing materials claim that its Bright SDK technology enables "100 percent" user monetization, promising global reach while preserving the original user experience.
The Bright SDK can be embedded into smart TV applications, and users are typically asked to consent before joining the proxy network. Once activated, the SDK may route web traffic through the user's residential internet connection. The collected data is then sent to Bright Data's servers and sold to AI companies for model training and LLM development.
In a webinar shared with industry participants years ago, Bright Data Chief Product Officer Ariel Shulman stated that the SDK does not directly track users. The code is said to operate anonymously in the background, and its web crawling activity can be difficult to monitor because it is spread across distributed residential IP connections.
Bright Data has claimed that its proxy network includes roughly 150 million crawling clients, a figure that reportedly encompasses smart TV applications as well as software running on PCs and mobile devices.
Bright Data spokesperson Jennifer Burns stated that participation in the network is "consensual," adding that users can opt out at any time through a simple two-step process.
The company says the Bright SDK is designed to initiate web crawling only when doing so will not significantly affect local computing and network resources. However, users generally have little visibility into how much data is being transmitted in the background while the SDK is active during television viewing or web browsing.
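To make the idea of a resource check concrete, here is a minimal sketch of how an opt-in background task might be gated on device load. It is not Bright Data's code and does not reflect the Bright SDK's actual behavior; the `DeviceLoad` type, the `may_run_background_task` function, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceLoad:
    """Snapshot of device load; every field is a hypothetical input."""
    cpu_percent: float        # 0-100, share of CPU currently in use
    bandwidth_percent: float  # 0-100, share of the network link in use
    user_opted_in: bool       # whether the user consented to background work


# Illustrative thresholds only, not values published by any vendor.
MAX_CPU_PERCENT = 30.0
MAX_BANDWIDTH_PERCENT = 20.0


def may_run_background_task(load: DeviceLoad) -> bool:
    """Return True only if the user opted in and the device looks idle."""
    if not load.user_opted_in:
        return False
    return (load.cpu_percent <= MAX_CPU_PERCENT
            and load.bandwidth_percent <= MAX_BANDWIDTH_PERCENT)
```

A gate like this only decides whether work may start. It says nothing about how much data is transferred once the task is running, which is the visibility gap described above.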
The report compares Bright Data's business model to IPIDEA, a massive, China-based proxy network dismantled by Google earlier this month. While critics argue that distributed proxy networks can be abused for malicious purposes, Bright Data maintains that its platform is intended for legitimate data access and research applications.
Nevertheless, platform providers appear to be tightening restrictions on background SDK activity. Google has reportedly begun prohibiting apps from running persistent background SDK processes, while Amazon has taken steps to block applications that rely on third-party proxy mechanisms such as Bright SDK integrations.
The company continues to maintain partnerships with smart television ecosystems based on Tizen OS and webOS, where reports suggest that hundreds of applications may be incorporating proxy-based web data collection functionality.