New single-frame watermark technology can detect piracy from just a screenshot

Alfonso Maruccia

In context: Watermarks are identifying patterns hidden within a piece of paper, an image, or other types of content. Manufacturers and providers can use them to detect counterfeiting or piracy. Now, a Berlin-based company promises an even stronger watermarking system that works in the cloud.

German company castLabs recently introduced "single-frame forensic watermarking," a new, cloud-based way to easily and reliably identify piracy and IP theft. castLabs said its novel approach lets the company embed "tunable robustness level watermarks" in digital assets such as images, videos, documents, or any other type of digital file.

The system is presented as a way to protect copyrighted content in virtually any scenario. Even when the content is distorted or partially obstructed, the technology can reportedly work from a single image to retrieve IDs, IP addresses, session information, and other detailed user data.

Single-frame forensic watermarking conceals vital information within a single frame of digital media, castLabs explained. The watermark can work alongside other security measures such as Digital Rights Management (DRM). The technology is split into two parts: an embedder and an extractor.

castLabs' unique algorithm embeds the watermark during the encoding process through the company's cloud-based Video Toolkit platform. The watermark is embedded server-side, castLabs said, with unique IDs "strategically" hidden within video frames or other visual digital assets. Its visibility can be "precisely" tuned for different use cases, reportedly allowing the mark to survive even in low-bitrate video and single-image files.

The second part of the system, a cloud-based extractor running on AWS, can scan various areas of a video frame, a document, or an image and detect the hidden watermark with what castLabs describes as "remarkable resilience." This so-called "blind extraction" approach can retrieve the hidden pattern from watermarked content without access to the original, unmarked asset.
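
castLabs does not disclose how its embedder and extractor work. As a rough illustration of what blind, single-frame extraction means, here is a minimal Python sketch built on a textbook technique, quantization index modulation (QIM) on block averages, rather than on castLabs' algorithm. It assumes a grayscale frame stored as a list of rows of 0-255 integers; the function names and the `block` and `step` parameters are hypothetical. The extractor needs only the suspect frame, never the original.

```python
# Hypothetical sketch of a blind, single-frame watermark using quantization
# index modulation (QIM) on 8x8 block means. NOT castLabs' algorithm.

def _block_cells(bit_index, blocks_per_row, block):
    """Pixel coordinates of the block that carries ID bit `bit_index` (row-major)."""
    by, bx = divmod(bit_index, blocks_per_row)
    return [(y, x)
            for y in range(by * block, (by + 1) * block)
            for x in range(bx * block, (bx + 1) * block)]


def embed_id(frame, wm_id, n_bits=16, block=8, step=8):
    """Return a watermarked copy of a grayscale frame (rows of 0-255 ints)."""
    blocks_per_row = len(frame[0]) // block
    if n_bits > blocks_per_row * (len(frame) // block):
        raise ValueError("frame too small for the requested number of bits")
    out = [row[:] for row in frame]
    for b in range(n_bits):
        bit = (wm_id >> b) & 1
        cells = _block_cells(b, blocks_per_row, block)
        mean = sum(frame[y][x] for y, x in cells) / len(cells)
        # Move the block mean to the nearest value step * k whose index k is
        # even (bit 0) or odd (bit 1); the required shift is at most `step`.
        target = 2 * step * round((mean - bit * step) / (2 * step)) + bit * step
        shift = target - mean
        for y, x in cells:
            out[y][x] = min(255, max(0, round(frame[y][x] + shift)))
    return out


def extract_id(frame, n_bits=16, block=8, step=8):
    """Blind extraction: reads the ID from the suspect frame alone."""
    blocks_per_row = len(frame[0]) // block
    wm_id = 0
    for b in range(n_bits):
        cells = _block_cells(b, blocks_per_row, block)
        mean = sum(frame[y][x] for y, x in cells) / len(cells)
        # The parity of the nearest quantizer index is the embedded bit.
        if round(mean / step) % 2:
            wm_id |= 1 << b
    return wm_id
```

Here `step` loosely plays the role of a robustness knob: a larger step tolerates more distortion but changes pixels more visibly. Each bit decodes correctly as long as the total change to its block's average, including rounding, stays below `step / 2`; this toy scheme would not survive cropping or rescaling, which shifts the block grid.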

castLabs is promoting its single-frame forensic watermarking solution to companies and organizations interested in taking "swift action" against content theft, as the extracted watermark can pinpoint the source of a leak within the supply chain. Furthermore, the system can provide "renewed deterrence," as potential infringers are aware that they can now be easily tracked, and "solid evidence" of content ownership to prove unauthorized uses in court.


 
This seems like a process to prevent content from leaking out of a company, not necessarily to prevent piracy. The goal of this watermark is to identify where in the chain the content leaked, and therefore the person responsible for the leak.
 
Is it an actual watermark or metadata, though?
It's probably not metadata per se, just an ID encoded in the video's pixels, similar to a QR code but not noticeable. Much less data is needed since it's just a number (probably only a few pixels need to be modified). Similar colors can be used to hide the watermark from the human eye, and since it's in a single frame of the video, it would be hard to even find. With only an ID stored in the image, that ID would have to be looked up in server-side persistent storage to find the data it maps to (as mentioned in the article).
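
A minimal Python sketch of this ID-plus-lookup idea, assuming a flat list of 8-bit pixel values. The `WatermarkRegistry` class and the helper functions are hypothetical, and hiding bits in the least significant bit of a few pixels is used only because it is the simplest possible embedding; it would not survive re-encoding, so it illustrates the data flow rather than a forensic-grade watermark.

```python
import secrets


class WatermarkRegistry:
    """Hypothetical server-side table: watermark ID -> session details."""

    def __init__(self, n_bits=32):
        self.n_bits = n_bits
        self._sessions = {}

    def issue(self, session_info):
        """Allocate a fresh random ID for one viewing session."""
        while True:
            wm_id = secrets.randbits(self.n_bits)
            if wm_id not in self._sessions:
                self._sessions[wm_id] = session_info
                return wm_id

    def lookup(self, wm_id):
        """Return the stored session details, or None for an unknown ID."""
        return self._sessions.get(wm_id)


def hide_id_lsb(pixels, wm_id, n_bits=32):
    """Copy of `pixels` with wm_id written into the lowest bit of the first n_bits values."""
    if len(pixels) < n_bits:
        raise ValueError("not enough pixels to hold the ID")
    out = list(pixels)
    for i in range(n_bits):
        out[i] = (out[i] & ~1) | ((wm_id >> i) & 1)
    return out


def read_id_lsb(pixels, n_bits=32):
    """Reassemble the ID from the lowest bit of the first n_bits values."""
    return sum((pixels[i] & 1) << i for i in range(n_bits))
```

Only 32 pixel values change, each by at most 1; everything else about the viewer (user, IP address, session) stays on the server and is recovered through `lookup`.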

I'm sure the idea behind the article has existed for a long time. I've previously thought about giving mass emails an unnoticeable per-recipient footprint, using only basic text modifications, to determine who leaked a company's internal email to the press. With two variants at each position, only 10 positions in the text would be needed to distinguish 1,000 recipients (2^10 = 1,024). A huge number of possible system-generated modifications could be used to store data without anyone being aware, for instance:
- Different types of bullet points (◦ vs ∙ vs • vs ●)
- Curly vs straight quotes (I'm vs I’m / ‘ ’ vs “ ” vs ' vs ")
- Varying whitespace (e.g. two spaces vs one, and non-breaking spaces)
- Optional hyphens included at different points
- Optional commas
- Ellipsis character vs periods (highlight to see 3 chars ... vs 1 char …)
- Varying forms of "and" (e.g. this & that vs this + that vs this/that vs this and that)
- Semicolon vs period sentence separators
- Square brackets vs parentheses
- Number modifiers (e.g. ~30 vs about 30 / 30+ vs >30 vs over 30 / 30 vs thirty)
The list goes on. You could even include control marks common to every copy to detect whether the press altered the email before publishing it, or to add redundancy. To notice any of this, someone would have to compare several employees' copies of the email, which is unlikely.
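
The email scheme described in this comment can be sketched in Python as follows. The 10 slot pairs, the template text, and the `render`/`identify` names are made up for illustration: each recipient number from 0 to 1,023 selects one variant per slot, and a regular expression built from the template reads the variants back out of a leaked copy.

```python
import re
from string import Formatter

# Hypothetical example: 10 slots, each with two look-alike variants (bit 0, bit 1).
SLOTS = [
    ("•", "●"),                                    # 0: bullet style
    ("I'm", "I’m"),                                # 1: straight vs curly apostrophe
    ("...", "…"),                                  # 2: three periods vs ellipsis char
    ("sales and marketing", "sales & marketing"),  # 3: form of "and"
    ("about 30", "~30"),                           # 4: number modifier
    ("(Q3)", "[Q3]"),                              # 5: parentheses vs square brackets
    ("e-mail", "email"),                           # 6: optional hyphen
    (" ", "\u00a0"),                               # 7: normal vs non-breaking space
    ("final; details", "final. Details"),          # 8: semicolon vs period
    ("Thanks, all", "Thanks all"),                 # 9: optional comma
]

TEMPLATE = (
    "{0} {1} sharing the {5} numbers for {3}{2}\n"
    "Headcount is {4} people.{7}Please don't forward this {6}.\n"
    "Budget is {8} follow next week.\n"
    "{9}\n"
)


def render(recipient):
    """Bit i of the recipient number selects the variant used in slot i."""
    if not 0 <= recipient < 2 ** len(SLOTS):
        raise ValueError("recipient number out of range")
    choices = [pair[(recipient >> i) & 1] for i, pair in enumerate(SLOTS)]
    return TEMPLATE.format(*choices)


def _build_pattern():
    """Turn the template into a regex with one capture group per slot."""
    parts, order = [], []
    for literal, field, _spec, _conv in Formatter().parse(TEMPLATE):
        parts.append(re.escape(literal))
        if field is not None:
            slot = int(field)
            order.append(slot)
            parts.append("(" + "|".join(re.escape(v) for v in SLOTS[slot]) + ")")
    return re.compile("".join(parts)), order


_PATTERN, _ORDER = _build_pattern()


def identify(leaked_text):
    """Recover the recipient number, or None if the text does not match the template."""
    match = _PATTERN.fullmatch(leaked_text)
    if match is None:
        return None
    recipient = 0
    for value, slot in zip(match.groups(), _ORDER):
        recipient |= SLOTS[slot].index(value) << slot
    return recipient
```

Matching the whole template also gives a crude version of the tamper check mentioned above: if anything outside the slots was edited, `identify` returns `None`.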

And of course, there's always been this to identify the source of printed materials: https://en.wikipedia.org/wiki/Machine_Identification_Code
 
This technology has been around for decades, peaking in the early 2000s amid all the interest in DRM and protecting content from piracy (remember Napster?). A great many watermarking schemes were published, each trying to outdo the others in robustness against removal, invisibility to the eye, and other "good" properties. However, DRM has largely faded into the background as subscription and streaming services became the main way for the masses to consume content. What's different about this newest, if puzzling, entrant into the old fray?
 
Video watermarking typically accumulates several frames to read the ID, i.e. it needs a video clip, but sometimes you'd like to identify the origin of a stolen still picture. This single-frame watermarking technology is able to do that. This is particularly interesting for highly sensitive content, e.g. during production.
 
Oh YAY, another form of crap DRM that violates citizens' rights to protect publishers from nearly non-existent pirates... Nitwits.
 