Vizio collected a selection of pixels on the screen that it matched to a database of TV, movie, and commercial content.
I would like to know more about that process. I find it ethically abhorrent, but technically very interesting.
Like, is it grabbing, say, three pixels in constant locations across the screen and matching their color change over time? Is it examining a whole block? Is it averaging a block at some proportional location on the screen?
You can dive in from there, but it's basically either watermarking or fingerprinting of video and/or audio frames. Video was preferred because there were fewer false positives from music beds. In a nutshell, it's video Shazam.
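To make the fingerprinting idea concrete, here's a minimal sketch of one common approach (a coarse luminance-grid hash over a decoded frame). The function names, the 8x8 grid size, and the luminance weights are my own illustration, not what Vizio or any particular ACR vendor actually uses:

    # Sketch: reduce a decoded frame to a small, fuzzy 64-bit fingerprint.
    # Assumes `frame` is a 2D list of (r, g, b) tuples; the grid size and
    # mean-threshold step are arbitrary illustrative choices.

    def fingerprint_frame(frame, grid=8):
        h, w = len(frame), len(frame[0])
        cell_h, cell_w = h // grid, w // grid

        # Average luminance per grid cell -- coarse enough to survive
        # rescaling, re-encoding, and mild color shifts in the pipeline.
        cells = []
        for gy in range(grid):
            for gx in range(grid):
                total = 0.0
                for y in range(gy * cell_h, (gy + 1) * cell_h):
                    for x in range(gx * cell_w, (gx + 1) * cell_w):
                        r, g, b = frame[y][x]
                        total += 0.299 * r + 0.587 * g + 0.114 * b
                cells.append(total / (cell_h * cell_w))

        # Threshold each cell against the mean to get a 64-bit signature.
        mean = sum(cells) / len(cells)
        return sum(1 << i for i, c in enumerate(cells) if c > mean)

    def hamming(a, b):
        # Number of differing bits; small distances mean "probably the same frame".
        return bin(a ^ b).count("1")

Matching on Hamming distance rather than exact equality is what makes this robust to different encodings of the same content.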
I'm also curious if they'd be able to match different encodings of the same video or would only be able to match against specific encodings in their collection.
I would imagine it's simply a temporal comparison of pixel colors at predetermined locations, similar to how the Shazam algorithm[1] works? You'd just need to analyze enough pixels to reduce "collisions", coupled with the temporal aspect!
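Roughly what I'm picturing, as a toy sketch (the sample points, quantization step, and window length here are all made up for illustration):

    # Sketch of the "few pixels over time" idea: sample fixed screen
    # positions in each frame and build a signature from the sequence
    # of quantized colors.

    SAMPLE_POINTS = [(0.25, 0.25), (0.50, 0.50), (0.75, 0.75)]  # fractional (x, y)

    def temporal_signature(frames, step=32, window=30):
        sig = []
        for frame in frames[:window]:
            h, w = len(frame), len(frame[0])
            for fx, fy in SAMPLE_POINTS:
                r, g, b = frame[int(fy * h)][int(fx * w)]
                # Quantize so small encoding differences map to the same value.
                sig.append((r // step, g // step, b // step))
        return tuple(sig)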
It's not encoding-based, it's frame-based, basically bitmap data. It has to be to work across the whole video delivery pipeline, so it's fairly fuzzy but also accurate.
They need a source to compare to. When we worked on it, masters were being sent from the network to the sync technology group, so they had source data for comparison by the first broadcast of a show.
Outside of latency, there's no reason they can't match against broadcast content off cable. For user tracking they can just log the fingerprint data and compare it later to the source data for analytics, so this works fine.
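A rough sketch of that offline-analytics side, continuing the earlier fingerprint idea (the index structure, brute-force lookup, and distance threshold are invented for illustration, not how the real system is built):

    # Sketch: log (timestamp, fingerprint) pairs on the device, then
    # match them against an index built from the source masters later.

    SOURCE_INDEX = {}  # 64-bit fingerprint -> ("show", offset_seconds)

    def hamming(a, b):
        return bin(a ^ b).count("1")

    def match_logged(entries, max_distance=6):
        # entries: list of (timestamp, fingerprint) pairs logged on the device
        hits = []
        for ts, fp in entries:
            best = min(SOURCE_INDEX, key=lambda k: hamming(k, fp), default=None)
            if best is not None and hamming(best, fp) <= max_distance:
                hits.append((ts, SOURCE_INDEX[best]))
        return hits

A production system would use some indexed nearest-neighbor lookup instead of this linear scan, but the principle is the same: the device only needs to ship compact fingerprints, and all the matching can happen server-side after the fact.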