Vizio collected a selection of pixels on the screen that it matched to a database of TV, movie, and commercial content.
I would like to know more about that process. I find it ethically abhorrent, but technically very interesting.
Like, is it grabbing, say, three pixels in constant locations across the screen and matching their color change over time? Is it examining a whole block? Is it averaging a block at some proportional location on the screen?
You can dive in from there, but it's basically either watermarking or fingerprinting of video and/or audio frames. Video was preferred because there were fewer false positives from music beds. In a nutshell, it's video Shazam.
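To make the fingerprinting idea concrete, here's a minimal sketch of one common approach (a coarse luminance-grid hash over a decoded frame). The function names, the 8x8 grid size, and the luminance weights are my own illustration, not what Vizio or any particular ACR vendor actually uses:

    # Sketch: reduce a decoded frame to a small, fuzzy 64-bit fingerprint.
    # Assumes `frame` is a 2D list of (r, g, b) tuples; the grid size and
    # mean-threshold step are arbitrary illustrative choices.

    def fingerprint_frame(frame, grid=8):
        h, w = len(frame), len(frame[0])
        cell_h, cell_w = h // grid, w // grid

        # Average luminance per grid cell -- coarse enough to survive
        # rescaling, re-encoding, and mild color shifts in the pipeline.
        cells = []
        for gy in range(grid):
            for gx in range(grid):
                total = 0.0
                for y in range(gy * cell_h, (gy + 1) * cell_h):
                    for x in range(gx * cell_w, (gx + 1) * cell_w):
                        r, g, b = frame[y][x]
                        total += 0.299 * r + 0.587 * g + 0.114 * b
                cells.append(total / (cell_h * cell_w))

        # Threshold each cell against the mean to get a 64-bit signature.
        mean = sum(cells) / len(cells)
        return sum(1 << i for i, c in enumerate(cells) if c > mean)

    def hamming(a, b):
        # Number of differing bits; small distances mean "probably the same frame".
        return bin(a ^ b).count("1")

Matching on Hamming distance rather than exact equality is what makes this robust to different encodings of the same content.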
I'm also curious if they'd be able to match different encodings of the same video or would only be able to match against specific encodings in their collection.
I would imagine it's simply a temporal comparison of pixel colors at predetermined locations, similar to how the Shazam algorithm[1] works? You'd just need to analyze enough pixels to reduce "collisions", coupled with the temporal aspect!
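Roughly what I'm picturing, as a toy sketch (the sample points, quantization step, and window length here are all made up for illustration):

    # Sketch of the "few pixels over time" idea: sample fixed screen
    # positions in each frame and build a signature from the sequence
    # of quantized colors.

    SAMPLE_POINTS = [(0.25, 0.25), (0.50, 0.50), (0.75, 0.75)]  # fractional (x, y)

    def temporal_signature(frames, step=32, window=30):
        sig = []
        for frame in frames[:window]:
            h, w = len(frame), len(frame[0])
            for fx, fy in SAMPLE_POINTS:
                r, g, b = frame[int(fy * h)][int(fx * w)]
                # Quantize so small encoding differences map to the same value.
                sig.append((r // step, g // step, b // step))
        return tuple(sig)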
It's not encoding-based, it's frame-based, basically bitmap data. It has to be to work across the whole video delivery pipeline, so it's fairly fuzzy but also accurate.
They need a source to compare to. When we worked on it, masters were being sent from the network to the sync technology group, so they had source data for comparison by the first broadcast of a show.
Outside of latency, there's no reason they can't match against broadcast content off cable. For user tracking they can just log the fingerprint data and compare it later to the source data for analytics, so this works fine.
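A rough sketch of that offline-analytics side, continuing the earlier fingerprint idea (the index structure, brute-force lookup, and distance threshold are invented for illustration, not how the real system is built):

    # Sketch: log (timestamp, fingerprint) pairs on the device, then
    # match them against an index built from the source masters later.

    SOURCE_INDEX = {}  # 64-bit fingerprint -> ("show", offset_seconds)

    def hamming(a, b):
        return bin(a ^ b).count("1")

    def match_logged(entries, max_distance=6):
        # entries: list of (timestamp, fingerprint) pairs logged on the device
        hits = []
        for ts, fp in entries:
            best = min(SOURCE_INDEX, key=lambda k: hamming(k, fp), default=None)
            if best is not None and hamming(best, fp) <= max_distance:
                hits.append((ts, SOURCE_INDEX[best]))
        return hits

A production system would use some indexed nearest-neighbor lookup instead of this linear scan, but the principle is the same: the device only needs to ship compact fingerprints, and all the matching can happen server-side after the fact.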