Hacker News
Show HN: Shazam-like acoustic fingerprinting of continuous audio streams (github.com/dest4)
158 points by dest on Nov 29, 2017 | 76 comments



OP here.

This lib is a building block of an ad blocker for radio broadcasts that I have been developing for a while and am progressively open-sourcing.


Have you thought about building an ad blocker for podcasts? One couldn't block unique reads of ads, but many podcasters read an advertisement once and then include that read in multiple podcast episodes. And there are tons of ads that aren't read by the podcaster and are included in multiple episodes.

I figure any block of audio longer than a few seconds that appears in more than one podcast episode could be snipped.

I think it would have to be desktop software, since a service that edited the podcast files would be committing copyright infringement. I guess it could be a podcast player that doesn't actually distribute the edited files, just the timestamps to skip the ads.

Personally I'd like software I can run over a directory of mp3s that would remove duplicate sections. Any thoughts on feasibility? I'm surprised it hasn't been done yet.
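For the "remove duplicate sections" idea, here's a rough numpy sketch of a landmark-style approach (all function names and parameters here are made up for illustration; real mp3 decoding and a robust hash design are out of scope): fingerprint each file by hashing spectral peaks, then histogram the time offsets of matching hashes between two files — a spike in the histogram means a shared segment, such as a repeated ad.

```python
import numpy as np
from collections import defaultdict

def window_peaks(signal, win=1024, hop=512, n_peaks=3):
    """(frame, bin) pairs for the strongest bins of each windowed spectrum."""
    peaks = []
    for frame, start in enumerate(range(0, len(signal) - win, hop)):
        spectrum = np.abs(np.fft.rfft(signal[start:start + win] * np.hanning(win)))
        peaks.extend((frame, int(b)) for b in np.argsort(spectrum)[-n_peaks:])
    return sorted(peaks)

def hashes(peaks, fan_out=5, max_dt=10):
    """Pair each peak with a few later ones: key (f1, f2, dt) -> anchor frames."""
    table = defaultdict(list)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            if 0 < t2 - t1 <= max_dt:
                table[(f1, f2, t2 - t1)].append(t1)
    return table

def offset_histogram(h1, h2):
    """Count matching hashes per time offset; a spike means shared audio."""
    hist = defaultdict(int)
    for key, anchors in h1.items():
        for t1 in anchors:
            for t2 in h2.get(key, ()):
                hist[t2 - t1] += 1
    return hist

# Toy demo: two "episodes" of noise sharing the same 2 s synthetic "ad".
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(2 * sr) / sr
ad = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
ep1 = np.concatenate([rng.standard_normal(sr), ad])        # ad starts at 1 s
ep2 = np.concatenate([rng.standard_normal(3 * sr), ad])    # ad starts at 3 s
hist = offset_histogram(hashes(window_peaks(ep1)), hashes(window_peaks(ep2)))
best_offset = max(hist, key=hist.get)  # in frames, roughly (3 s - 1 s) / hop
```

Once the shared segment and its offsets are located in each file, snipping it out is the easy part; the hard parts a real tool would need are perceptually robust hashing and tolerance to re-encoding artifacts.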


Neat, but static ads are slowly becoming a thing of the past... Companies like Acast and Midroll are building solutions that inject dynamic ads into podcasts at the time of stream/download.


Sounds like that's potentially solvable by breaking the podcast down into chunks by speaking voice, then flagging any section of ~30s with a different speaking voice from the rest. Guest speakers shouldn't get caught by this, as there'd be more back-and-forth conversation rather than a mostly unbroken 30s chunk.


Speaker recognition would be really awesome. Do you know of any techniques more general than fingerprinting specific audio sections?

Edit: https://github.com/ppwwyyxx/speaker-recognition/ looks like the first of a number of good starting points.


I will publish more on that topic


sounds great! looking forward to it.


Cool concept. Do you silence/skip the sound when you don't recognize a song? Could be a nice way to get Spotify for free without ads: pipe the sound into a virtual device that gets silenced when ads play.

Edit: I don't want to support the notion that we should avoid paying $10/month for such a great service, I was just curious about the technical implementation.


There's speech and original musical content, like bootlegs and mixes, that cannot be detected with fingerprinting.

For a free Spotify without ads, have a look at http://www.stationripper.com/ (it's old software)

Edit: StationRipper caused a legal and political debate in France in 2005 about the right to make private copies of legal media http://www.assemblee-nationale.fr/12/amendements/1206/120600...


For a free Spotify without ads, open this link in Chrome: http://play.spotify.com


Do you need an ad blocker to avoid ads on play.spotify.com? Or is it ad-free by default?


You need an adblocker.


What I meant was that there were no audio ads between the songs.


Did you consider that this could be used by record labels to detect unauthorized use of copyrighted music, etc?


As a reason for not releasing it? If so, that's not a terribly good reason. I mean, clearly they already do that -- this simply permits those with lesser means to employ the technology. In general, I'm not fond of "but someone could misuse it" as a reason for not releasing a technology -- especially if it already exists in another form with limited accessibility.


Like this: http://www.dubset.com/mixscan/#intro-2

My understanding is that Apple & Spotify have signed up with these guys with a view to correcting payments for artists on user-uploaded mixes [1].

[1] http://variety.com/2016/digital/news/spotify-apple-music-rem...


Google Pixel 2 phones are doing this now as an out-of-the-box feature. It's continuously listening and the song name appears on your lock screen.

https://venturebeat.com/2017/10/19/how-googles-pixel-2-now-p...


I wonder how much battery it drains.


Looks like they've optimized it fairly well to prevent battery drain: https://www.xda-developers.com/how-google-pixel-2-now-playin...

Personally I've only ever seen "Pixel Ambient Services" show up on my battery list once, and that was 1% usage after a fairly long day out.


There's a whole bunch of stuff Android phones do in the background now, like the "OK/Hey Google" assistant voice shortcuts. I turn it all off. But supposedly they only use a special low-power chip for these passive listening features. Not sure about music recognition though -- it seems like it would involve a decent amount of memory even if performing a convolution; depends on how much buffer time.


A new one for me the other day was "Google Nearby". Enabled by default, and some company in the airport was using it to push ads to your notifications. Disgusting, and maybe the final nail in the coffin for Android for me. As a long-time diehard Android user, the iPhone sounds better and better every day.


In case anyone's wondering how they did that, here are the docs: https://developers.google.com/beacons/


Why not just use a ROM without all the bloatware, and maybe even without Google services altogether?



My carrier forces all manufacturers to lock the bootloader. Also, doesn't stuff like Google Pay and Netflix stop working?


I've got a Pixel, won't be getting Pixel 2. My partner just got a OnePlus 5T though, and it looks very good. Definitely considering OnePlus as my next step.


OnePlus phones have always looked very good. But they don't work on Verizon, so that's a non-starter.


Could you please tell me which airport this was and if you remember the location and the ad?

Thanks a lot


It was either BWI or LAX, and it said "Your ad here" with some URL. I had no idea what it was, so I long-pressed on the notification and did some searching.


Google Nearby is everywhere; I feel like I've seen it walking past Targets and a bunch of other places lately, even small businesses.


Very cool stuff! It seems that all those solutions are based on the analysis of visual representations of spectrograms. Is this common, or could you just use 2d arrays that encode the same information? Would that be more performant?

Nice blog post about this stuff: http://willdrevo.com/fingerprinting-and-audio-recognition-wi... - https://github.com/worldveil/dejavu


I wrote up some of my experiments attempting to do what you are describing, and I explain why you can't simply use a 2D array of an audio file. You can find my post here:

http://jack.minardi.org/software/computational-synesthesia/

You can also see the code behind it here:

https://github.com/jminardi/audio_fingerprinting

I am by no means an expert in this area and a few people have since told me I did a few stupid things in my analysis. But you might find it interesting.


In this context, what's the difference between 'visual representations of spectrograms' and '2d arrays which encode the same information'? Algorithms don't have eyes. The way they 'see' is by reading '2d arrays'.


You mean 2d arrays containing the raw audio signal? No, this would not work because you do not know the phase along the y dimension when you want to compare to another signal.

Another method to detect an audio pattern is cross correlation on the raw audio signal. But it is very expensive in computation power and memory.

The slowest operation with fingerprinting is often the associated DB query. There's a lot of work to do there. In that space, Will Drevo's work is really good. I will share my DB implementation later.
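For reference, a minimal numpy sketch of the cross-correlation approach mentioned above, on synthetic data (FFT-based, which is faster than the naive O(N·M) loop but still far heavier than looking up a handful of fingerprint hashes):

```python
import numpy as np

rng = np.random.default_rng(1)
stream = rng.standard_normal(100_000)     # ~12 s of "audio" at 8 kHz
clip = stream[60_000:68_000]              # a known 1 s pattern inside it

# Circular cross-correlation via FFT: corr[k] = sum_i stream[k+i] * clip[i]
corr = np.fft.irfft(np.fft.rfft(stream) * np.conj(np.fft.rfft(clip, n=len(stream))))
offset = int(np.argmax(corr))             # -> 60000, where the clip starts
```

Note this has to touch every sample of the stream for every query; fingerprinting pays the spectral-analysis cost once per stream and then matches tiny hash sets instead.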


I meant the spectrogram encoded as a 2d array, but I guess there isn't a big difference when the db query is the most expensive part.

I've always wondered: Is there a way to compare fingerprints with humming sounds or live recordings?

Those fingerprinting techniques don't seem to be suitable for those tasks, do you know of any methods to accomplish this?


There are special fingerprint algorithms suited to sound modifications like pitch shifts https://biblio.ugent.be/publication/5754913 but they're not going to work with humming or live audio. I don't know if such a thing exists.

If you want to do some research, here is a short review paper on the topic http://www.cs.toronto.edu/~dross/ChandrasekharSharifiRoss_IS...

As for the 2d array spectrogram, it is not needed in my lib (except when plotting is activated). I only care about maxima in the spectrum of each data window. In other words, 1d spectra are enough.


Spectrograms are a convenient way to visualize the data/algorithm but are rarely part of the actual analysis. They are already using the 2d array, so to speak. In any case, a spectrogram is just a 2d array where the magnitude of each array element is mapped to a color, so it's effectively the same thing. Few if any people use visual representations of sound for analysis, except for the crazies who run spectrograms through visual deep learning networks.
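To make the "same thing" point concrete, here's a numpy-only sketch (toy signal, illustrative parameters): the spectrogram is computed and consumed as a plain 2d array, and a "spectrogram image" is nothing more than this array with magnitudes mapped to colors.

```python
import numpy as np

def stft_mag(signal, win=256, hop=128):
    """Spectrogram as a plain 2d array: rows = time windows, cols = freq bins."""
    frames = [signal[s:s + win] * np.hanning(win)
              for s in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

sr = 8000
t = np.arange(sr) / sr
S = stft_mag(np.sin(2 * np.pi * 1000 * t))  # 1 s of a 1 kHz tone
peak_bin = int(np.argmax(S.mean(axis=0)))   # strongest frequency bin
freq = peak_bin * sr / 256                  # -> 1000.0 Hz
```

Any analysis (peak picking, hashing, ML features) operates on `S` directly; rendering it with a colormap is purely for humans.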


Uh, are you sure of what you are writing here? Time-frequency analysis (including spectrograms) is one of the very fundamental tools for signal processing.


True, I was thinking of a spectrogram as purely a visualization of a time series of DFTs, but Matlab and other tools do not make this distinction.

I was mainly responding to the OP's distinction between analyzing a visual representation and analyzing a "2d array" when they are basically the same thing.


> analyzing a visual representation and analyzing a "2d array" when they are basically the same thing.

This is what I mean. I guess their tooling just outputs graphics and it's easier to work with those than the pure 2d array in numpy or something similar.


No, the graphics are only being used as part of the explanation. The algorithm is not working with them.


Another implementation of this algorithm can be found at [1]. It also includes several other algorithms for acoustic fingerprinting that can serve as a baseline. See [2] for a paper on one of the other implemented algorithms and a comparison.

[1] https://github.com/JorenSix/Panako

[2] http://www.terasoft.com.tw/conf/ismir2014/proceedings/T048_1...


Thank you for having released Panako. Note that I gave the link to the relevant paper in a previous comment

https://news.ycombinator.com/item?id=15811221


Ah, I did not see that. Good to know that it is findable.


I did not actually know that there was a Github for this, I only had the paper.


I hacked something together in an hour once and made a program that would recognize the song that was playing and play the video clip of that song from YouTube in sync:

https://www.youtube.com/watch?v=K6FxfZH_ZK4

The phone in that video is just playing a song, it doesn't have any connection to the computer at all.


Nice. How did you recognize the song? Cross correlation or fingerprinting? How big was your song database?


Unfortunately I didn't write my own code for that, I just used a pre-existing fingerprinting API.


Not to discredit, but that's a lot less significant.


Yes, hence the "hacked together in an hour" part.


Do you have a public repo for this? Awesome stuff.


It was a really dirty 40 lines of code, so I don't have it publicly anywhere, but I can upload it somewhere when I get home if you want.


Me want!! Please do upload it!



Thanks. For the record, the fingerprint service used in this script is ACRCloud https://www.acrcloud.com



That's great! I was just thinking about rewriting Shazam as a machine learning project.

I'm wondering how to use my Chord Progression data to make a different audio fingerprinting algorithm.

https://peterburk.github.io/chordProgressions/index.html


Thanks for sharing. Processing PCM audio signals is something that is actually useful for more things than people realize.


Hope it will be useful!



How closely can it correlate audio broadcasts of the same audio that were captured at different offsets?

e.g. two independent streams, identifying the same 30-second commercial, but the audio streams are offset from each other by half a sample length?


It correlates quite well.

Maybe some fingerprints will be present in only one of the two streams, but most of them will be present in both.
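A toy way to see why landmark-style fingerprints tend to survive a sub-window offset (synthetic tone; function and parameters are made up for illustration): the hash keys depend on which frequency bins peak and how the peaks are spaced, not on where the analysis windows happen to fall.

```python
import numpy as np

def peak_hashes(sig, win=1024, hop=512, n_peaks=3):
    """Hash consecutive windows by their top spectral bins."""
    keys, prev = set(), None
    for s in range(0, len(sig) - win, hop):
        spec = np.abs(np.fft.rfft(sig[s:s + win] * np.hanning(win)))
        top = tuple(sorted(int(b) for b in np.argsort(spec)[-n_peaks:]))
        if prev is not None:
            keys.add((prev, top))
        prev = top
    return keys

sr = 8000
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
a = peak_hashes(tone)
b = peak_hashes(tone[256:])          # same audio, shifted by half a hop
overlap = len(a & b) / len(a | b)    # stays high despite the offset
```

With real, non-stationary audio some fingerprints near window boundaries will differ between the two streams, which matches the observation above that most, but not all, fingerprints appear in both.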


Have you considered building a podcast player app that can automatically skip ads?


Yes, I more or less have, but isn't the fast-forward option in podcast players good enough?

BTW, a few months ago I talked to an Australian dev who built podblocker.com, but the project does not seem to be active anymore

https://news.ycombinator.com/item?id=13799700


I don't actually mind the ads, but fast-forward is a pain if you're listening when doing other stuff at the same time (running, biking, driving, cooking, etc).


Care to share more details of how it works? Thanks!


This, I will publish at a later time ;)


Thanks so much for your work on this. Interested in running it against the Internet Archive’s audio collection.


Do you think this could be useful for detecting changes in songs? Like if I'm listening to a big mix of songs that doesn't have timestamps of when the songs change, but that is info I would like to have...


Yes, it could be. You need a song database to detect changes, and that is hard and/or expensive to gather.

Commercial services are available in that field. ACRCloud was mentioned in another comment.


Awesome share!


thank you!


Can it fingerprint other streams?


You mean audio streams? Of course. Just change the URL next to curl and that's it.


Isn't Shazam patented?


Maybe, but I don't know.

I'm in France and this lib is software only, so Shazam's patents are probably not enforceable here.

Anyway, IANAL and cheers to Shazam people


If it is, it hasn't stopped DubSet [1] from licensing some tech to Apple and Spotify [2].

[1] http://www.dubset.com/mixscan/#intro-2 [2] http://variety.com/2016/digital/news/spotify-apple-music-rem...




