Hacker News
Show HN: Shazam-like acoustic fingerprinting of continuous audio streams (github.com/dest4)
158 points by dest on Nov 29, 2017 | 76 comments



OP here.

This lib is a building block of an ad blocker for radio broadcasts that I have been developing for a while and am progressively open-sourcing.


Have you thought about building an ad blocker for podcasts? One couldn't block unique reads of ads, but many podcasters read an advertisement once and then include that read in multiple podcast episodes. And there are tons of ads that aren't read by the podcaster and are included in multiple episodes.

I figure any block of audio longer than a few seconds that appears in more than one podcast episode could be snipped.

I think it would have to be desktop software, since a service that edited the podcast files would be committing copyright infringement. I guess it could be a podcast player that doesn't actually distribute the edited files, just the timestamps to skip the ads.

Personally I'd like software I can run over a directory of mp3s that would remove duplicate sections. Any thoughts on feasibility? I'm surprised it hasn't been done yet.
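For the "remove duplicate sections" idea, here's a rough numpy sketch of a landmark-style approach (all function names and parameters here are made up for illustration; real mp3 decoding and a robust hash design are out of scope): fingerprint each file by hashing spectral peaks, then histogram the time offsets of matching hashes between two files — a spike in the histogram means a shared segment, such as a repeated ad.

```python
import numpy as np
from collections import defaultdict

def window_peaks(signal, win=1024, hop=512, n_peaks=3):
    """(frame, bin) pairs for the strongest bins of each windowed spectrum."""
    peaks = []
    for frame, start in enumerate(range(0, len(signal) - win, hop)):
        spectrum = np.abs(np.fft.rfft(signal[start:start + win] * np.hanning(win)))
        peaks.extend((frame, int(b)) for b in np.argsort(spectrum)[-n_peaks:])
    return sorted(peaks)

def hashes(peaks, fan_out=5, max_dt=10):
    """Pair each peak with a few later ones: key (f1, f2, dt) -> anchor frames."""
    table = defaultdict(list)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            if 0 < t2 - t1 <= max_dt:
                table[(f1, f2, t2 - t1)].append(t1)
    return table

def offset_histogram(h1, h2):
    """Count matching hashes per time offset; a spike means shared audio."""
    hist = defaultdict(int)
    for key, anchors in h1.items():
        for t1 in anchors:
            for t2 in h2.get(key, ()):
                hist[t2 - t1] += 1
    return hist

# Toy demo: two "episodes" of noise sharing the same 2 s synthetic "ad".
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(2 * sr) / sr
ad = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
ep1 = np.concatenate([rng.standard_normal(sr), ad])        # ad starts at 1 s
ep2 = np.concatenate([rng.standard_normal(3 * sr), ad])    # ad starts at 3 s
hist = offset_histogram(hashes(window_peaks(ep1)), hashes(window_peaks(ep2)))
best_offset = max(hist, key=hist.get)  # in frames, roughly (3 s - 1 s) / hop
```

Once the shared segment and its offsets are located in each file, snipping it out is the easy part; the hard parts a real tool would need are perceptually robust hashing and tolerance to re-encoding artifacts.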


Neat, but static ads are slowly becoming a thing of the past... Companies like Acast and Midroll are building solutions that inject dynamic ads into podcasts at the time of stream/download.


Sounds like that's potentially solvable by breaking the podcast down into chunks by speaking voice, then flagging any section of ~30s with a different speaking voice from the rest. Guest speakers shouldn't get caught by this, as there'd be more back-and-forth conversation rather than a mostly unbroken 30s chunk.


Speaker recognition would be really awesome. Do you know of any techniques more general than fingerprinting specific audio sections?

Edit: https://github.com/ppwwyyxx/speaker-recognition/ looks like the first of a number of good starting points.


I will publish more on that topic


sounds great! looking forward to it.


Cool concept. Do you silence/skip the sound when you don't recognize a song? Could be a nice way to get Spotify for free without ads: pipe the sound into a virtual device that gets silenced when ads play.

Edit: I don't want to support the notion that we should avoid paying $10/month for such a great service, I was just curious about the technical implementation.


There's speech and original musical content, like bootlegs and mixes, that cannot be detected with fingerprinting.

For a free Spotify without ads, have a look at http://www.stationripper.com/ (it's old software)

Edit: StationRipper caused a legal and political debate in France in 2005 about the right to make private copies of legal media http://www.assemblee-nationale.fr/12/amendements/1206/120600...


For a free Spotify without ads, open this link in Chrome: http://play.spotify.com


Do you need an ad blocker to avoid ads on play.spotify.com? Or is it ad-free by default?


You need an adblocker.


What I meant was that there were no audio ads between the songs.


Did you consider that this could be used by record labels to detect unauthorized use of copyrighted music, etc?


As a reason for not releasing it? If so, that's not a terribly good reason. I mean, clearly they already do that -- this simply permits those with lesser means to employ the technology. In general, I'm not fond of "but someone could misuse it" as a reason for not releasing a technology -- especially if it already exists in another form with limited accessibility.


Like this: http://www.dubset.com/mixscan/#intro-2

My understanding is that Apple & Spotify have signed up with these guys with a view to correcting payments for artists on user-uploaded mixes [1].

[1] http://variety.com/2016/digital/news/spotify-apple-music-rem...


Google Pixel 2 phones are doing this now as an out-of-the-box feature. It's continuously listening and the song name appears on your lock screen.

https://venturebeat.com/2017/10/19/how-googles-pixel-2-now-p...


I wonder how much battery it drains.


Looks like they've optimized it fairly well to prevent battery drain: https://www.xda-developers.com/how-google-pixel-2-now-playin...

Personally I've only ever seen "Pixel Ambient Services" show up on my battery list once, and that was 1% usage after a fairly long day out.


There's a whole bunch of stuff Android phones do in the background now, like the "OK/Hey Google" assistant voice shortcuts. I turn it all off. But supposedly they only use a special low-power chip for these passive listening features. Not sure about music recognition though -- it seems like it would involve a decent amount of memory even if performing a convolution; depends on how much buffer time.


A new one for me the other day was "Google Nearby". Enabled by default, and some company in the airport was using it to push ads to your notifications. Disgusting, and maybe the final nail in the coffin for Android for me. As a long-time diehard Android user, the iPhone sounds better and better every day.


In case anyone's wondering how they did that, here are the docs: https://developers.google.com/beacons/


Why not just use a ROM without all the bloatware, and maybe even without Google services altogether?



My carrier forces all manufacturers to lock the bootloader. Also, doesn't stuff like Google Pay and Netflix stop working?


I've got a Pixel, won't be getting Pixel 2. My partner just got a OnePlus 5T though, and it looks very good. Definitely considering OnePlus as my next step.


OnePlus phones have always looked very good. But they don't work on Verizon, so that's a non-starter.


Could you please tell me which airport this was and if you remember the location and the ad?

Thanks a lot


It was either BWI or LAX, and it said "Your ad here" with some URL. I had no idea what it was, so I long-pressed on the notification and did some searching.


Google Nearby is everywhere; I feel like I've seen it walking past Targets and a bunch of other places lately, even small businesses.


Very cool stuff! It seems that all those solutions are based on the analysis of visual representations of spectrograms. Is this common, or could you just use 2d arrays that encode the same information? Would that be more performant?

Nice blog post about this stuff: http://willdrevo.com/fingerprinting-and-audio-recognition-wi... - https://github.com/worldveil/dejavu


I wrote up some of my experiments attempting to do what you are describing, and I explain why you can't simply use a 2D array of an audio file. You can find my post here:

http://jack.minardi.org/software/computational-synesthesia/

You can also see the code behind it here:

https://github.com/jminardi/audio_fingerprinting

I am by no means an expert in this area and a few people have since told me I did a few stupid things in my analysis. But you might find it interesting.


In this context, what's the difference between 'visual representations of spectrograms' and '2d arrays which encode the same information'? Algorithms don't have eyes. The way they 'see' is by reading '2d arrays'.


You mean 2d arrays containing the raw audio signal? No, this would not work because you do not know the phase along the y dimension when you want to compare to another signal.

Another method to detect an audio pattern is cross correlation on the raw audio signal. But it is very expensive in computation power and memory.

The slowest operation with fingerprinting is often the associated DB query. There's a lot of work to do there. In that space, Will Drevo's work is really good. I will share my DB implementation later.
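For reference, a minimal numpy sketch of the cross-correlation approach mentioned above, on synthetic data (FFT-based, which is faster than the naive O(N·M) loop but still far heavier than looking up a handful of fingerprint hashes):

```python
import numpy as np

rng = np.random.default_rng(1)
stream = rng.standard_normal(100_000)     # ~12 s of "audio" at 8 kHz
clip = stream[60_000:68_000]              # a known 1 s pattern inside it

# Circular cross-correlation via FFT: corr[k] = sum_i stream[k+i] * clip[i]
corr = np.fft.irfft(np.fft.rfft(stream) * np.conj(np.fft.rfft(clip, n=len(stream))))
offset = int(np.argmax(corr))             # -> 60000, where the clip starts
```

Note this has to touch every sample of the stream for every query; fingerprinting pays the spectral-analysis cost once per stream and then matches tiny hash sets instead.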


I meant the spectrogram encoded as a 2d array, but I guess there isn't a big difference when the db query is the most expensive part.

I've always wondered: Is there a way to compare fingerprints with humming sounds or live recordings?

Those fingerprinting techniques don't seem to be suitable for those tasks, do you know of any methods to accomplish this?


There are special fingerprint algorithms suited to sound modifications like pitch shifts https://biblio.ugent.be/publication/5754913 but they're not going to work with humming or live audio. I don't know if such a thing exists.

If you want to do some research, here is a short review paper on the topic http://www.cs.toronto.edu/~dross/ChandrasekharSharifiRoss_IS...

As for the 2d array spectrogram, it is not needed in my lib (except when plotting is activated). I only care about maxima in the spectrum of each data window. In other words, 1d spectra are enough.


Spectrograms are a convenient way to visualize the data/algorithm but are rarely part of the actual analysis. They are already using the 2d array, so to speak. In any case, a spectrogram is just a 2d array where the magnitude of each array element is mapped to a color, so it's effectively the same thing. Few if any people use visual representations of sound for analysis, except for the crazies who run spectrograms through visual deep learning networks.
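To make the "same thing" point concrete, here's a numpy-only sketch (toy signal, illustrative parameters): the spectrogram is computed and consumed as a plain 2d array, and a "spectrogram image" is nothing more than this array with magnitudes mapped to colors.

```python
import numpy as np

def stft_mag(signal, win=256, hop=128):
    """Spectrogram as a plain 2d array: rows = time windows, cols = freq bins."""
    frames = [signal[s:s + win] * np.hanning(win)
              for s in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

sr = 8000
t = np.arange(sr) / sr
S = stft_mag(np.sin(2 * np.pi * 1000 * t))  # 1 s of a 1 kHz tone
peak_bin = int(np.argmax(S.mean(axis=0)))   # strongest frequency bin
freq = peak_bin * sr / 256                  # -> 1000.0 Hz
```

Any analysis (peak picking, hashing, ML features) operates on `S` directly; rendering it with a colormap is purely for humans.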


Uh, are you sure of what you are writing here? Time-frequency analysis (including spectrograms) is one of the very fundamental tools for signal processing.


True, I was thinking of a spectrogram as purely a visualization of a time series of DFTs, but Matlab and other tools do not make this distinction.

I was mainly responding to the OP's distinction between analyzing a visual representation and analyzing a "2d array" when they are basically the same thing.


> analyzing a visual representation and analyzing a "2d array" when they are basically the same thing.

This is what I mean. I guess their tooling just outputs graphics and it's easier to work with those than the pure 2d array in numpy or something similar.


No, the graphics are only being used as part of the explanation. The algorithm is not working with them.


Another implementation of this algorithm can be found at [1]. It also includes several other algorithms for acoustic fingerprinting that can serve as a baseline. See [2] for a paper on one of the other implemented algorithms and a comparison.

[1] https://github.com/JorenSix/Panako

[2] http://www.terasoft.com.tw/conf/ismir2014/proceedings/T048_1...


Thank you for having released Panako. Note that I gave the link to the relevant paper in a previous comment

https://news.ycombinator.com/item?id=15811221


Ah, I did not see that. Good to know that it is findable.


I did not actually know that there was a Github for this, I only had the paper.


I hacked something together in an hour once and made a program that would recognize the song that was playing and play the video clip of that song from YouTube in sync:

https://www.youtube.com/watch?v=K6FxfZH_ZK4

The phone in that video is just playing a song, it doesn't have any connection to the computer at all.


Nice. How did you recognize the song? Cross correlation or fingerprinting? How big was your song database?


Unfortunately I didn't write my own code for that, I just used a pre-existing fingerprinting API.


Not to discredit, but that's a lot less significant.


Yes, hence the "hacked together in an hour" part.


Do you have a public repo for this? Awesome stuff.


It was a really dirty 40 lines of code, so I don't have it publicly anywhere, but I can upload it somewhere when I get home if you want.


Me want!! Please do upload it!



Thanks. For the record, the fingerprint service used in this script is ACRCloud https://www.acrcloud.com



That's great! I was just thinking about rewriting Shazam as a machine learning project.

I'm wondering how to use my Chord Progression data to make a different audio fingerprinting algorithm.

https://peterburk.github.io/chordProgressions/index.html


Thanks for sharing. Processing PCM audio signals is something that is actually useful for more things than people realize.


Hope it will be useful!



How closely can it correlate audio broadcasts of the same audio that were captured at different offsets?

e.g. two independent streams, identifying the same 30-second commercial, but the audio streams are offset from each other by half a sample length?


It correlates quite well.

Maybe some fingerprints will be present in only one of the two streams, but most of them will be present in both.
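A toy way to see why landmark-style fingerprints tend to survive a sub-window offset (synthetic tone; function and parameters are made up for illustration): the hash keys depend on which frequency bins peak and how the peaks are spaced, not on where the analysis windows happen to fall.

```python
import numpy as np

def peak_hashes(sig, win=1024, hop=512, n_peaks=3):
    """Hash consecutive windows by their top spectral bins."""
    keys, prev = set(), None
    for s in range(0, len(sig) - win, hop):
        spec = np.abs(np.fft.rfft(sig[s:s + win] * np.hanning(win)))
        top = tuple(sorted(int(b) for b in np.argsort(spec)[-n_peaks:]))
        if prev is not None:
            keys.add((prev, top))
        prev = top
    return keys

sr = 8000
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
a = peak_hashes(tone)
b = peak_hashes(tone[256:])          # same audio, shifted by half a hop
overlap = len(a & b) / len(a | b)    # stays high despite the offset
```

With real, non-stationary audio some fingerprints near window boundaries will differ between the two streams, which matches the observation above that most, but not all, fingerprints appear in both.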


Have you considered building a podcast player app that can automatically skip ads?


Yes, I more or less have, but isn't the fast-forward option in podcast players good enough?

BTW, a few months ago I talked to an Australian dev who built podblocker.com, but the project does not seem to be active anymore

https://news.ycombinator.com/item?id=13799700


I don't actually mind the ads, but fast-forward is a pain if you're listening when doing other stuff at the same time (running, biking, driving, cooking, etc).


Care to share more details of how it works? Thanks!


This, I will publish at a later time ;)


Thanks so much for your work on this. Interested in running it against the Internet Archive’s audio collection.


Do you think this could be useful for detecting changes in songs? Like if I'm listening to a big mix of songs that doesn't have timestamps of when the songs change, but that is info I would like to have...


Yes, it could be. You need a song database to detect changes, and that is hard and/or expensive to gather.

Commercial services are available in that field. ACRCloud was mentioned in another comment.


Awesome share!


thank you!


Can it fingerprint other streams?


You mean audio streams? Of course. Just change the URL next to curl and that's it.


Isn't Shazam patented?


Maybe, but I don't know.

I'm in France and this lib is software only, so Shazam's patents are probably not enforceable here.

Anyway, IANAL and cheers to Shazam people


If it is, it hasn't stopped DubSet [1] from licensing some tech to Apple and Spotify [2].

[1] http://www.dubset.com/mixscan/#intro-2 [2] http://variety.com/2016/digital/news/spotify-apple-music-rem...




