Have you thought about doing adblock for podcasts? You couldn't block a unique read of an ad, but many podcasters read an advertisement once and then include that read in multiple episodes. And there are tons of ads that aren't read by the podcaster at all and get included in multiple episodes.
I figure any block of audio longer than a few seconds that appears in more than one podcast episode could be snipped.
I think it would have to be desktop software, since a service that edited the podcast files would be committing copyright infringement. I guess it could be a podcast player that doesn't actually distribute the edited files, just the timestamps for skipping ads.
Personally I'd like software I can run over a directory of mp3s that would remove duplicate sections. Any thoughts on feasibility? I'm surprised it hasn't been done yet.
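Here's roughly how I'd picture a first pass over that directory - a crude sketch, assuming librosa for decoding the mp3s; the exact-hash fingerprint is deliberately naive (real systems hash spectral-peak constellations so matches survive re-encoding):

    # Flag 5s audio windows that repeat across files in a directory.
    import hashlib
    import pathlib
    from collections import defaultdict

    import librosa
    import numpy as np

    SR = 11025           # low sample rate is plenty for fingerprinting
    WIN = SR * 5         # 5-second windows

    def window_hashes(path):
        y, _ = librosa.load(path, sr=SR, mono=True)
        for start in range(0, len(y) - WIN, WIN):
            spectrum = np.abs(np.fft.rfft(y[start:start + WIN]))
            # Quantize hard so the hash survives small decoding jitter.
            coarse = np.round(np.log1p(spectrum[::64]), 1)
            yield hashlib.md5(coarse.tobytes()).hexdigest(), start // SR

    seen = defaultdict(set)      # hash -> {(file, offset in seconds)}
    for mp3 in pathlib.Path("podcasts").glob("*.mp3"):
        for h, t in window_hashes(mp3):
            seen[h].add((mp3.name, t))

    for hits in seen.values():
        if len({f for f, _ in hits}) > 1:    # same block in 2+ episodes
            print(sorted(hits))

You'd need overlapping windows and fuzzier matching before this works on real files, but the core of the problem really is just "group identical audio windows across files".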
Neat, but static ads are slowly becoming a thing of the past... Companies like Acast and Midroll are building solutions that inject dynamic ads into podcasts at the time of stream/download.
Sounds like that's potentially solvable by breaking the podcast down into chunks by speaking voice, then flagging any ~30s section with a speaking voice different from the rest. Guest speakers shouldn't get caught by this, since a guest segment would be back-and-forth conversation rather than a mostly unbroken 30s chunk.
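Something like this, maybe - a crude sketch using librosa MFCCs and k-means as a stand-in for real speaker diarization (two clusters is a big assumption for a multi-host show):

    # Flag long unbroken runs of the "minority" voice in an episode.
    import librosa
    import numpy as np
    from sklearn.cluster import KMeans

    def flag_odd_voice(path, sr=16000, min_run_s=25):
        y, _ = librosa.load(path, sr=sr, mono=True)
        # One 13-dim MFCC summary vector per second of audio.
        frames = [
            librosa.feature.mfcc(y=y[i:i + sr], sr=sr, n_mfcc=13).mean(axis=1)
            for i in range(0, len(y) - sr, sr)
        ]
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(np.array(frames))
        minority = int(np.argmin(np.bincount(labels)))
        run_start = None
        for t, lab in enumerate(labels):     # t is in seconds
            if lab == minority and run_start is None:
                run_start = t
            elif lab != minority and run_start is not None:
                if t - run_start >= min_run_s:
                    print(f"suspect ad: {run_start}s to {t}s")
                run_start = None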
Cool concept. Do you silence/skip the sound when you don't recognize a song?
Nice idea for getting Spotify free without ads - piping the sound into a virtual device that gets silenced when ads play.
Edit: I don't want to endorse the notion that we should avoid paying $10/month for such a great service; I was just curious about the technical implementation.
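For the curious: on Linux/PulseAudio the muting half is basically a wrapper around pactl. Minimal sketch, where looks_like_an_ad() is a hypothetical placeholder for whatever detector you'd plug in:

    import subprocess
    import time

    def set_mute(muted):
        # Mute/unmute the default PulseAudio sink.
        subprocess.run(
            ["pactl", "set-sink-mute", "@DEFAULT_SINK@", "1" if muted else "0"],
            check=True,
        )

    def looks_like_an_ad():
        # Hypothetical stand-in: plug in a real detector here
        # (fingerprint match, player metadata, ...).
        return False

    while True:
        set_mute(looks_like_an_ad())
        time.sleep(1)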
As a reason for not releasing it? If so, that's not a terribly good reason. I mean, clearly they already do that -- this simply permits those with lesser means to employ the technology. In general, I'm not fond of "but someone could misuse it" as a reason for not releasing a technology -- especially if it already exists in another form with limited accessibility.
There's a whole bunch of crap Android phones do in the background now, like the "OK/Hey Google" assistant voice shortcuts. I turn it all off. But supposedly it only uses a special low-power chip for these passive-listening features. Not sure about music recognition though -- seems like it would need a decent amount of memory even if it's just performing a convolution; depends how much buffer time.
A new one for me the other day was "Google Nearby". Enabled by default, and some company in the airport was using it to push ads to your notifications. Disgusting, and maybe the final nail in the coffin for Android for me. As a long-time diehard Android user, the iPhone sounds better and better every day.
I've got a Pixel, won't be getting Pixel 2. My partner just got a OnePlus 5T though, and it looks very good. Definitely considering OnePlus as my next step.
It was either BWI or LAX and it said "Your ad here" with some url. I had no idea what it was so I long pressed on the notification and did some searching
Very cool stuff! It seems all these solutions are based on analyzing visual representations of spectrograms. Is this common, or could you just use 2d arrays that encode the same information - and would that be more performant?
I wrote up some of my experiments attempting to do what you're describing, where I explain why you can't simply use a 2D array of an audio file. You can find my post here:
I am by no means an expert in this area and a few people have since told me I did a few stupid things in my analysis. But you might find it interesting.
In this context, what's the difference between 'visual representations of spectrograms' and '2d arrays which encode the same information'? Algorithms don't have eyes. The way they 'see' is by reading '2d arrays'.
You mean 2d arrays containing the raw audio signal? No, that would not work, because you don't know the phase when you want to compare against another signal.
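A tiny numpy demo of why: the same tone shifted by a quarter period is uncorrelated sample-for-sample, yet its magnitude spectrum - the thing fingerprinting actually compares - is unchanged:

    import numpy as np

    sr, f = 8000, 440
    t = np.arange(sr) / sr
    a = np.sin(2 * np.pi * f * t)
    b = np.sin(2 * np.pi * f * t + np.pi / 2)    # same tone, shifted phase

    print(np.corrcoef(a, b)[0, 1])               # ~0.0
    print(np.allclose(np.abs(np.fft.rfft(a)),
                      np.abs(np.fft.rfft(b)), atol=1e-6))   # True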
Another method for detecting an audio pattern is cross-correlation on the raw audio signal, but it is very expensive in computation and memory.
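For reference, a minimal scipy sketch of that approach - fine for locating one known clip, but it scales badly once you have thousands of reference clips:

    import numpy as np
    from scipy.signal import fftconvolve

    def find_clip(haystack, clip, sr):
        # Cross-correlation == convolution with the time-reversed clip.
        # (Unnormalized; real matched filtering would normalize.)
        corr = fftconvolve(haystack, clip[::-1], mode="valid")
        return float(np.argmax(corr)) / sr       # best-match offset, seconds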
The longest operation in fingerprinting is often the associated DB query. Lots of work to do there. In that space, Will Drevo's work is really good. I will share my DB implementation later.
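The query scheme itself is simple, though. As I understand it this is the same hash-then-vote-on-offset-deltas idea Dejavu uses, so treat it as a sketch rather than my actual implementation:

    from collections import Counter, defaultdict

    index = defaultdict(list)    # hash -> [(song_id, offset), ...]

    def register(song_id, fingerprints):
        for h, offset in fingerprints:
            index[h].append((song_id, offset))

    def match(sample_fingerprints):
        # Vote on (song, db_offset - sample_offset) so matching hashes
        # must also agree on time alignment, not just collide.
        votes = Counter()
        for h, sample_offset in sample_fingerprints:
            for song_id, db_offset in index.get(h, ()):
                votes[(song_id, db_offset - sample_offset)] += 1
        if not votes:
            return None
        (song_id, _), count = votes.most_common(1)[0]
        return song_id, count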
There are special fingerprint algorithms suited to sound modifications like pitch shifts (https://biblio.ugent.be/publication/5754913), but that's not going to work with humming or live audio. I don't know if such a thing exists.
As for the 2d-array spectrogram, it is not needed in my lib (except when plotting is activated). I only care about the maxima in the spectrum of each data window; in other words, 1d spectra are enough.
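Roughly, per data window (a simplified sketch, not my exact code):

    import numpy as np

    def window_peaks(signal, win=4096, hop=2048, n_peaks=5):
        # One rfft per window, keep only the strongest bins;
        # no 2d spectrogram is ever materialized.
        for start in range(0, len(signal) - win, hop):
            windowed = signal[start:start + win] * np.hanning(win)
            spectrum = np.abs(np.fft.rfft(windowed))
            top = np.argpartition(spectrum, -n_peaks)[-n_peaks:]
            yield start, sorted(top.tolist())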
Spectrograms are a convenient way to visualize the data/algorithm but are rarely part of the actual analysis.
They are already using the 2d array so to speak.
In any case, a spectrogram is just a 2d array where the magnitude of each array element is mapped to a color, so it's effectively the same thing.
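Concretely: build the array with a loop of FFTs, and the "image" is just one imshow call on top of it (matplotlib assumed for the plot):

    import numpy as np
    import matplotlib.pyplot as plt

    def spectrogram(signal, win=1024, hop=512):
        cols = [np.abs(np.fft.rfft(signal[i:i + win]))
                for i in range(0, len(signal) - win, hop)]
        return np.array(cols).T      # 2d array: (freq bins, time windows)

    S = spectrogram(np.random.randn(44100))      # 1s of noise as a stand-in
    plt.imshow(np.log1p(S), origin="lower", aspect="auto")   # array -> picture
    plt.show()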
Few if any people use visual representations of sound for analysis, except for the crazies who run spectrograms through visual deep-learning networks.
Uh, are you sure of what you are writing here? Time-frequency analysis (including spectrograms) is one of the very fundamental tools for signal processing.
True, I was thinking of a spectrogram as purely a visualization of a time series of DFTs, but Matlab and other tools don't make this distinction.
I was mainly responding to the OP's distinction between analyzing a visual representation and analyzing a "2d array" when they are basically the same thing.
> analyzing a visual representation and analyzing a "2d array" when they are basically the same thing.
This is what I mean. I guess their tooling just outputs graphics and it's easier to work with those than the pure 2d array in numpy or something similar.
Another implementation of this algorithm can be found at [1]. It also includes several other algorithms for acoustic fingerprinting that can serve as a baseline. See [2] for a paper on one of the other implemented algorithms and a comparison.
I hacked something together in an hour once: a program that would recognize the song playing and play that song's video clip from YouTube in sync:
I don't actually mind the ads, but fast-forward is a pain if you're listening when doing other stuff at the same time (running, biking, driving, cooking, etc).
Do you think this could be useful for detecting changes in songs? Like if I'm listening to a big mix of songs that doesn't have timestamps of when each song changes, but that's info I would like to have...
This lib is one building block of an adblocker for radio broadcasts that I have been developing for a while and am progressively open-sourcing.