I would like something like this, but one that cuts out all non-music (i.e. talking, interviews, etc.). Could TensorFlow be trained to tell speech apart from music? It would obviously fail on speech over background music, but that isn't very common outside of Jamaica (and reggae stations/shows).