OP HERE! Solved the hug-of-death issue. I'd reached Spotify's 11k playlist limit, so I'm deleting old playlists, and the site should be working as intended now.
THANK YOU whoever posted this!
Currently working on:
Fixing the recommendation scoring function. Right now it's giving hit-or-miss responses. I think the problem is that my cross-encoder "reranker" isn't doing its job properly. I'll fix the passages it looks at when reranking based on the query.
Also getting rid of the input box animation, lol. I've gotten flak for that on Reddit too. You should have seen the old site. It still renders that HTML on mobile.
It's still hugged-to-death so I can't see it, but based on responses, this looks extremely cool!
I know that monetization is a sticky topic for many people, so please don't be offended if this isn't something you're looking for, but I have a bunch of contacts in a major music streaming service's recommendation and personalization teams - if you'd like an intro to discuss this with them (either selling the idea/implementation, or using it as a resume-item for a job), drop me an email (in my about box).
I’d love to know at a high level how you went about implementing this. Is it just using OpenAI’s built-in music knowledge? Did you do any of your own classification?
TLDR: Lots of musical metadata converted into paragraphs, plus the SentenceTransformers Retrieve & Re-Rank pipeline (https://www.sbert.net/index.html).
The sentence embeddings are calculated using a Bidirectional Encoder Representations from Transformers (BERT) model. There's a publicly available pre-trained model for this network, trained on over 1 billion sentences from the internet (thanks, Microsoft). The model transforms your description into a 768-long list of numbers (a vector) that represents the contextual meaning of your sentence.
The model runs off a dataset of musical metadata for 35,000 songs. As a "chronically online music nerd", I knew where to find it. The metadata is very rich: it has a lot of useful columns like the genres, subgenres, and descriptions of tracks. The numerical data is binned into categorical values, e.g. "obscure" for popularity between 0 and 10, "highly danceable" for danceability between 80 and 100, etc. The text data is turned into coherent sentences: "This song's main genres are _____. This song is from the 80s. The name of this song is Lovefool by The Cardigans." etc.
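A minimal sketch of the binning and sentence-templating step described above. Only the "obscure" (popularity 0 to 10) and "highly danceable" (danceability 80 to 100) buckets come from the description; every other bin edge and label, the field names in the `track` dict, and the 0 to 100 integer scale for both features are hypothetical assumptions for illustration.

```python
# Hypothetical bins on an assumed 0-100 integer scale. Only "obscure" (0-10)
# and "highly danceable" (80-100) come from the OP's description.
POPULARITY_BINS = [
    (0, 10, "obscure"),
    (11, 40, "somewhat known"),
    (41, 70, "popular"),
    (71, 100, "very popular"),
]
DANCEABILITY_BINS = [
    (0, 39, "not very danceable"),
    (40, 79, "danceable"),
    (80, 100, "highly danceable"),
]


def bin_value(value, bins):
    """Return the label of the inclusive [low, high] bin that contains value."""
    for low, high, label in bins:
        if low <= value <= high:
            return label
    raise ValueError(f"{value} is outside every bin")


def describe_track(track):
    """Turn one metadata row (hypothetical field names) into template sentences."""
    genres = ", ".join(track["genres"])
    decade = (track["year"] // 10 * 10) % 100  # 1996 -> 90, 2005 -> 0
    return " ".join([
        f"This song's main genres are {genres}.",
        f"This song is from the {decade:02d}s.",
        f"The name of this song is {track['name']} by {track['artist']}.",
        f"This song is {bin_value(track['popularity'], POPULARITY_BINS)}.",
        f"This song is {bin_value(track['danceability'], DANCEABILITY_BINS)}.",
    ])
```

Zero-padding the decade keeps 2000s tracks reading as "00s" rather than "0s".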
An arduous part of the project was describing each musical genre in depth, with its own paragraph such that each genre's actual contextual meaning is captured and not just "This song is a Hyperpop song" or "This song is Adult Contemporary". It was a big exercise in music history and tested my knowledge of music. I also learned a lot about musical genres like "Mongolian Throat Singing" and how it compares to "Gamelan Throat Singing".
I also put the song lyrics for each song through GPT-3 and asked it to summarize the lyrical themes. That's also embedded and used in NLPlaylist.
Each feature for each song in our metadata dataset is now a big paragraph that describes the song. The paragraph is split into sentences, and the embedding of each sentence is computed. The final embedding for each song is then a weighted average of all the sentence embeddings from the big paragraph, plus the genre and lyrical-theme embeddings.
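A minimal sketch of combining per-sentence embeddings into one song vector by weighted average, in plain Python. In practice the vectors would come from something like `SentenceTransformer.encode` in the sentence-transformers library; here they are plain lists. The specific weights (`w_sentence`, `w_genre`, `w_lyrics`) and the final re-normalization to unit length are assumptions, not values from the OP.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def weighted_average(embeddings, weights):
    """Weighted mean of equal-length vectors, re-normalized to unit length."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    total = sum(weights)
    dim = len(embeddings[0])
    mean = [
        sum(w * emb[i] for emb, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]
    return normalize(mean)


def song_embedding(sentence_embs, genre_emb, lyrics_emb,
                   w_sentence=1.0, w_genre=2.0, w_lyrics=1.0):
    """Hypothetical weighting of metadata sentences, genre paragraph, lyric summary."""
    embs = list(sentence_embs) + [genre_emb, lyrics_emb]
    weights = [w_sentence] * len(sentence_embs) + [w_genre, w_lyrics]
    return weighted_average(embs, weights)
```

Re-normalizing means cosine similarity against these song vectors reduces to a plain dot product, which keeps the later search step cheap.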
To make your playlist, all that has to be done is compare the embedding of your query to all 35,000 embeddings in the dataset and return the 100 most similar songs, using cosine similarity as the metric. Thank god we have computers.
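A brute-force sketch of the retrieval step: score every song embedding against the query with cosine similarity and keep the top 100. The `song_embs` dict (song id to vector) is a hypothetical layout; sentence-transformers also ships `util.semantic_search`, which does the same top-k cosine search on tensors.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_emb, song_embs, k=100):
    """Return the k (score, song_id) pairs most similar to the query, best first."""
    scored = (
        (cosine_similarity(query_emb, emb), song_id)
        for song_id, emb in song_embs.items()
    )
    return heapq.nlargest(k, scored)
```

`heapq.nlargest` avoids sorting all 35,000 scores when only the top 100 are needed.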
Once the 100 most similar candidate tracks are found, they are reranked using a "cross encoder" trained on 215M question-answer pairs from various sources and domains, including StackExchange, Yahoo Answers, Google & Bing search queries to give the best matches.
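A sketch of the re-ranking step. A cross-encoder reads the query and a candidate's passage together and outputs a relevance score; below, `score_batch` is any function that maps a list of (query, passage) pairs to scores, which is the shape of `CrossEncoder.predict` in sentence-transformers. The word-overlap scorer is only a toy stand-in so the example runs without downloading a model, and `passages` (song id to descriptive text) is a hypothetical layout.

```python
def word_overlap_scores(pairs):
    """Toy stand-in for a cross-encoder: share of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        q_words = set(query.lower().split())
        p_words = set(passage.lower().split())
        scores.append(len(q_words & p_words) / len(q_words) if q_words else 0.0)
    return scores


def rerank(query, candidate_ids, passages, score_batch, top_n=None):
    """Re-score retrieved candidates with a pairwise (query, passage) scorer.

    Returns (score, song_id) pairs, best first.
    """
    pairs = [(query, passages[song_id]) for song_id in candidate_ids]
    scores = score_batch(pairs)
    ranked = sorted(zip(scores, candidate_ids), key=lambda t: t[0], reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

Python's `sorted` is stable, so candidates with equal re-rank scores keep their original retrieval order.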
Incredible work, and great explanation, thank you. Could you comment more on the cost of running this (e.g. per query or per hour, or however it's set up)? Where are you running the model from?
Also, could you provide more details on the cross encoder used for reranking?
Btw, if you already have all these song embeddings, it would be very interesting to be able to pick a song, and get all the similar ones in a playlist (sort of like "Song radio" on Spotify)!
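If the song embeddings are already computed, a "song radio" could be as simple as using one song's vector as the query and excluding the seed itself. This is a sketch of a suggested feature, not something the site is known to implement; `song_radio` and the dict layout are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def song_radio(seed_id, song_embs, k=25):
    """Return up to k song ids most similar to the seed song, excluding the seed."""
    seed = song_embs[seed_id]
    scored = [
        (cosine(seed, emb), song_id)
        for song_id, emb in song_embs.items()
        if song_id != seed_id
    ]
    scored.sort(reverse=True)
    return [song_id for _, song_id in scored[:k]]
```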
Having worked with the same tech, I'd assume this is pretty inexpensive both to train and to run. Probably in the low $1000s (maybe even mid $100s?) to train. kNN search on 35k entries is pretty simple. The most expensive part is probably the cross encoder (both to train and to run). Would also be interested to know.
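Some rough arithmetic behind "kNN search on 35k entries is pretty simple": a brute-force search is one dot product per song (cosine similarity reduces to a dot product if vectors are pre-normalized). The 768-dimensional, float32 embedding size is an assumption.

```python
NUM_SONGS = 35_000
DIM = 768            # assumption: embedding size of the sentence model
BYTES_PER_FLOAT = 4  # float32

# Brute-force search: one DIM-length dot product per song, per query.
mults_per_query = NUM_SONGS * DIM
index_bytes = NUM_SONGS * DIM * BYTES_PER_FLOAT

print(f"{mults_per_query:,} multiply-adds per query")       # 26,880,000 multiply-adds per query
print(f"{index_bytes / 1e6:.1f} MB of float32 embeddings")  # 107.5 MB of float32 embeddings
```

Tens of millions of multiply-adds per query is light work for a modern CPU, so brute force is reasonable at this scale without an approximate nearest-neighbor index.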
I haven't tried it yet, but this is exactly how I've always wanted to have playlists generated, so I'm looking forward to trying it! Auto-generated playlists are sometimes great, but IMO they often miss the mark of what I wanted or why I like a particular song. So I think this kind of playlist generation could help solve this, especially if the playlist can be refined after initial generation.
Honestly, I'm genuinely impressed. I even searched for something I thought would trip it up, "transgirl puppycore", and it not only gave several songs that are extremely on-point but also surfaced a bunch of new and interesting music I never would have discovered otherwise.
I can genuinely see this being an extremely useful tool for content discovery, and I really hope the major players implement your work.
Tried to use it to extend my Funky Christmas playlist (exemplars: James Brown’s “Soulful Christmas”, The Band’s “Christmas Must Be Tonight”, any “Lean On Me” but mine is by Music Travel Love, Train’s “Christmas Island”, Pomplamoose’s take on “Here we come a-caroling”, Twin Bandit’s “Rise Up Shepherd”)... So like I am not expecting it to get the full genre correct, because the genre is Christmas songs that can simultaneously sit in the background AND surprise you, it's Pat Monahan and he's singing a Christmas song and it's about how he's gonna do Christmas in a resort in Hawaii this year because eff the snow.
Problem is that 90% of these results are not even debatably Christmas songs?
I removed all Christmas songs after Christmas! That's probably the issue. The majority of problems people are seeing can be fixed by expanding the dataset.
I asked for a playlist of "what ADHD sounds like" and I couldn't be more impressed.
I don't have ADHD myself but live with people that do and I think the playlist really captured some of the observed symptoms of the disorder in a musical way.
But what I found really impressive is this: searching for "ADHD playlists" in Spotify yields a lot of playlists with lo-fi music for deep focus, not the fast-paced, disorganized, hyperactive songs I got in the generated playlist. That means the model is capable of discerning what the word "ADHD" means in a profound way; it's not just regurgitating songs that resemble "Spotify playlists with ADHD in their title". Very, very impressive.
It's missing the "I feel like crap because I got yelled at for the 1000th time by the same person for doing the same thing wrong" mixed with the anxiety of "I know from experience I can't avoid that happening again, but how can I avoid that happening again?".
The themes I can detect in the playlist itself are "I don't know quite what this is, but that itself is interesting, so my brain is just going to surf on that for a while", "nothing's going on, so I started daydreaming and my brain is in that comfortable relaxed happy place", and "oh oh oh, Yes yes yes! Mode"
Neat idea for a project, and kudos for getting everything up and running!
Tried it once or twice, and the results are very hit or miss. "70s african funk without synthesizers" includes some stuff that matches the prompt, like Ofo the Black Company, but also includes more misses than hits: Herbie Hancock, War, Prince, the Ohio Players, and Sly & The Family Stone are all American. Seems like it latched onto "funk" and really went for it there. It's fun to give the bot a more vague prompt and see where it goes though.
Ultimately, my favorite musical "discoveries" have come from friend recommendations, or editorials like the Quietus or Bandcamp Daily. There's something about the social aspect of a personal recommendation that appeals to me, even if it's not as optimized as an algorithmic one. Many people don't have that sort of relationship with music though, so I could see something like this being pretty useful.
I also got the sense, with a prompt like this, that it latched on to one or two parts of it. For my prompt of "upbeat rock with violins and bagpipes", most of the songs fell into "upbeat rock" or "rock with violins" (a lot of classic rock power ballads came up here). Only one or two songs had bagpipes.
I think the popularity of songs and artists in the data may skew the results a bit: most of the results are fairly popular, and maybe my Celtic rock bands don't fall into the more popular categories.
But still, kudos. The goal of it was to be able to find new music in a different way and it definitely got me there.
I suspect the training was overfit to certain kinds of music. A bot that's fine-tuned on "funk", "70s funk", and other similar music will produce better results.
This is great. I was a big fan of the Beats Music service's feature that let you 'mad lib' a sentence like "I am [singing along] [with my bff] [in the car] to [pop music]", and was always let down that nothing similar appeared in competing services. This feels like the next step for that kind of feature.
For about a year now I've been eagerly awaiting the day this kind of thing becomes available for streaming TV and movies. This Christmas it really struck me how cool and convenient it would be to ask a smart TV "Hey, throw on a mix of all the good Simpsons and Bob's Burgers holiday episodes." I can't imagine that's far away now, especially seeing this project! Honestly, I wouldn't be surprised if at this point the hardest challenge were launching specific programs across different services, not building the playlist itself.
Thank you! I'm just a guy who loves music and sharing it with people. It's been my dream to build something like that. Hope you find some good tunes with my project!
Just wanted to circle back and say some of my friends and I have been using this for our 'radio' at our desk on a daily basis and it's just awesome. The accuracy to the prompt isn't always 100% there, but I honestly like how much new music that's somewhat style-adjacent that kind of mismatch is introducing me to.
Hopefully we aren't running up a giant bill on your GPU instances...let me start paying for this every month!
So the OP said here that this is based on a “semantic” enhancement of an (already rich) 35k-track DB.
Music platforms in general do not allow people to describe/annotate tracks verbally (people would love to "let it out"!). The day they decide to implement an NLP search similar to this for up to 100 million tracks [1], they will regret it.
Album reviews are the only "semantically rich" AND "widely used" music description I can think of.
Last.fm has some interesting data on the finer track level (wiki, comments, tags), but it could have been way, way richer
Tried a few softballs, not sure how accurate it is. Example: "Extremely popular pop rock from the early 1990s." [0] It's good music but I wouldn't say those were extremely popular in the 90s.
Again, great work and I'll check it out for more obscure searches based on feeling rather than objective information.
"Obscure dark techno from 1990 Berlin" yielded a playlist with a 2009 remaster of Kraftwerk. Kraftwerk is one of my favorite bands, but it's not obscure and it's not from 1990 Berlin. On the same list are a bunch of bands and songs which I would enjoy, but are not from Berlin nor from 1990.
I also tried "Cover songs which are much more famous than the originals", and after spot checking the list, it seems to be originals - not covers.
I gave the same prompts to ChatGPT. It wouldn't even try with the first prompt, but the second prompt yielded a decent list.
Neat idea–Spotify seems to be proliferating a massive amount of playlists for every conceivable mood, genre, decade, country, etc in an attempt to seemingly capture what this tool can do automatically. I think a lot of the Spotify "official" playlists are actually partially or completely algorithmically driven as well based on your personal listening history.
This is a great project! I tried with a complex prompt including lyrics quality and language and it worked perfectly! I'm really amazed.
Would it be possible to release this project on GitHub/GitLab so other people can host their own version and/or hack on it? I'd love to run this on my own instance and integrate it with some of my personal projects.
I'm looking for a fast paced rock song, the refrain is sung by a woman, something like "[...] every beat, [...], blue, blue sky [...] ". She had a powerful, high-pitched voice. I had this song on my liked spotify list, then it suddenly disappeared.
Can this tool find it for me?
Fun to play with, good job! I wonder if the exact opposite exists: give it a playlist, and it gives you some kind of description of it. I always have a hard time answering "What kind of music do you listen to?", maybe AI could help.
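One hypothetical way to go in the opposite direction with the same embeddings: average the playlist's track vectors into a centroid and return the closest entries from a set of labeled description embeddings (for example, the per-genre paragraphs the OP wrote). `describe_playlist` and the label dict are illustrative assumptions, not an existing feature.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def describe_playlist(track_embs, label_embs, n_labels=3):
    """Return the labels whose embeddings are closest to the playlist centroid."""
    centroid = mean_vector(track_embs)
    ranked = sorted(
        label_embs,
        key=lambda label: cosine(centroid, label_embs[label]),
        reverse=True,
    )
    return ranked[:n_labels]
```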
Back in the day there was a project called Tomahawk Player that tied into a service called Echo Nest (which I believe got purchased and shut down by Spotify). For me that was the peak of music discovery. I haven't been able to replicate it yet.
AcousticBrainz and Echo Nest, both gone now. It seems like only Pandora's Music Genome is still functioning.
I know this is a Hard Problem, but I also think it is a problem with a great deal of payoff. Art (music, movies, books, etc) is unique in that someone will very much want "more of the same" but not the exact same thing (the identical song). People want the same cheeseburger over and over again, but not the same bit of music. Being able to say "more of that" and actually get the right results back, rather than "more of what other people who liked that song liked" would be a huge boon, but so far, we just don't have the analysis to automate that kind of thing.
It's also hard when it comes to identifying what features must be conserved. On Reddit, you'll get questions, say in the /r/horror subreddit, such as "I really liked May; more like this!" and I will ask, "More with Angela Bettis? More by director Lucky McKee? More about body parts? More where the lead undergoes some late stage transition from average to alluring? What, in particular, did you want more of?"
You can sort of replicate this on Discogs by looking up band members and trying to hunt down later or earlier projects from them, but again, that's just a proxy for a particular quality.
I asked for "music that would convince anyone that the cello is the best instrument" and it couldn't be further from what I'd expect. Not a single song even has a cello in it. Maybe the absence is supposed to convince?
Don't have instrument specific data on songs. Wish I did. Was thinking about classification algorithm on song spectrograms to do this. There's a music instrument dataset I could train on and run my songs through. Could possibly take weeks though. I'll investigate.
Looks like a malicious actor is messing around with it...
Hello I am testing my pentesting skills, please make a comment on your most recent reddit post and I will stop :) blah blah apple banana orange cucumber happy sad good bad filler words stars moon sun
Getting hugged too hard; great idea though! I tried a few times to get the following query through, saving it for later: "complex instrumental music like Plini or Nick Johnston with lots of melody and guitars"
To me, the best matches for the prompt seem to show up towards the bottom of the playlist. Not sure if anyone else has noticed something similar, or if it’s just coincidence.
One thing I wish you could do with Spotify is adjust the temperature of its auto-generated playlists. For a certain genre I might want the playlist to rely more or less on the well-known top songs in that genre versus more esoteric choices.
Working on weighted averages for each song feature embedding. This might be the fix for that. I'm thinking about comparing the query embedding to sentences like "This query relates to musical genres" or "This query relates to the key and modality of songs", etc. This could add importance to the "popularity" feature I have in my dataset for queries that contain words like "obscure" or "very popular".
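A sketch of the query-dependent weighting idea: embed descriptor sentences such as "This query relates to musical genres", turn the query's similarity to each descriptor into weights with a softmax, and use those weights to combine per-feature similarities. The softmax, the `temperature` value, and the dict layouts are all assumptions; the OP describes this as an idea still being explored.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(xs, temperature=0.1):
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def feature_weights(query_emb, descriptor_embs, temperature=0.1):
    """Map each feature name to a weight from query-to-descriptor similarity."""
    names = list(descriptor_embs)
    sims = [cosine(query_emb, descriptor_embs[name]) for name in names]
    return dict(zip(names, softmax(sims, temperature)))


def weighted_score(query_emb, song_feature_embs, weights):
    """Combine per-feature cosine similarities using the query-specific weights."""
    return sum(
        w * cosine(query_emb, song_feature_embs[name]) for name, w in weights.items()
    )
```

A query like "very obscure shoegaze" would, under this scheme, shift weight toward the popularity feature instead of treating every feature equally.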
Just an idea right now, I'll get to it when I can!
Was reading about this today https://www.lineup.supply/ and hoping for a non-Apple version... unfortunately all I got was the Internal Server Error. Will check back later.
Wouldn't popular in Canada in 1995 be Nirvana/Pearl Jam/Cypress Hill/Soundgarden?
You probably mean popular in Canada in 1995 and considered a Canadian band by the Canadian mainstream press. That would be Weird Al, Moist, Our Lady Peace... Tragically Hip was never pop-chart popular until the end. Sarah McLachlan was never mainstream popular until Lilith Fair.
Don't have any geographic information in my dataset that's why. Thinking about finding song metadata for artists that contain information about their geographic location. Popular in 1995 should be working though, as I do have decade and year information.
I generated a playlist and the first 3 songs were by the same band. I'm not sure how the tracks are ordered, but some shuffling of the bands would be nice.
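A sketch of one possible fix: a round-robin over artists that keeps each artist's tracks in their ranked order but interleaves artists. `spread_artists` and the (track id, artist) input format are hypothetical. The trade-off is that an artist's second-best track can end up below lower-ranked tracks from other artists, and if one artist dominates the list, its extra tracks still bunch up at the end.

```python
from collections import deque


def spread_artists(ranked_tracks):
    """Interleave artists round-robin, preserving rank order within each artist.

    ranked_tracks: list of (track_id, artist) pairs, best match first.
    Artists take turns in the order of their first (best-ranked) appearance.
    """
    queues = {}  # dicts keep insertion order (Python 3.7+)
    for track_id, artist in ranked_tracks:
        queues.setdefault(artist, deque()).append(track_id)

    result = []
    while queues:
        for artist in list(queues):  # iterate over a copy so we can delete
            result.append(queues[artist].popleft())
            if not queues[artist]:
                del queues[artist]
    return result
```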
Something went wrong :-(
Something went wrong while trying to load this site; please try again later.
Debugging tips
If this is your site, and you just reloaded it, then the problem might simply be that it hasn't loaded up yet. Try refreshing this page and see if this message disappears.
If you keep getting this message, you should check your site's server and error logs for any messages.
Error code: 502-backend
There are several songs that have been named after computer terminology, some examples include:
"404" by The Notwist
"Blue Screen of Death" by The Flashbulb
"Error" by Depeche Mode
"Fatal System Error" by Fear Factory
"File Not Found" by John Foxx
"HTTP Error 503" by The Postal Service
"Kernel Panic" by The Faint
"Server Error" by The Radio Company
"System Error" by Covenant
"404 Error" by The Algorithm
I'm not actually convinced any of these songs really exist?
There is an album "File Not Found" by "Division By Zero" though.
Trying to imagine what the song "Internal Server Error" sounds like because when I typed that into Spotify it just repeated the same back to me. Then my screen bulged out with a screaming face. Then went back to normal. Weird.
Ha! It wasn’t meant to be on par with Spotify, more of a fun project. Spotify uses React, which we will eventually implement. We didn’t have time to completely replicate Spotify’s UI/UX.
Taking any and all questions!