I think the problem here is the same one the other current music generation services have. Iteration is so important to creativity, and right now you can't really iterate properly. To get the right song you just spray and pray, generating until something sufficient turns up or you give up. I know you hint at this being a future direction of development, but in my opinion it's a key feature for taking these services beyond toys.

I think it's better to think of the process of finding the right song as a search through the space of all possible songs. The current approach amounts to "pick a random point in a general area". Once we find something that is roughly correct, we need a way to tweak the aspects that aren't quite right, shrinking the search space and letting us take smaller and smaller steps in defined directions.
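
To make that concrete, here's a toy sketch of the two phases. Everything in it is a stand-in (a made-up latent space, and a scoring function playing the part of my ear), not any real model:

    import numpy as np

    rng = np.random.default_rng(0)
    DIM = 128
    target = rng.normal(size=DIM)        # stand-in for "the song in my head"

    def score(latent):
        # Placeholder: in reality this is the user's ear judging the decoded song.
        return -np.linalg.norm(latent - target)

    # Phase 1: spray and pray, i.e. random points scattered over a general area.
    candidates = rng.normal(size=(8, DIM))
    best = max(candidates, key=score)

    # Phase 2: local search, taking smaller and smaller steps in defined directions.
    step = 0.5
    for _ in range(20):
        proposal = best + rng.normal(scale=step, size=DIM)
        if score(proposal) > score(best):
            best = proposal
        step *= 0.9                      # shrink the search radius each round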



Yep, I came to similar conclusions w/ text-to-audio models - in terms of creative work, the ability to iterate is really lacking in the current interfaces. We've stopped working on text-to-audio models and are instead targeting a lower level of abstraction by directly exposing an Ableton environment to LLM agents.

We just published a blog today discussing this - https://montyanderson.net/writing/synthesis


Our variations feature, coming very soon, is exactly this! Rhythm Control is an early version of it.


I'll keep an eye out for that! The variations feature in Suno is a good example of what not to do here, as it effectively just makes another random iteration using existing settings.

I think the other missing pieces are upscaling and stem splitting. While tools for splitting stems exist, my testing found that they didn't work well in practice (at least on Suno music), likely due to a combination of encoder-specific artifacts and the overall low sound quality. Existing upscaling approaches faced similar issues.

My naive guess is that these are things that will benefit from being closely intertwined with the generation process. E.g., when splitting stems, you could use the diffusion model(s) to help jointly converge the individual stems into reasonable standalone tracks.
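
One very rough way to picture that (a toy sketch with a stubbed per-stem denoiser, loosely in the spirit of diffusion-based source separation work, definitely not any shipping implementation): sample all the stems jointly and, after every denoising step, project them so they still sum to the original mix.

    import numpy as np

    rng = np.random.default_rng(0)

    def denoise_step(stems, t):
        # Stand-in for one reverse-diffusion step of a (hypothetical) per-stem
        # model. A real model would pull each stem toward a plausible standalone
        # track; this stub just shrinks the values so the loop runs end to end.
        return stems * 0.99

    mixture = rng.normal(size=4096)      # the finished track you want to split
    n_stems = 4                          # e.g. vocals / drums / bass / other

    stems = rng.normal(size=(n_stems, mixture.size))
    for t in range(200, 0, -1):
        stems = denoise_step(stems, t)
        residual = mixture - stems.sum(axis=0)
        stems += residual / n_stems      # keep sum(stems) == mixture at every step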

I'm excited about the potential of these tools. I've definitely found use cases in small independent game projects where paying for musicians is far out of budget and the style of music is not one I can execute on my own. But I'm not willing to sacrifice quality of results to do so.


Our variations feature will be nothing like Suno's (which just generates another song using the same prompt/lyrics). Since we use a diffusion model, we can actually restart the generation process from an early timestep (e.g., with a similar seed or even parts of the existing song) to get exactly what you're looking for.
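
For the curious, this is the standard partial-noising trick (img2img-style): noise the existing song part-way up the schedule, then denoise it back down under the new prompt. A stubbed-out sketch of the general idea, not our actual pipeline:

    import numpy as np

    rng = np.random.default_rng(0)

    def denoise_step(x, t, prompt):
        # Stand-in for one reverse step of a (hypothetical) latent music model;
        # a real model would predict and remove noise conditioned on the prompt.
        return x * 0.999

    original_latent = rng.normal(size=1024)   # the song the user already likes

    T = 1000
    t_restart = 400               # earlier restart keeps more of the global structure
    betas = np.linspace(1e-4, 0.02, T)
    alpha_bar = np.cumprod(1.0 - betas)

    # Forward-noise the existing song up to t_restart (standard DDPM forward process).
    noise = rng.normal(size=original_latent.shape)
    x = (np.sqrt(alpha_bar[t_restart]) * original_latent
         + np.sqrt(1.0 - alpha_bar[t_restart]) * noise)

    # Denoise back down under the new prompt; the lower t_restart, the closer
    # the variation stays to the original.
    for t in range(t_restart, 0, -1):
        x = denoise_step(x, t, prompt="same song, bigger chorus")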


> Our variations feature will be nothing like Suno's (which just generates another song using the same prompt/lyrics).

That's their "Remix" feature which just got renamed "Reuse prompt" or something.

Their extend feature generates a new song starting from an arbitrary timestamp, with a new prompt. It doesn't always work for drastic style changes and it can be a bit repetitive with some songs but it doesn't completely reroll the entire song.


I uploaded a bit of a song I recorded once (one I wrote, unpublished), and I'm trying to get it to riff on it, generate something close to it, etc.


More strength does what? More or less similar?


More strength = force rhythm more. If you crank it to max it will probably result in just a drum line, so I prefer 3-4.


Same with text models, for me. If I can't edit my query and the AI response to retry and keep the context in check, then I have trouble finding a use for it in creative work. I need to be able to directly influence the entire loop and, most importantly, keep the context for the next token prediction clean and short.


Letting you edit the response is quite easy to do, technically speaking. It's just not done in the default UI of most AI chatbots, unfortunately, so you'll need to look for alternative UIs.
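
The reason it's easy: chat-completion style APIs are stateless, so the client sends the whole message list every turn and is free to rewrite or trim earlier assistant replies before continuing. For example, with the OpenAI Python SDK (other providers look much the same):

    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "user", "content": "Write a verse about rain."}]

    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    history.append({"role": "assistant", "content": reply.choices[0].message.content})

    # Edit the model's own reply before continuing; the next call only ever sees
    # the edited, trimmed history, which keeps the context clean and short.
    history[-1]["content"] = "It drums on tin roofs, soft and slow."
    history.append({"role": "user", "content": "Good. Now a chorus in the same meter."})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)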


I've noticed that the output tends to suffer when you pass in longer lyrics, too. Lots of my experiments start off fairly strong, but then it's like the model starts to forget, and the lyrics lose any rhythmic structure or just become incoherent.

At some point it's just not efficient to try to get the desired output purely through a prompt, and it would be helpful to download the output in a format you can plug into your DAW to tweak.


But that’s not a problem when listening to Spotify? Why can’t we treat these music generation engines the same way we treat music streaming services?


Idk what you're referring to specifically, but music discovery is terrible across Spotify, Apple Music, Google Music, Tidal, etc. I don't expect these services to read your mind, but they also don't ask for many parameters to help with the search. Definitely a huge opportunity here for innovative new services.


TikTok can tolerate a lot more active skipping than Spotify can before they annoy their users. We’d love to solve this. How would you? Maybe we could let users write why they didn’t like the song in natural language since we understand that now.
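
One cheap way to prototype that (a sketch, not what we run in production): embed the free-text complaint and down-rank candidate tracks whose descriptions sit close to it, using an off-the-shelf text embedder.

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    complaint = "too much autotune and the chorus repeated forever"
    candidates = [
        "heavily autotuned pop with a looping hook",
        "acoustic folk ballad, live vocals",
        "lo-fi instrumental hip hop",
    ]

    c_vec = model.encode(complaint, normalize_embeddings=True)
    d_vecs = model.encode(candidates, normalize_embeddings=True)

    # Ascending similarity to the complaint: the least "offending" tracks first.
    for desc, sim in sorted(zip(candidates, d_vecs @ c_vec), key=lambda p: p[1]):
        print(f"{sim:+.2f}  {desc}")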


Basically you need something like ComfyUI for music.

Variation in small details is fine, but you need control over larger scale structure.
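
Even something as blunt as pinning the large-scale structure down as data would go a long way. A purely hypothetical sketch of what a node-graph-ish spec could look like, with per-section seeds so you can re-roll one part without touching the rest:

    structure = [
        {"section": "intro",  "bars": 8,  "prompt": "sparse piano, 90 bpm", "seed": 11},
        {"section": "verse",  "bars": 16, "prompt": "add brushed drums",    "seed": 12},
        {"section": "chorus", "bars": 16, "prompt": "full band, big hook",  "seed": 13},
        {"section": "outro",  "bars": 8,  "prompt": "strip back to piano",  "seed": 11},
    ]

    def render_section(spec):
        # Stand-in for a model call; returns placeholder audio so the script runs.
        return b""

    song = b"".join(render_section(s) for s in structure)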



