I agree, and it's frustrating that there's so much fixation on this "single text prompt to [other thing]" use case in how people are building these tools out. I think that drives a lot of the "slop" feel, because the target consumer isn't someone who wants to engage with an artistic process to create something - which to me is a process of refinement and a feedback loop with one's tools, no matter what those tools are.
I think this might be a good research-paper proof of concept for a model, and the lack of explanation of how it works is disappointing but expected. As a product, though, the target audience for this thing isn't people who want to make art, but people who like the idea of generative AI per se. Maybe it'll move toward being a tool artists can use in the future, but I don't think that's what gets you funded in this environment, and it seems much harder to build things that work that way. The coolest uses of and tooling for generative image models have been created by the open-source communities around them, and I think the same will be true of audio.
> a process of refinement and a feedback loop with one's tools...
Yes! While technically impressive, these "text prompt to finished song" AI tools currently only solve low-value problems for already over-saturated markets. I just don't see a good path to a real business from "finished song" as the use case.
* With Spotify, Soundcloud, etc., music consumers already have access to more new, human-created songs than they can possibly listen to - all at historically low cost.
* Buyers of custom-created music, such as video makers and game studios, already have more stock music library choices and custom creation options (from Fiverr, etc.) than ever - also at historically low cost.
These are already low-value, commoditized markets and, once the novelty wears off, can't generate VC-level returns. And, no, I don't think AI is going to take a meaningful part of the high-end music market from the likes of Taylor Swift. It's not that I doubt AI will eventually make music that good - it's that high-earning pop stars like Taylor Swift, Beyonce, etc are much more than their songs. They are global brand businesses that generate more revenue from touring, merch and product tie-ins than the music itself.
However, there is a potentially profitable market for AI music tools that no one's targeting yet. It's a smaller market but it's accessible, scalable and immediately viable for even a beta-level, "research-to-product" solution. Don't generate finished songs. Instead, make an interactive tool which collaborates with human music makers in a much more granular way by generating the elements and components of music (called stems) as well as the underlying MIDI data. There's a whole industry selling human-created element libraries consisting of stems, loops, backing tracks, samples and style-based construction kits. These are used in a lot of the human-created music we hear. But they aren't interactive, adaptive or collaborative.
AI can provide a superior solution right now, and it doesn't even need to be 'top human' quality to be useful. Pop stars like Taylor Swift can afford to hire the best, proven human producers, studio musicians and mixing engineers to collaborate with, but there's a significant market of people - from students and hobbyists to indie producers and semi-pro musicians - who can't afford human collaborators.
To me this looks like a pretty rare thing in AI: A classic "Two Pizza"-type startup opportunity where a modest seed round can get to product-market fit and real cash flow. You also won't have to out-market Taylor Swift, outspend FAANG or target fickle consumers.
I'm just a long-time music-making hobbyist and I consistently spend several hundred dollars a year buying such libraries, stems, loops and samples. It's far more than I pay for all my subscriptions to 'finished music' combined. And I have no aspirations to make money with my music. Hell, no one outside family and a few friends ever even hears it. Making music is just an extremely enjoyable creative activity I like to spend time (and money) on. But, as a potential customer, I have no use for a tool that generates finished songs. However, an AI that takes text prompts along with some MIDI chords and musical phrases I provide, and then generates a variety of suggestions in the form of separate stem tracks with MIDI which I can further mix and modify, would be an 'instant buy' for me. It doesn't need to be as good as a human collaborator because it's better in other ways: always available, non-judgemental, infinitely patient, and yet has no opinions or emotional needs of its own.
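To make that concrete, here's a rough sketch of the interaction I'm imagining, as Python pseudocode. Only the MIDI loading uses a real library (mido); generate_stems() and everything about it (name, parameters, output shape) is an imaginary stand-in for the kind of service I'd pay for - nothing like it exists as far as I know.

    # Sketch of the "collaborative stems" workflow described above.
    # Only the MIDI handling (mido) is real; generate_stems() is a
    # hypothetical placeholder for the model/service being imagined.
    import mido

    def load_seed_notes(path: str) -> list:
        """Pull the note events out of a MIDI phrase sketched in a DAW."""
        midi = mido.MidiFile(path)
        return [msg for track in midi.tracks
                for msg in track
                if msg.type in ("note_on", "note_off")]

    def generate_stems(prompt: str, seed_notes: list, n_variations: int = 4) -> list:
        """Hypothetical model call: text prompt + user MIDI in, a set of
        separate, editable stems (audio plus MIDI) out. Placeholder only."""
        return [{"stem": f"variation_{i}", "audio": None, "midi": None}
                for i in range(n_variations)]

    if __name__ == "__main__":
        chords = load_seed_notes("my_chords.mid")
        # The point: several variations I can audition, then pull apart and
        # edit per track, instead of one opaque finished song.
        for stem in generate_stems("lo-fi soul, ~82 bpm, brushed drums, warm upright bass", chords):
            print(stem["stem"])

The key design point is that the output stays in pieces - per-track audio plus the MIDI behind it - so the suggestions feed back into my own editing loop rather than replacing it.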