"I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes."
- authorjmac
This AI stuff is overhyped, has produced a lot of slop and spam, and is fraught with unresolved ethical issues. But AI is just computers, and computers are just automation, which has been used to accelerate art pipelines for decades. It doesn't have to be all or nothing, either: I've made a few AI-generated songs using Udio and Suno, and all of them used my own lyrics, written without any generative AI assistance.
The main problem I have with generative AI tools, in an artistic sense, is that they lack the ability to convey specificity of intent; word prompts alone aren't good enough.
I agree, and it's frustrating that there's so much fixation on this "single text prompt to [other thing]" use case in how people are building these things out. I think that drives a lot of the "slop" feel, because the target consumer of the tools isn't someone who wants to engage with an artistic process to create something. To me, that process is one of refinement and a feedback loop with one's tools, no matter what those tools are.
I think this might be a good research-paper proof of concept for a model, and the lack of explanation of how it works is disappointing but expected. As a product, I think the target audience for this thing isn't people who want to make art, but people who like the idea of generative AI per se. Maybe it'll move toward being a tool artists can use in the future, but I don't think that's what gets you funded in this environment, and it seems much harder to build things that work that way. The coolest uses of and tooling for generative image models have been created by the open-source communities around them, and I think the same will be true of audio.
> a process of refinement and a feedback loop with one's tools...
Yes! While technically impressive, these "text prompt to finished song" AI tools currently only solve low-value problems for already over-saturated markets. I just don't see a good path to a real business from "finished song" as the use case.
* With Spotify, SoundCloud, etc, music consumers already have access to more new, human-created songs than they can possibly listen to - all at historically low cost.
* Buyers of custom-created music, such as video makers and game studios, already have more stock-music library choices and custom-creation options (from Fiverr etc) than ever - also at historically low cost.
These are already low-value, commoditized markets and, once the novelty wears off, can't generate VC-level returns. And, no, I don't think AI is going to take a meaningful part of the high-end music market from the likes of Taylor Swift. It's not that I doubt AI will eventually make music that good - it's that high-earning pop stars like Taylor Swift, Beyonce, etc are much more than their songs. They are global brand businesses that generate more revenue from touring, merch and product tie-ins than the music itself.
However, there is a potentially profitable market for AI music tools that no one's targeting yet. It's a smaller market but it's accessible, scalable and immediately viable for even a beta-level, "research-to-product" solution. Don't generate finished songs. Instead, make an interactive tool which collaborates with human music makers in a much more granular way by generating the elements and components of music (called stems) as well as the underlying MIDI data. There's a whole industry selling human-created element libraries consisting of stems, loops, backing tracks, samples and style-based construction kits. These are used in a lot of the human-created music we hear. But they aren't interactive, adaptive or collaborative.
AI can provide a superior solution right now, and it doesn't even need to be 'top human' quality to be useful. Pop stars like Taylor Swift can afford to hire the best, proven human producers, studio musicians and mixing engineers to collaborate with, but there's a significant market of people, from students and hobbyists to indie producers and semi-pro musicians, who can't afford human collaborators.
To me this looks like a pretty rare thing in AI: A classic "Two Pizza"-type startup opportunity where a modest seed round can get to product-market fit and real cash flow. You also won't have to out-market Taylor Swift, outspend FAANG or target fickle consumers.
I'm just a long-time music-making hobbyist, and I consistently spend several hundred dollars a year buying such libraries, stems, loops and samples. It's far more than I pay for all my subscriptions to 'finished music' combined. And I have no aspirations to make money with my music. Hell, no one outside family and a few friends ever even hears it. Making music is just an extremely enjoyable creative activity I like to spend time (and money) on. But, as a potential customer, I have no use for a tool that generates finished songs. However, an AI that takes text prompts along with some MIDI chords and musical phrases I provide, and then generates a variety of suggestions in the form of separate stem tracks with MIDI which I can further mix and modify, would be an 'instant buy' for me. It doesn't need to be as good as a human collaborator because it's better in other ways: always available, non-judgemental, infinitely patient, and yet has no opinions or emotional needs of its own.
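For what it's worth, the "MIDI out" half of that wish is technically cheap: a Standard MIDI File is a tiny, well-specified binary format. Here's a hypothetical minimal sketch in plain Python (the function names are my own, not from any product mentioned here) that writes a single held C-major chord as a format-0 SMF that any DAW could import:

```python
import struct

def vlq(n):
    """Encode n as a MIDI variable-length quantity: 7 bits per byte,
    high bit set on every byte except the last."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def midi_file(notes, ticks=480):
    """Write one held chord as a format-0 Standard MIDI File."""
    ev = b''
    for n in notes:                                # all notes start at delta 0
        ev += vlq(0) + bytes([0x90, n, 96])        # note-on, velocity 96
    ev += vlq(ticks) + bytes([0x80, notes[0], 0])  # release after one beat
    for n in notes[1:]:
        ev += vlq(0) + bytes([0x80, n, 0])
    ev += vlq(0) + b'\xff\x2f\x00'                 # end-of-track meta event
    track = b'MTrk' + struct.pack('>I', len(ev)) + ev
    # Header chunk: length 6, format 0, 1 track, 480 ticks per beat.
    return b'MThd' + struct.pack('>IHHH', 6, 0, 1, ticks) + track

data = midi_file([60, 64, 67])  # C major triad
```

A real tool would generate this from a model rather than hard-code a chord, but the point stands: there's no technical barrier to these services emitting MIDI alongside audio stems.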
Image gen is streets ahead of music in terms of control, as long as you stick to the FOSS stuff as DALL-E is too limited. I’m only an observer for now and haven’t actually used it much, but both StableDiffusion and SDXL have ControlNet and a bunch of other things that let you, for example, draw a stick man in a specific pose and the AI will generate a realistic man in that pose. Or edit one specific part of the generation and continue iterating from there.
The day we get a similar level of control with AI music will be a dream come true for me. We really need stems or at least MIDI files for these tools to be more than just soulless jingle generators imo.
I've been using Krita with the Stable Diffusion plugin, and it's pretty amazing to use at times. I often read critics say things like 'you can't do layers with generative AI' and, uh, nuh? Though you can't, say, generate a shadow with adjustable alpha transparency yet, that doesn't seem impossible for the technology eventually. Thinking the tools won't improve would be like looking at MacPaint and saying digital art will never be a thing because it will always be low-resolution and monochrome.
What I'd love is Suno/Udio as a VST plugin. Being able to supply MIDI or audio samples to pull melodies from, to generate from arbitrary audio on a timeline.
True, but for me as an indie game developer, with no musical talent or the financial resources to pay an artist for unique music, this is extremely valuable.
- Is it as good as N targeted music tracks that fit together to match my game? No.
- Is it better than something I can create myself? By far.
- Is it better than a random few open-source or cheap tracks that you can buy on any random storefront? Sometimes.
So at the very least it has a foot in the door as far as I'm concerned.
Despite these machines, a person can easily spend an hour or two every day on common household tasks like cleaning the kitchen and doing laundry. That’s time that many people would love to get back.
Same, I need the exact music I want, tailored exactly to my tastes, all the time. I don’t want to waste time listening to others perspectives or ideas.
It's a fun toy/tool. Cool to see AI progressing. As a musician who does spend hours making music by hand this has instantly widened my artistic vision by being able to drop in a few ideas to see what comes out. Handmade music will still be around.
I wrote the lyrics. The original prompt is listed on the song's page, though I tweaked it during extensions and changed the RNG for the latter parts of the song; there are lots of knobs and dials to tweak now.
Weird A.I. is the one who's ironically racist; didn't you listen to the first half of the song? Geez. (I consciously avoided slurs and ended with something utterly irredeemable to underline the joke.)
The joke is that, as an AI, it can't help but spout a bunch of racist nonsense, also that the song is explicitly political when none of Weird Al's songs are.
They're locked down now because by default they absolutely do generate that kind of shit. It's a riff on the early days of Twitter generative AI chatbots where blasting out right wing and neo-nazi talking points was very much a thing.
Was it a trope? Who cares, I don't think of comedy in that way. If the implication you're trying to make is that I'm racist for writing that stuff, it's obvious given the surrounding context that it's not in any way a sincere expression of those views. Also in the verse itself, the language used was pretty light and given how that verse ends, clearly not an endorsement.
The very first track I created in suno (about us landing on an alien planet) blew my socks off. The vocals in one or two places are a bit off, but overall it's amazing.
Less than three months after laughing in disbelief that it was now possible for an especially clever orchestration of AI models to instantly produce a pop song about my refrigerator, I'm nearly as amazed that there are now multiple entrants in this market. Feels singularity-ish.
This is a bit more complicated, considering you can supply your own lyrics. The court didn't rule that mixing non-generated media with AI-generated material invalidates the copyright of everything it's mixed with.
So given an AI generated song with fully human-written lyrics perhaps others could mute the lyrics, or more easily sample from it, but the resulting output as a whole would probably have some degree of copyright protection. Suno has also demonstrated being able to supply your own melodies too, put those two together and how much of the resulting work could be credibly argued to remain uncopyrightable?
It would be so much more useful for music producers if these audio gen services would create individual samples instead of trying to generate the entire composition.
Agreed! Creating a high-quality finished song is not only a harder target for a product to hit, it's actually less valuable as a use-case. I wrote a response to another post detailing why as well as the product I'd actually pay for. https://news.ycombinator.com/item?id=40563993
Should be useful even though it's not fully what I'd hoped for (provide note/chord MIDI input, more granular control with repeatability, MIDI data out) but it's definitely a big step in the right direction.
The length of the output is not the issue. The problem is that the output includes drums, voice, and several instruments already premixed into an accompaniment. What music producers really need are isolated samples of just a single instrument that can be mixed, layered and rearranged using other tools.
There are other AI tools that can split audio into stems, though it's not a perfect solution since it introduces artifacts. It's good enough to lightly remix and adjust levels, but taking whole instruments or vocals out usually leaves the mix sounding fairly hollow.
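That hollow sound predates AI separators. Real stem-splitters (Demucs, Spleeter, etc.) use learned models, but the old pre-AI trick, subtracting one stereo channel from the other to cancel anything panned dead center (usually the lead vocal), shows the same failure mode: everything else mixed to the center gets thinned out too. A toy sketch with made-up sample lists, not any tool's actual API:

```python
def remove_center(left, right):
    """Naive 'karaoke' trick: the difference of the stereo channels
    cancels any signal that is identical in both, i.e. panned dead
    center (typically the lead vocal, but also bass and kick)."""
    return [(l - r) / 2 for l, r in zip(left, right)]

# Toy mix: a center-panned "vocal" plus a hard-left "guitar".
vocal  = [0.5, -0.5, 0.5, -0.5]   # appears equally in both channels
guitar = [0.2, 0.2, -0.2, -0.2]   # left channel only
left  = [v + g for v, g in zip(vocal, guitar)]
right = vocal[:]

karaoke = remove_center(left, right)
# The vocal cancels completely and only half the guitar survives;
# anything else center-panned (bass, kick) would vanish with it,
# which is exactly the "hollow" result described above.
```

Learned separators avoid the crude cancellation but still have to guess how overlapping sources share the same frequencies, so some version of this artifact survives.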
Udio and Suno are both really bad at generating classical instrumental music that makes any melodic sense — the equivalent of drawing six-fingered people everywhere.
Funny to see how people get ignored when they post their AI songs. Personally, I'm against the movement because it's basically killing music culture.
People use it to generate huge numbers of AI covers of some meme song in different genres, which gets funnier with each new one. Obviously that wouldn't be possible without AI; who would bother to record a rockabilly or Cossack-choir version for a meme?
I would like an AI that can just take an existing song and make it better quality (with several knobs to adjust), or swap in new lyrics.
I very much enjoy udio, stable audio, suno and others.
- it seems like lately most human musicians write music when they are angry or depressed, not when they are in a good mood. these tools are much better at coming up with neutral or positive-sounding music than, say, youtube music search.
- it probably marks the end of cookie-cutter music production (can you really tell the difference between modern edm tracks?), letting musicians play live music to smaller audiences. because of these tools, suddenly live performance is special again.
- unlike people, these tools are not afraid to be silly. creating ridiculous cat music is a lot of fun
- this is a great way to get ideas for your own music. no need to sprinkle ink on note sheets, like they used to.
Funny this was posted today. I spent some time this weekend playing around with Facebook’s MusicGen and had a ton of fun. I’m planning to use it to have a personal 24/7 radio station. Wonder what Udio is using under the hood.
If you ever make it public, please email me your channel address and I could add it to next software version of my bathroom radio: https://loodio.com - my email is carl@thedomain
Listening to some of these, it is so extraordinary to think an algo created and performed the music. I find myself wondering how much these songs resemble particular songs they were trained on.
A lot of my prompts on Suno seem to inevitably gravitate toward a sort of contemporary pop style of production, but so far, it seems like Udio does better with older or more niche styles, such as 70s funk or show tunes. It also seems like a lot of the trending examples I'm seeing on Udio use custom lyrics (which may or may not have started in an LLM, but appear to have human-generated phonetic spellings).
Are there any models that I could run locally that would let me do this? I'm afraid I will go bankrupt buying credits for this if I'm not careful because I enjoy it so much!
Nothing this sophisticated. From my research, MusicGen from Facebook seems to be the best open model you can run yourself. But it’s not intended for generating “songs”, just short clips of music in a style you specify. Still really really cool and fun to play with though!
I know people are complaining that it's dehumanizing music, and there's some truth to that, but considering I was never in the group of people making music anyway, it's immensely fun to have any song I can imagine immediately materialize (even if it is a bit soulless). I would like to have thousands of prompts queued up on my GPU server and generate everything I can think of.
Good project. Personally, though, I've resolved not to listen to or support AI artists/musicians (provided I recognize the artist is using AI, of course).
Are there people who really enjoy music generated by AI?
I guess there's always an audience for poorly made or low-quality things, like industrial food with low-quality ingredients.
Or maybe it'll become so good it replaces artists altogether, but what's the point of it all?
In a way, future generations might only know this new world where music is generated by machines, and won't be shocked by it.
My take is that music is so deeply rooted within us that even if AI can generate it, it'll never replace the human experience, and it might even push music made by humans to become a luxury and more expensive. In a way that's a good thing for artists if the money goes in their pockets; on the other hand it might cut off a part of the population who will no longer have access to culture.
Or there might be more piracy, but that might kill artists' way of living, and their music in the process.
I thought about more concerts and such, but as of today I find it difficult and expensive to attend concerts from where I live. It requires many hours of travel, sometimes hotel stays, which puts the experience out of reach if it's needed a few times a month.
My brother-in-law is a musician, but he's never been able to make a living out of it. They performed in venues, but in order to live and support his family he needs a day job, which makes it harder to live off his craft.
I'm curious to see what positive changes this will bring.
Personally, I enjoy the fact that it's generating songs about topics I'd find it funny to have a song about: my dog playing with their friends, a funny situation that happened. So it's mostly about hearing something personal to me put into a song. I'll listen to it a few times, send it around and be done with it. It'll never go into my daily-listening queue and will not replace the emotional connection I have with songs that helped me through bad times. It's just a fun tool to make something "personal" that I'd never hire an artist for anyway.
Udio as a platform gives off this creepy dystopian vibe and I’m really not a fan. All the music is super uncanny valley - no idea who is actually listening to it.
That said, I think there's a great use for AI-generated music as background noise. I've been playing with Facebook's MusicGen and it's really fun. I'm working on a personal 24/7 radio station based on whatever prompt I want. It's a far cry from actual human-created music from a melodic standpoint, but if I just want an infinite stream of noise while I work or read, I think it'll be good enough.
I'm legitimately surprised that there seems to be no pornographic equivalent to this yet (or, at least, not one I've heard about in passing). Porn-on-demand seems like it's always a prime target for AI tech, maybe because the demand is automatically there, maybe because it doesn't really matter if it's poorly made as long as the gist is there. Maybe audio porn is just too niche?
I doubt it is. Have been seeing several 'tok videos of "oldies" AI music and the album art featured in the videos looks like this. I think they are spending marketing dollars on flooding the typical social media distribution channels with content from their platform.