The audio is 44.1 kHz stereo, but all of us use autoencoders so the songs will fit in a transformer's context window, and compression that heavy does affect quality. We're definitely working on better ones, though!
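To give a rough sense of scale, here's a back-of-the-envelope sketch of why the compression is needed. Only the 44.1 kHz stereo figure comes from the comment above; the three-minute song length and the 1024x downsampling factor are made-up assumptions for illustration, not numbers from any actual model.

```python
SAMPLE_RATE_HZ = 44_100    # from the comment above
CHANNELS = 2               # stereo
SONG_SECONDS = 180         # assumption: a three-minute song
DOWNSAMPLE_FACTOR = 1024   # hypothetical: audio samples per latent frame


def raw_values(seconds, sample_rate=SAMPLE_RATE_HZ, channels=CHANNELS):
    """Number of raw sample values in the uncompressed audio."""
    return seconds * sample_rate * channels


def latent_frames(seconds, sample_rate=SAMPLE_RATE_HZ, factor=DOWNSAMPLE_FACTOR):
    """Sequence length after a hypothetical autoencoder that emits
    one latent frame per `factor` samples (rounded up)."""
    samples = seconds * sample_rate
    return (samples + factor - 1) // factor


if __name__ == "__main__":
    print("raw sample values:", raw_values(SONG_SECONDS))
    print("latent frames:", latent_frames(SONG_SECONDS))
```

Under those assumptions, the raw song is millions of values while the latent sequence is a few thousand frames, which is the kind of length a transformer can actually attend over. The tradeoff is that each frame has to stand in for a lot of audio.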
I've found that adding prompt elements such as "hi-fi", "sharp imaging" and "clear soundstage" has helped create a less compressed and generally cleaner sound.
If this were the future that would be kinda depressing. I think the best, truly catchy songs and those that truly connect with people will continue having a significant human element. I see this as similar to the invention of Photoshop except even easier for normal people to start getting into.
At least for hip hop, AI is too sanitized to do anything too creative.
I suspect record labels might train their own models. I know for sampling, being able to just create a royalty-free loop without worrying about clearing anything is cool.
Suno has this issue too, but everything sounds like it's washed out or something. As if you recorded it from a different room.
Still, I love this. Ultimately I think it'll be a tool musicians use vs. something for creating standalone art.