piecerough 3 months ago | on: Pushing the frontiers of audio generation

It's closely related to LLMs, though instead of text tokens you are working with audio tokens (e.g. from SoundStream). You then train on an audio corpus instead of a text corpus.
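
To make the analogy concrete, here is a toy sketch in Python. It is not SoundStream: the two-entry codebook, the `FRAME` size, and the helper names `tokenize`, `train_bigram` and `predict_next` are all made up for illustration, and a bigram counter stands in for the actual language model. The only point is the shape of the pipeline: waveform in, discrete tokens out, then next-token prediction over those tokens just as you would do with text.

```python
from collections import Counter, defaultdict

FRAME = 4  # samples per frame (hypothetical; real codecs use far larger frames)

def tokenize(wave, codebook):
    """Map each frame of samples to the index of its nearest codebook vector."""
    tokens = []
    for i in range(0, len(wave) - FRAME + 1, FRAME):
        frame = wave[i:i + FRAME]
        # Squared Euclidean distance from this frame to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(frame, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequent successor of `prev` seen in training."""
    return counts[prev].most_common(1)[0][0]

# Toy "audio corpus": alternating loud and quiet frames.
codebook = [[0.0] * FRAME, [1.0] * FRAME]  # token 0 = quiet, token 1 = loud
wave = ([0.9] * FRAME + [0.1] * FRAME) * 8

tokens = tokenize(wave, codebook)
model = train_bigram(tokens)

print(tokens[:6])              # [1, 0, 1, 0, 1, 0]
print(predict_next(model, 1))  # 0
```

Swap the nearest-neighbour lookup for a learned neural codec and the bigram counter for a transformer and you have the rough outline of the real thing.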