vvolhejn 42 days ago \| on: Neural audio codecs: how to get audio into LLMs

I train for 1M steps (batch size 64, block size 2048), which is enough for the model to more or less converge. It's also a tiny model by LLM standards, with 150M parameters. The goal wasn't really to reach state of the art but to show how the performance of a single language model architecture can be vastly different when you just change the tokenizer.

To get close to state of the art, how many parameters would be needed with your approach?