You don't need parallel corpora for all language pairs in a "predict the next token" LLM.
What I'm saying is that if an LLM is trained on English, French, and Spanish text, and its corpus contains English-French parallel data, you don't need English-Spanish data to get English-to-Spanish translations.
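Here's a toy numpy sketch of why that can work, using orthogonal Procrustes alignment between word-vector spaces as a stand-in for what the model learns implicitly. Everything here is synthetic: the dimensions, the noise level, and the "parallel pairs" are all made up for illustration, not a claim about any real training pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 200  # embedding dimension, vocabulary size (arbitrary choices)

# Shared underlying "concepts"; each language sees them through its own
# random rotation plus a little noise.
concepts = rng.normal(size=(n, d))

def language_view(concepts, rng, noise=0.01):
    # Random orthogonal matrix via QR decomposition.
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return concepts @ q + noise * rng.normal(size=concepts.shape)

en = language_view(concepts, rng)
fr = language_view(concepts, rng)
es = language_view(concepts, rng)

def procrustes(src, tgt):
    # Best orthogonal map src -> tgt, closed form via SVD.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Learn En->Fr and Fr->Es from "parallel" pairs; no En-Es pairs anywhere.
w_en_fr = procrustes(en[:100], fr[:100])
w_fr_es = procrustes(fr[:100], es[:100])

# Compose the two maps and test on held-out words.
en_to_es = en[100:] @ w_en_fr @ w_fr_es

# Nearest-neighbour accuracy against the true Spanish vectors.
dists = np.linalg.norm(en_to_es[:, None, :] - es[100:][None, :, :], axis=-1)
acc = (dists.argmin(axis=1) == np.arange(100)).mean()
print(f"En->Es nearest-neighbour accuracy with no En-Es data: {acc:.0%}")
```

With low noise this hits essentially 100%: the English-French and French-Spanish alignments compose into an English-Spanish one for free, which is the transitive story in miniature.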
How would an LLM figure out which words to translate animal sounds into? Where would it learn that information? We don't know what animals are communicating, if they even have a language of sorts. There's no mapping.
Potentially the same way it learns to translate concepts that have no direct mapping for that language pair anywhere in the dataset. Like I said, not every language in an LLM's corpus has parallel text in every other language to map to.
If you can find a way to tokenize whalesong, then you could probably build a next-token predictor, but we're still missing the Rosetta Stone needed to map their language to ours in vector space.
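To make the "tokenize whalesong" half concrete, here's one hedged sketch: cluster spectrogram frames into a finite vocabulary of acoustic units (roughly in the spirit of HuBERT-style units), then fit a trivial next-token model over them. The file name "whalesong.wav", the frame settings, and the 64-token vocabulary are all placeholder assumptions, not a real bioacoustics pipeline.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
from sklearn.cluster import KMeans

rate, audio = wavfile.read("whalesong.wav")  # hypothetical input clip
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # mix stereo down to mono

# Short-time spectrogram: each column is one short frame of the signal.
freqs, times, spec = spectrogram(audio, fs=rate, nperseg=512, noverlap=256)
frames = np.log1p(spec).T  # (n_frames, n_freq_bins), log-compressed

# Cluster the frames; each cluster id acts as one "token" in a finite vocab.
n_tokens = 64
kmeans = KMeans(n_clusters=n_tokens, n_init=10, random_state=0).fit(frames)
tokens = kmeans.predict(frames)  # e.g. [12, 12, 47, 3, ...]

# Minimal next-token model: bigram counts with add-one smoothing.
counts = np.ones((n_tokens, n_tokens))
for a, b in zip(tokens[:-1], tokens[1:]):
    counts[a, b] += 1
probs = counts / counts.sum(axis=1, keepdims=True)
print("most likely next token:", probs[tokens[-1]].argmax())
```

Which is exactly the point: this gets you a predictor over whale tokens, but its embeddings live entirely in whale-space. Unlike the En/Fr/Es case above, there are no anchor pairs tying any of those tokens to a human word, so there's nothing to align.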