Standard large transformers trained on multilingual corpora will generally perform next-word prediction in language A using information that appeared in the training data only in language B. This cross-lingual transfer shows that such models implicitly acquire some capability for translation, for language-independent representation of meaning, or both.