
Bad training data. Lorem Ipsum is the de facto placeholder text for so many webpages.


I work with machine translation - such artifacts would be a natural result from multilingual webpages that aren't yet fully translated. An article would have correct text in English, Chinese or Spanish but the "translation" in some other language could have some Lorem Ipsum left there.

Statistical machine translation systems would easily pick that up, as crawling 'multilingual' sites with same/similar content is a major source for machine translation training data.
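To make that concrete, here is a toy sketch of the effect. It is not how any production system works; it just counts which English words co-occur with each "foreign" token across page pairs, which is roughly the raw signal a statistical aligner starts from. The page pairs in `PAIRS` and both function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical crawled page pairs: (untranslated placeholder side, English side).
# The placeholder side was never translated, so it still holds Lorem Ipsum.
PAIRS = [
    ("lorem ipsum", "china internet"),
    ("lorem ipsum dolor", "china business"),
    ("lorem", "china"),
    ("dolor sit", "learn more"),
]

def cooccurrence_table(pairs):
    """Count how often each source token appears alongside each target token."""
    table = defaultdict(Counter)
    for src, tgt in pairs:
        tgt_tokens = tgt.lower().split()
        for s in src.lower().split():
            table[s].update(tgt_tokens)
    return table

def most_likely_translation(table, word):
    """Return the target token seen most often with `word`, or None if unseen."""
    candidates = table.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

table = cooccurrence_table(PAIRS)
print(most_likely_translation(table, "lorem"))
```

With this made-up data, "lorem" ends up paired with whatever English word happened to sit next to it most often. A real system uses proper alignment and language models, but it gets fed the same kind of noise.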


I remember this being discussed before, once in 2010 and once in 2013. Already in 2010, Lorem Ipsum was being translated to random phrases that are very prevalent on the internet: "hello world", "learn more" and "free on": http://www.xefer.com/2010/10/lorem-ipsum

Therefore I don't think Google Translate is being used by spies to communicate plans about China. Even if every translated word were insidious, Occam's Razor still tells us it is unlikely that a public translation service is being used as a modern-day numbers station.

The 2013 discussion had translations like "Cisco Security" and "Corporate Japan": http://googlesystem.blogspot.com/2013/06/lorem-ipsum-google-...

That's statistical machine translation for you.


That is correct. The interesting question is why it translated to 'China' rather than something more banal.


It could just be selection bias: we think it's interesting, so it makes the news. If it had translated to something more banal, we probably wouldn't be discussing it on Hacker News.


Agreed. It's the presence of China, NATO, Russia, and The Company all under one cryptic roof that makes it weird.
