
Please use actual machine translation systems, not generative AI.




That’s some grade A nonsense.

The core algorithm behind modern generative AI was developed specifically for translation, the task to which these chatbots are arguably best suited! It's the task they're by far the best at, both relative to older translation algorithms (which were also AI) and relative to their capabilities on the other tasks they're being put to. These LLMs are "just" text-to-text transformers! That's where the name comes from!

"Stop using the best electric power tool; please use the outdated steam-powered tool" is what you're saying right now.

You're not even asking for something to be "hand-crafted"; you're just being a Luddite.


Heh, it's amazing how people have already forgotten exactly how terrible older ML translation was.

The "terribleness" is a feature. It means I can be confident that the meaning of fluent output corresponds to the meaning of the input: I'm capable of hand-translating any passages the computer can't, but I'm not capable of proof-reading all the translations to spot fluent confabulations.

LLMs can translate in the style you want them to: you can make them translate more creatively, or just word for word. I even think you can make them explain their choice of translation and help you proofread the result.

> The core algorithm behind modern generative AI was developed specifically for translation

Indeed! And yet, generative AI systems wire it up as a lossy compression / predictive text model, which discreetly confabulates what it doesn't understand. Why not use a transformer-based model architecture actually designed for translation? I'd much rather the model take a best-guess (which might be useful, or might be nonsense, but will at least be conspicuous nonsense) than substitute a different (less-obviously nonsense) meaning entirely.

Bonus: purpose-built translation models are much smaller, can tractably be run on a CPU, and (since they require less data) can be built from corpora whose authors consented to this use. There's no compelling reason to throw an LLM at the problem, introducing multiple ethical issues and generally pissing off your audience, for a worse result.
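To make "can tractably be run on a CPU" concrete, here's a minimal sketch using the Hugging Face transformers library with one of the Helsinki-NLP OPUS-MT Marian checkpoints (the specific model name is just an example language pair, not the system any particular site uses):

    # Minimal CPU-only translation with a purpose-built Marian model.
    # Requires: pip install transformers sentencepiece torch
    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-zh-en"  # example pair; pick what you need
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(text: str) -> str:
        batch = tokenizer([text], return_tensors="pt", truncation=True)
        generated = model.generate(**batch, max_new_tokens=256)
        return tokenizer.decode(generated[0], skip_special_tokens=True)

    print(translate("这是一个测试。"))

Checkpoints like these are on the order of a few hundred megabytes and run comfortably on a laptop CPU, which is roughly the same class of model that browser-integrated translators build on.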


> Why not use a transformer-based model architecture actually designed for translation?

Because translation requires a thorough understanding of the source material, essentially up to the level of AGI or close to it. Long-range context matters, short-range context matters, and so do idioms, shorthand, speaker identity, and more.

Current LLMs do great at this; the older translation algorithms based on "mere" deep learning and/or fancy heuristics fail spectacularly in even the most trivial scenarios, except when translating between closely related languages, such as most (but not all) European ones. Dutch to English: great! Chinese to English: unusable!

I've been testing modern LLMs on various translation tasks, and they're amazing at it.[1] I've never had any issues with hallucinations or whatever. If anything, I've seen LLMs outperform human translators in several common scenarios!

Don't assume humans don't make mistakes, or that "organic mistakes" are somehow superior or preferred.

[1] If you can't read both the source and destination language, you can gain some confidence by doing multiple runs with multiple frontier models and then having them cross-check each other. Similarly, you can round-trip from a language you do understand, or round-trip back to the source language and have an LLM (not necessarily the same one!) do the checking for you.
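For example, the round-trip check could look something like this. It's only a sketch, assuming the OpenAI Python client; the model name, prompts, and sample passage are placeholders, and in practice you'd want the checking run to use a different frontier model than the one that produced the translation:

    # Round-trip translation check: source -> English -> back to source,
    # then ask a model to compare the round trip against the original.
    # Requires: pip install openai (and an API key in the environment)
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str, model: str = "gpt-4o") -> str:  # placeholder model name
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    source = "这样确实可以避免AI味儿,但是可能阅读起来会体验很差。"  # sample passage

    english = ask(f"Translate to English, preserving meaning exactly:\n{source}")
    back    = ask(f"Translate to Chinese, preserving meaning exactly:\n{english}")

    verdict = ask(
        "Do these two Chinese passages say the same thing? "
        "List any differences in meaning.\n\n"
        f"A: {source}\n\nB: {back}"
    )
    print(english)
    print(verdict)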


这样确实可以避免AI味儿,但是可能阅读起来会体验很差,我最早就是用machine translation翻译,很多部分变得非常wordy,同时令人费解。

What models are you using? I'm using whatever's built into Firefox 140.6.0esr (some Bergamot derivative, iirc), which gives me:

> This can avoid the taste of AI, but it may be very bad to read, I first used machine translation translation, many parts become very wordy, and at the same time puzzling.

Perfectly clear and comprehensible. It's not fluent English (there are comma splices everywhere, and it translated "machine translation翻译" as "machine translation translation"), but I understand it – and I'm confident it's close to what you actually meant to say. I can spot-check with my Chinese-to-English dictionary, and it seems like a slightly-better-than-literal translation. My understanding of your comment:

> This can avoid the smell of AI, but it may be a struggle to read. I initially used a dedicated machine translation system, but many parts became verbose (/ very wordy) and incomprehensible.

Generative models don't solve the 令人费解 problem: they just paper over it. If a machine translation is incomprehensible, that means the model did not understand what you were saying. Generative models are still transformer models: they're not going to magically have greater powers of comprehension than the dedicated translation model does. But they are trained and fine-tuned to pretend that they know what they're talking about. Is it better for information to be conspicuously lost in translation, or silently lost in translation?

Please, be willing to write in your native language, with your own words, and then provide us with either the original text, or a faithful translation of those words. Do you really want future historians to have to figure out which parts of this you wrote yourself, and which parts were invented by the AI model? I suspect that is not the reason you wrote this.


Per another comment, the original text is available: https://zhuanlan.zhihu.com/p/22190111. My last paragraph is spurious.

And do note that non-LLM translation works pretty well on that article.


