I guess this goes to show how challenging it can be to implement transformer neural networks correctly. There are so many ways to make mistakes at various steps, and there is often no surefire way of knowing; you just end up with slightly worse performance than you would have gotten otherwise. And in many cases, if you make a change to the network, intentionally or not, the network adapts to it: there are plenty of examples of different variants of the architecture performing similarly once trained. (Though in those cases, one might ask whether it really matters if you match the original or not.)
One method I've seen people use to identify these kinds of mistakes is to precisely match model outputs against a reference implementation. HuggingFace does this with tiny-random models: the weights are randomized, but the outputs are expected to match exactly; if they don't, that's an indicator of a bug. This approach only catches bugs that show up during inference, though. Detecting issues in data processing, optimizers, or anything that only happens during training is harder.
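To make that concrete, here is a minimal sketch of such an output-matching check. It assumes a hypothetical `my_model` argument standing in for your own implementation, which is expected to load the same checkpoint and return an object with a `.logits` attribute; the tiny-random checkpoint name is just one example of the randomized-weight models published under the `hf-internal-testing` organization.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def assert_matches_reference(my_model, model_id="hf-internal-testing/tiny-random-gpt2"):
    """Compare logits of `my_model` (a hypothetical stand-in for your own
    implementation, loaded from the same checkpoint) against the transformers
    reference on a tiny-random model."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    reference = AutoModelForCausalLM.from_pretrained(model_id).eval()

    inputs = tokenizer("a tiny smoke test", return_tensors="pt")
    with torch.no_grad():
        ref_logits = reference(**inputs).logits
        my_logits = my_model(**inputs).logits

    # Exact or near-exact agreement is expected; a mismatch points at a bug.
    torch.testing.assert_close(my_logits, ref_logits, rtol=1e-4, atol=1e-4)
```

Because the weights are random rather than trained, the check is cheap to run in CI and still exercises the full forward pass.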
And since HuggingFace transformers exists, you can also test against it directly, which is what we do in Curated Transformers (transformers is only a test-time dependency).
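As a sketch of how transformers can stay a test-time-only dependency, here is what such a test might look like with pytest. This is not Curated Transformers' actual test suite; `MyCausalLM` and `my_library` are hypothetical placeholders for your own implementation, and the test simply skips itself when transformers isn't installed.

```python
# test_against_transformers.py -- transformers is imported only here,
# so it stays a test-time dependency rather than a runtime one.
import pytest
import torch

transformers = pytest.importorskip("transformers")

from my_library import MyCausalLM  # hypothetical: your own implementation

TINY_MODELS = ["hf-internal-testing/tiny-random-gpt2"]


@pytest.mark.parametrize("model_id", TINY_MODELS)
def test_logits_match_transformers(model_id):
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
    reference = transformers.AutoModelForCausalLM.from_pretrained(model_id).eval()
    mine = MyCausalLM.from_pretrained(model_id).eval()  # hypothetical loader

    inputs = tokenizer("match me exactly", return_tensors="pt")
    with torch.no_grad():
        torch.testing.assert_close(
            mine(**inputs).logits,
            reference(**inputs).logits,
            rtol=1e-4,
            atol=1e-4,
        )
```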