A quick question for anyone familiar with the architecture of these Transformer-based models: I've heard that one reason they don't work well with numbers is how the inputs are tokenized (i.e. as multi-character "chunks" rather than individual words/digits). Is there anything architecturally preventing an exception to this tokenization in the data preprocessing step, so that numbers are passed into the model as 1 digit == 1 token? It seems like such a change could result in a better semantic "understanding" of digits by the model.


Nothing prevents it, no. Transformers are certainly capable of learning mathematical tasks; see [1] for an example, which encodes numbers as large but regular-length token sequences.
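
For concreteness, this kind of digit-level handling can be done entirely in preprocessing, before the text ever reaches a standard BPE/word-piece tokenizer, so the model architecture itself is untouched. A minimal sketch in Python (the helper name and example are illustrative, not from any particular library):

    import re

    def split_digits(text: str) -> str:
        # Insert a space between every pair of adjacent digits so a
        # downstream subword tokenizer sees each digit as its own token,
        # e.g. "1234" -> "1 2 3 4".
        return re.sub(r"(?<=\d)(?=\d)", " ", text)

    print(split_digits("1234 + 567 = 1801"))
    # -> "1 2 3 4 + 5 6 7 = 1 8 0 1"

Since the split happens in preprocessing, the only thing that changes from the model's point of view is the token sequences it sees during training and inference.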

Alternatively, you could just scale till the problem solves itself.

[1] https://arxiv.org/abs/2201.04600



