A quick question for anyone familiar with the architecture of these Transformer-based models: I've heard that one reason they don't work well with numbers is how the inputs are tokenized (i.e. as multi-character "chunks" rather than individual words/digits). Is there anything architecturally preventing an exception to this tokenization in the data preprocessing step, so that numbers are passed into the model as 1 digit == 1 token? It seems like such a change could result in a better semantic "understanding" of digits by the model.


Nothing prevents it, no. Transformers are certainly capable of learning mathematical tasks; see [1] for an example, which encodes numbers as large but regular-length token sequences.
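
For concreteness, this kind of digit-level handling can be done entirely in preprocessing, before the text ever reaches a standard BPE/word-piece tokenizer, so the model architecture itself is untouched. A minimal sketch in Python (the helper name and example are illustrative, not from any particular library):

    import re

    def split_digits(text: str) -> str:
        # Insert a space between every pair of adjacent digits so a
        # downstream subword tokenizer sees each digit as its own token,
        # e.g. "1234" -> "1 2 3 4".
        return re.sub(r"(?<=\d)(?=\d)", " ", text)

    print(split_digits("1234 + 567 = 1801"))
    # -> "1 2 3 4 + 5 6 7 = 1 8 0 1"

Since the split happens in preprocessing, the only thing that changes from the model's point of view is the token sequences it sees during training and inference.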

Alternatively, you could just scale till the problem solves itself.

[1] https://arxiv.org/abs/2201.04600



