
> Now I'm wondering if it was trained on (imperfectly) OCRed data too...

Or perhaps they inserted typos automatically into the training set as data augmentation. Tactics like that are known to increase the robustness of some models, so why not?
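Something like that is cheap to do at preprocessing time. A minimal sketch of what a character-level typo injector might look like (the function name, corruption ops, and rate are all made up for illustration, not from any particular paper):

    import random

    def inject_typos(text, rate=0.05, rng=random):
        # Randomly corrupt alphabetic characters to mimic OCR/typing errors
        # (hypothetical augmenter; rate = per-character corruption probability).
        out = []
        for ch in text:
            if ch.isalpha() and rng.random() < rate:
                op = rng.choice(("drop", "dupe", "sub"))
                if op == "drop":
                    continue                                 # delete the character
                if op == "dupe":
                    out.append(ch)                           # duplicate it
                else:
                    ch = chr(ord("a") + rng.randrange(26))   # substitute a random letter
            out.append(ch)
        return "".join(out)

Run over the training corpus with a small rate, e.g. inject_typos("the quick brown fox", rate=0.1), this yields strings like "the quik brown ffox" that look a lot like imperfect OCR output.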




Yup, totally plausible. Things like word (token) dropout, inserting random uniform noise into embeddings, or simple edit-distance perturbations of the tokens are all well known, but Figure 1 still looks extremely impressive.
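For anyone curious, a rough sketch of the first two tricks, assuming NumPy arrays of token ids and an embedding matrix (function names and default values here are hypothetical):

    import numpy as np

    def token_dropout(token_ids, unk_id, p=0.1, rng=None):
        # Replace a random fraction p of tokens with the UNK id (word/token dropout).
        rng = rng or np.random.default_rng()
        ids = np.asarray(token_ids)
        return np.where(rng.random(ids.shape) < p, unk_id, ids)

    def perturb_embeddings(emb, scale=0.01, rng=None):
        # Add random uniform noise to each embedding vector.
        rng = rng or np.random.default_rng()
        return emb + rng.uniform(-scale, scale, size=emb.shape)

Both are applied per batch during training, so the model sees a slightly different corruption of the same example each epoch.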



