Humans learn to recognize speech from much smaller datasets. If a young human is awake 16 hrs/day, that amounts to at most 5840 hrs/year, or 58400 hrs over 10 years. Why do mathematical models use more data yet produce lower-quality results? Is it because they don't understand the meaning of words?
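The back-of-the-envelope estimate above can be written out explicitly (the 16 hrs/day figure is the assumption; everything else follows from it):

```python
# Upper bound on a child's waking hours, i.e. the maximum audio
# "training data" available for learning speech.
hours_awake_per_day = 16
hours_per_year = hours_awake_per_day * 365   # 16 * 365 = 5840
hours_per_decade = hours_per_year * 10       # 58400

print(hours_per_year, hours_per_decade)  # → 5840 58400
```

Even this is an overestimate of speech exposure, since not all waking hours contain speech directed at the child.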
My personal explanation is that large ML models are stateless, while the growing brain is a (very) stateful thing. Stateless imitation will never fully replicate a stateful system.
We evolved brains that learn in stages, where the early years are responsible for core functions (walking, running, talking, etc.). There's a lot of input (all the senses) feeding that learning, while in later years we mostly take for granted whatever we learnt as children.