
> trained on 12 million hours of speech

Humans learn to recognize speech from much smaller datasets. If a small human is awake 16 hrs/day, that amounts to a maximum of 5,840 hrs/year, or 58,400 hrs per 10 years. Why do mathematical models use more data yet produce lower-quality results? Is it because they don't understand the meaning of words?



>that amounts to maximum 5840 hrs/year or 58400 hrs per 10 years

But no human can understand 300+ languages, especially not at 10 years old.

Still a very approximate comparison, but multiplying your upper bound by 300 gives 17.5 million hours, more than what was used for training.
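The back-of-envelope arithmetic above can be checked directly (constant names are mine, chosen for illustration):

```python
# Back-of-envelope check of the waking-hours estimate in the thread.
HOURS_AWAKE_PER_DAY = 16
NUM_LANGUAGES = 300  # the model's claimed language coverage

hours_per_year = HOURS_AWAKE_PER_DAY * 365      # 5,840 hrs/year
hours_per_decade = hours_per_year * 10          # 58,400 hrs per 10 years
hours_all_languages = hours_per_decade * NUM_LANGUAGES

print(hours_all_languages)  # 17,520,000 — i.e. ~17.5 million hours
```

That total exceeds the 12 million hours of speech quoted at the top of the thread, which is the point the comment is making.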


My personal explanation is that large ML models are stateless, while the growing brain is a (very) stateful thing. Stateless imitation will never fully replicate a stateful system.

We evolved brains that learn in stages, where the early years are responsible for core functions (walking, running, talking, etc.). There's a lot of input (all the senses) feeding that learning, while in later years we mostly take for granted whatever we learnt as children.
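The stateless-vs-stateful distinction the comment draws can be sketched as a toy contrast (hypothetical names, not any real ML API; a stateless function maps the same input to the same output forever, while a stateful learner's behavior shifts with every observation):

```python
def stateless_model(weights, x):
    """Same weights + same input -> always the same output."""
    return sum(w * xi for w, xi in zip(weights, x))


class StatefulLearner:
    """Internal state persists across inputs, so identical
    inputs can yield different outputs over time."""

    def __init__(self):
        self.memory = []

    def observe(self, x):
        # Each observation changes all future behavior.
        self.memory.append(x)

    def respond(self, x):
        # Output depends on everything seen so far, not just x.
        return x + len(self.memory)
```

Calling `stateless_model` twice with the same arguments always agrees; calling `respond(1)` before and after an `observe()` does not, which is the asymmetry the comment is gesturing at.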


The dataset humans use includes not just the audio stream but also visual context, e.g. people pointing at things while talking about them.


Blind people...


Fun/interesting question... do you think blind people gesture when speaking?


https://news.uchicago.edu/story/blind-adults-gestures-resemb...

It's not from a department I'm a fan of, but they do know how to catalog gestures.


…or deaf.


To be fair, humans have the benefit of millions (billions?) of years of evolution.



