
Your field of vision is equivalent to something like 500 megapixels. And assume it's uncompressed, because it's not like your eyeballs are doing H.264.

Given vision and the other senses, I'd argue that your average toddler has probably trained on more sensory information than the largest LLMs ever built, long before they learn to talk.
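
If you take those numbers at face value, the arithmetic does come out on the toddler's side. A quick back-of-the-envelope sketch; every constant here (frame rate, waking hours, token count, bytes per token) is an assumption pulled out of the air, not a measurement:

    # Rough comparison of raw visual input over toddlerhood vs. LLM
    # training data volume. All constants are loose assumptions.
    PIXELS = 500e6              # 500 "megapixels" of visual field, per the comment above
    BYTES_PER_PIXEL = 3         # uncompressed RGB, one byte per channel
    FPS = 10                    # modest effective "frame rate" for vision
    WAKING_HOURS_PER_DAY = 12
    YEARS = 3                   # roughly when talking is well underway

    seconds_awake = YEARS * 365 * WAKING_HOURS_PER_DAY * 3600
    visual_bytes = PIXELS * BYTES_PER_PIXEL * FPS * seconds_awake

    # Assume a large LLM trained on ~15 trillion tokens at ~4 bytes/token.
    llm_bytes = 15e12 * 4

    print(f"toddler visual input: ~{visual_bytes:.1e} bytes")   # ~7.1e17
    print(f"LLM training text:    ~{llm_bytes:.1e} bytes")      # ~6.0e13
    print(f"ratio: ~{visual_bytes / llm_bytes:,.0f}x")          # ~11,826x

That's about four orders of magnitude more raw bytes than the text corpus, though "raw bytes" and "information" are obviously not the same thing.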



There's an adaptation in there somewhere, though. Humans have a 'field of view' that constrains input data, and on the data processing side we have a 'center of focus' that generally rests wherever the eye rests. (There's an additional layer where people learn to 'search' their vision by moving their mental center of focus without moving the physical focus point of the eye.)

Then there's the whole slew of processes that pick up two or three key data points and fill in the rest (e.g. the moonwalking bear experiment [0]).

I guess all I'm saying is that raw input isn't the only piece of the puzzle. Maybe it is at the start, before a kiddo _knows_ how to focus and filter info?

[0] https://www.youtube.com/watch?v=xNSgmm9FX2s
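
To make the focus-and-filter idea concrete, here's a toy sketch of what foveation buys you. This is nothing like real retinal processing, and the patch size and downsample stride are arbitrary assumptions; the point is just how little of the "raw" frame survives:

    import numpy as np

    def foveate(frame, cx, cy, fovea=64, stride=16):
        """Keep a high-res patch at the fixation point (cx, cy) plus a
        crudely downsampled periphery; discard everything else."""
        half = fovea // 2
        patch = frame[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
        periphery = frame[::stride, ::stride]
        return patch, periphery

    frame = np.random.rand(2160, 3840)              # one "raw" frame
    patch, periphery = foveate(frame, cx=1920, cy=1080)
    kept = patch.size + periphery.size
    print(f"kept {kept:,} of {frame.size:,} samples ({kept / frame.size:.2%})")
    # kept 36,496 of 8,294,400 samples (0.44%)

Move the fixation point around a few times a second and you're still processing a tiny fraction of the nominal input.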


Attention is all you need. :)


You're an LLM, Harry!



