
As always, take those t/s stats with a huge boulder of salt. The demo shows a question "solved" in < 500 tokens. It's still amazing that this is possible at all, but you'll get nowhere near those speeds on real-world problems at real-world useful context lengths for "thinking" models (8-16k tokens). Even EPYCs with lots of memory channels drop to 2-4 t/s past ~4096 tokens of context.
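A rough mental model for why decode speed falls off like this: on a bandwidth-bound CPU, every generated token has to stream the full model weights plus the KV cache accumulated so far through memory, so per-token time grows roughly linearly with position. A minimal sketch; all the numbers (weight size, KV bytes per token, bandwidth) are illustrative assumptions, not measurements:

    # Back-of-the-envelope decode speed for a bandwidth-bound CPU.
    # Every parameter below is an illustrative assumption.
    def tokens_per_sec(ctx_len,
                       weight_bytes=4e9,        # ~4 GB of quantized weights (assumed)
                       kv_bytes_per_token=5e5,  # ~0.5 MB of KV cache per token (assumed)
                       bandwidth=5e10):         # ~50 GB/s memory bandwidth (assumed)
        # Per generated token we read all weights plus the KV cache so far.
        bytes_per_token = weight_bytes + kv_bytes_per_token * ctx_len
        return bandwidth / bytes_per_token

    for ctx in (0, 4096, 8192, 16384):
        print(f"ctx={ctx:6d} -> ~{tokens_per_sec(ctx):.1f} t/s")

Under these made-up numbers you get ~12 t/s at an empty context falling to ~4 t/s by 16k, which is at least the right shape for the slowdown described above.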



I checked how it performs over a longer run (prediction) on 4 x Raspberry Pi 5:

* pos=0 => P 138 ms S 864 kB R 1191 kB Connect

* pos=2000 => P 215 ms S 864 kB R 1191 kB .

* pos=4000 => P 256 ms S 864 kB R 1191 kB manager

* pos=6000 => P 335 ms S 864 kB R 1191 kB the
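If P is the per-token prediction latency in milliseconds (my reading of this output; treat that as an assumption), the implied decode speed drops from roughly 7 t/s at pos=0 to about 3 t/s by pos=6000, which lines up with the 2-4 t/s figure above:

    # Convert the measured P values (ms per predicted token, assumed) to tokens/sec.
    for pos, p_ms in [(0, 138), (2000, 215), (4000, 256), (6000, 335)]:
        print(f"pos={pos:5d} -> ~{1000 / p_ms:.1f} t/s")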


Smaller robots tend to have smaller problems. Even a little help from the model will make them a lot more capable than they are today.



