
Does caching make as much sense as a cost-saving measure on Cerebras hardware as it does on mainstream GPUs? Caching should be preferred if loading the cached state (SSD->VRAM) is dramatically cheaper than recalculating it. If Cerebras is optimized for massively parallel compute with fixed weights, and doesn't have much memory bandwidth into or out of the big wafer, it might actually make sense to price per token without a caching discount. Could someone from the company (or otherwise familiar with it) comment on the tradeoff?
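To make the tradeoff concrete, here's a rough back-of-envelope sketch of the break-even I have in mind. It assumes the cached state is a transformer KV cache and that recomputing costs about 2 FLOPs per parameter per token (a common approximation that ignores the attention term growing with context length). The function names are mine, and every number in the example is a hypothetical placeholder, not a spec for Cerebras or any GPU.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def reload_seconds(tokens, bytes_per_token, bandwidth_bytes_per_s):
    """Time to move a cached prefix from storage into accelerator memory."""
    return tokens * bytes_per_token / bandwidth_bytes_per_s


def recompute_seconds(tokens, params, flops_per_s):
    """Time to recompute the prefix at ~2 FLOPs per parameter per token.

    Assumption: ignores the attention cost that grows with context length.
    """
    return 2 * params * tokens / flops_per_s


def caching_is_cheaper(tokens, bytes_per_token, bandwidth_bytes_per_s,
                       params, flops_per_s):
    """True when reloading the cache takes less time than recomputing it."""
    reload = reload_seconds(tokens, bytes_per_token, bandwidth_bytes_per_s)
    recompute = recompute_seconds(tokens, params, flops_per_s)
    return reload < recompute


if __name__ == "__main__":
    # Hypothetical placeholder numbers, NOT vendor specs.
    per_token = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)
    print(per_token, "bytes of KV cache per token")
    print("caching cheaper:", caching_is_cheaper(
        tokens=10_000,
        bytes_per_token=per_token,
        bandwidth_bytes_per_s=5e9,   # assumed storage-to-memory bandwidth
        params=8e9,                  # assumed model size
        flops_per_s=1e14,            # assumed sustained compute
    ))
```

Under these simplifying assumptions both costs scale linearly with prompt length, so the answer doesn't depend on how long the prefix is. It comes down to the ratio of I/O bandwidth to compute throughput, which is exactly the thing I'd expect to differ between a wafer-scale part and a GPU.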

