
> GPU workloads tend to have terrible cold-start performance by their nature

My Fly machine goes from turned off to first inference complete in about 35 seconds.

If it’s already running, it’s 15 seconds to complete. I think that’s pretty decent.



As the sibling comment points out, cold starts are usually optimized down to the order of milliseconds, so 20+ seconds is a long time for a user to sit there with nothing streamed.

And with per-second GPU pricing hovering around 2x the rate of hourly/monthly rentals, it gets even harder for products with real scale to justify.

You'd want to have a lot of time where you're scaled to 0, but that in turn maps to a lot of cold starts.
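The tradeoff above is easy to put numbers on. A rough sketch, assuming the ~2x premium from the parent comment and a hypothetical hourly rate (neither is a quoted price from any provider):

```python
# Back-of-envelope break-even for scale-to-zero GPUs.
# Assumption: per-second billing costs `premium` times the
# always-on hourly rate (2x per the comment above).

def monthly_cost(hourly_rate, premium, utilization, hours=730):
    """Return (per-second-billed cost, always-on cost) for one month
    at a given utilization fraction (0..1)."""
    per_second = hourly_rate * premium * utilization * hours
    always_on = hourly_rate * hours
    return per_second, always_on

rate = 2.50  # hypothetical $/hr for an always-on GPU rental

# At a 2x premium, per-second billing breaks even at exactly
# 1/premium = 50% utilization; below that, scale-to-zero wins.
burst, flat = monthly_cost(rate, premium=2.0, utilization=0.5)
assert abs(burst - flat) < 1e-9
```

So with a 2x premium you only come out ahead when the machine is idle more than half the time, and that idle time is precisely where the cold starts land.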



