As the sibling comment points out, usually cold starts are optimized on the order of milliseconds, so 20 seconds is a while for a user to be sitting around with nothing streamed.
And with per-second GPU billing priced at roughly a 2x premium over hourly/monthly rentals, it gets even harder for products at scale to justify.
You'd need to spend a lot of time scaled to zero to come out ahead, but that in turn means a lot of cold starts.
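A quick back-of-envelope sketch of that trade-off, using made-up prices but the 2x premium mentioned above ($2/hr always-on vs. a $4/hr-equivalent per-second rate, both assumptions):

```python
# Break-even utilization for per-second vs. always-on GPU pricing.
# All dollar figures are illustrative assumptions, not real quotes.
HOURLY_RATE = 2.00                 # $/hr, always-on rental (assumed)
PER_SECOND_RATE = 2 * HOURLY_RATE  # $/hr-equivalent at the 2x premium

def monthly_costs(utilization: float, hours: int = 730) -> tuple[float, float]:
    """Return (always_on_cost, scale_to_zero_cost) for one month."""
    always_on = HOURLY_RATE * hours
    scale_to_zero = PER_SECOND_RATE * hours * utilization
    return always_on, scale_to_zero

# Per-second billing only wins while you're busy less than this fraction
# of the time -- at a 2x premium, that's below 50% utilization.
break_even = HOURLY_RATE / PER_SECOND_RATE  # 0.5
```

So the more traffic you have, the more of the month you're running anyway, and the faster scale-to-zero stops paying for itself.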
My Fly machine goes from powered off to first inference complete in about 35 seconds.
If it's already running, it's 15 seconds to complete. I think that's pretty decent.