This might not be possible if the "system software" doesn't cooperate, but it's possible to encode videos such that you can keep them "warm" without decoding all frames, for example in an IBBPBBPBBPBB structure where no B-frame is referenced by any other frame (other arrangements are possible). Forcing this structure has a cost, but it's much smaller than having more I-frames. To stay warm, each stream then only needs its I- and P-frames decoded, i.e. one frame in three, so you can alternate decoding 3 such streams for the cost of 1. Each stream is offset by 1 frame, including the I-frames; this is not a problem, it just means you won't be ready to output anything for 2 frames after a seek. Switching to 60fps is then "instantaneous". Old iTunes used to encode H.264 video like this (with a PBPBPB structure, so it could play at half-rate, which it did if the CPU couldn't keep up). Note that unreferenced does not imply B-frame, nor the other way around.
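To make the interleaving concrete, here is a minimal sketch of the schedule, not a real decoder. It assumes the pattern is given in display order, that every B-frame is unreferenced, and that stream k's structure is shifted by k frames; `frame_type` and `warm_schedule` are hypothetical helper names. Real bitstreams are stored in coded order, which this ignores.

```python
# Display-order frame types; assumption: every B-frame is unreferenced.
PATTERN = "IBBPBBPBBPBB"

def frame_type(stream, slot, pattern=PATTERN):
    """Frame type of `stream` at time `slot`; stream k's structure is shifted by k frames."""
    return pattern[(slot - stream) % len(pattern)]

def warm_schedule(num_streams, num_slots, pattern=PATTERN):
    """For each time slot, list the streams whose frame must be decoded to stay warm."""
    return [
        [s for s in range(num_streams) if frame_type(s, slot, pattern) != "B"]
        for slot in range(num_slots)
    ]

schedule = warm_schedule(num_streams=3, num_slots=12)
for slot, due in enumerate(schedule):
    print(slot, due)
```

With three streams, every slot has exactly one stream that needs a reference frame decoded, which is where "for the cost of 1" comes from. The third stream's I-frame only arrives in slot 2, hence the 2-frame delay after a seek.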
Another (admittedly crazy) idea, for a setup with a lower-res version and a higher-res overlay, is to store only the difference between the two, affording a (significant?) bitrate reduction for the high-res "patches". This is very tricky to do in practice, though: the difference of two 8-bit samples needs a larger range, or you lose the LSB, and the codecs aren't really designed for this. I don't think it has ever been done.
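The range problem can be shown with plain sample arithmetic. This is only a sketch of the "lose the LSB" option, assuming 8-bit samples and a lossless path for the stored value; `encode_residual` and `decode_residual` are made-up names, and a real lossy codec would add its own error on top.

```python
def encode_residual(high, low):
    """Pack the difference of two 8-bit samples into 8 bits by dropping its LSB."""
    diff = high - low              # spans -255..255, i.e. 9 bits of range
    return (diff >> 1) + 128       # floor-halve and bias into 0..255

def decode_residual(code, low):
    """Rebuild the high-res sample from the low-res sample and the stored residual."""
    diff = (code - 128) * 2        # the dropped LSB is gone for good
    return max(0, min(255, low + diff))

code = encode_residual(200, 90)
print(code, decode_residual(code, 90))
```

The reconstruction is off by at most one level per sample, and that is before the codec itself touches the residual.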