Here are the transformation steps, which may answer your questions:
1. Cubemap (source) - 360° videos are often sourced from six different images stitched together.
2. Equirect (video) - the cubemap is transformed to an equirectangular ("equirect") frame with a trivial fragment shader that maps every output pixel back to some input pixel (first sketch below).
3. Perspective (output) - Mapping equirect video to a perspective projection is also done with a fragment shader, just using a different transformation driven by the user's view direction and FOV (second sketch below).
4. Pannini (alternative output) - You mentioned this projection, and it's an alternative to the "perspective" projection that allows a wider output FOV while keeping distortion in the periphery low (third sketch below).
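A minimal sketch of step 2, assuming GLSL 330, a full-screen quad with UVs in [0,1]^2, and hypothetical names (uSource, vUV). It converts each equirect output pixel to a direction on the unit sphere and samples the cube faces:

```glsl
#version 330 core
// Cubemap -> equirect: one cubemap lookup per output pixel.
// uSource and vUV are illustrative names, not from any particular codebase.
uniform samplerCube uSource; // the six stitched faces
in vec2 vUV;                 // output pixel in [0,1]^2
out vec4 fragColor;

const float PI = 3.14159265;

void main() {
    float lon = (vUV.x - 0.5) * 2.0 * PI; // longitude: -pi .. pi
    float lat = (vUV.y - 0.5) * PI;       // latitude:  -pi/2 .. pi/2
    vec3 dir = vec3(cos(lat) * sin(lon), sin(lat), cos(lat) * cos(lon));
    fragColor = texture(uSource, dir);
}
```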
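Step 3 is the inverse idea: cast a ray through each output pixel of the perspective view, rotate it by the viewer's orientation, and convert the resulting direction back to equirect coordinates. uView and uTanHalfFov are assumed uniform names, and the forward-axis convention matches the sketch above:

```glsl
#version 330 core
// Equirect -> perspective: invert the direction math from the previous sketch.
uniform sampler2D uEquirect;
uniform mat3 uView;        // viewer orientation (camera -> world rotation)
uniform vec2 uTanHalfFov;  // tan(fov/2), horizontal and vertical
in vec2 vUV;
out vec4 fragColor;

const float PI = 3.14159265;

void main() {
    // Ray through this pixel, forward along +z to match the sketch above.
    vec3 ray = normalize(vec3((vUV - 0.5) * 2.0 * uTanHalfFov, 1.0));
    vec3 dir = uView * ray;
    // Direction -> equirect texture coordinates.
    float lon = atan(dir.x, dir.z);
    float lat = asin(clamp(dir.y, -1.0, 1.0));
    vec2 uv = vec2(lon / (2.0 * PI) + 0.5, lat / PI + 0.5);
    fragColor = texture(uEquirect, uv);
}
```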
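And a sketch of step 4, the Pannini case, using the standard inverse mapping (solve the projection's quadratic for cos(longitude) per pixel). For clarity it renders a forward-facing view only; uD is the Pannini distance parameter and uScale sets the output FOV, both assumed names. At d = 0 it degenerates to plain perspective:

```glsl
#version 330 core
// Equirect -> Pannini: wider FOV with less stretching at the edges.
uniform sampler2D uEquirect;
uniform float uD;     // Pannini distance d; 0 = perspective, 1 = classic Pannini
uniform vec2  uScale; // half-extent of the image plane (controls FOV)
in vec2 vUV;
out vec4 fragColor;

const float PI = 3.14159265;

void main() {
    vec2 p = (vUV - 0.5) * 2.0 * uScale; // image-plane coordinates
    float d = uD;
    // Invert x = (d+1) * sin(lon) / (d + cos(lon)) for cos(lon).
    float k    = p.x * p.x / ((d + 1.0) * (d + 1.0));
    float dscr = k * k * d * d - (k + 1.0) * (k * d * d - 1.0);
    float clon = (-k * d + sqrt(dscr)) / (k + 1.0);
    float S    = (d + 1.0) / (d + clon);
    float lon  = atan(p.x / S, clon); // sin(lon) = x / S
    float lat  = atan(p.y / S);       // y = S * tan(lat)
    vec2 uv = vec2(lon / (2.0 * PI) + 0.5, lat / PI + 0.5);
    fragColor = texture(uEquirect, uv);
}
```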
The transformation from cubemap to equirectangular projection seems unnecessary. Why convert? Modern hardware supports cubemap textures natively, and having a separate video stream per face would make culling them easy. It wouldn't be as fine-grained as Carmack's end result, but it would be simple and would avoid a resampling pass (see the sketch after this paragraph).
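A sketch of that alternative, assuming the same conventions and uniform names as above: a perspective view sampled straight from the cube faces, with no intermediate equirect pass, so the cubemap is only resampled once, at display time:

```glsl
#version 330 core
// Perspective view rendered directly from the cubemap faces.
uniform samplerCube uFaces;
uniform mat3 uView;
uniform vec2 uTanHalfFov;
in vec2 vUV;
out vec4 fragColor;

void main() {
    vec3 ray = normalize(vec3((vUV - 0.5) * 2.0 * uTanHalfFov, 1.0));
    fragColor = texture(uFaces, uView * ray);
}
```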