Each frame has the greenscreen turned into a fully transparent pixel, then gets blitted onto a buffer containing all of the existing buffer's data. That means that for each frame, you're building on the data from every frame that came before. This is fine if you're playing from beginning to end, but when you seek, you're going to break that continuity.
Yes, you do need to process from beginning to end. If you look at the original video, you see that each frame builds on the one before it, which builds on the one before that. Remember, the buffer is never cleared -- it's just overwritten with a few new pixels (the dude moving) each room.