I think the lens distortion of the video-finish camera should be accounted for when making these estimates. There's quite a lot of it visible in the shot. Narrowing down the camera's specific intrinsic parameters may be a challenge, but the EXIF data of the original footage might be a fruitful place to start.
It might also be possible to undistort the footage using some assumptions about collinear points in the video (e.g., the finish line and the edges of signs should be straight lines).
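To make the straight-line idea concrete, here is a rough sketch of how it could work. Everything in it is my own assumption rather than anything known about this camera: it uses a one-parameter radial "division" model with the distortion center at the image center, coordinates normalized so the frame spans roughly -1 to 1, and hypothetical helper names (`undistort`, `straightness`, `estimate_k`). It tries a range of values for the coefficient `k` and keeps the one that makes the hand-picked points along each supposedly straight feature most collinear.

```python
import math

def undistort(points, k, center=(0.0, 0.0)):
    """Apply the assumed division model: p_u = c + (p_d - c) / (1 + k * r_d^2)."""
    cx, cy = center
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        s = 1.0 + k * (dx * dx + dy * dy)
        out.append((cx + dx / s, cy + dy / s))
    return out

def straightness(points):
    """Ratio of smallest to largest eigenvalue of the 2x2 point covariance.

    0 means perfectly collinear; larger values mean more curvature.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    mean = 0.5 * (sxx + syy)
    spread = math.hypot(0.5 * (sxx - syy), sxy)
    lam_max = mean + spread
    lam_min = max(mean - spread, 0.0)
    return lam_min / lam_max if lam_max > 0 else 0.0

def estimate_k(lines, k_range=(-0.5, 0.5), steps=100, rounds=3, center=(0.0, 0.0)):
    """Coarse-to-fine grid search for the k that makes every line straightest.

    `lines` is a list of point lists, one per feature that should be straight.
    """
    lo, hi = k_range
    best_k = 0.0
    for _ in range(rounds):
        step = (hi - lo) / steps
        candidates = [lo + i * step for i in range(steps + 1)]
        best_k = min(
            candidates,
            key=lambda k: sum(straightness(undistort(l, k, center)) for l in lines),
        )
        # Narrow the search window around the current best value.
        lo, hi = best_k - step, best_k + step
    return best_k
```

In practice you would click a dozen or so points along the finish line and along each sign edge in a frame, normalize them, and pass them in as `lines`. A single coefficient with a fixed center is probably too simple for a real lens, and OpenCV's `cv2.undistort` uses a different (polynomial) distortion model, so treat the result as a first approximation only.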