I don't even mention it if others don't turn on their video, but I have to admit I struggle much more to follow someone through audio alone. Ironically, I listen to a lot of podcasts and audiobooks, but there I can just press the "jump back 30s" button, which I do on most information-dense episodes.
I do wonder if we could adapt deepfake tech to improve compression. A keyframe plus tracking of facial features could provide a very low bandwidth simulacrum of video calls.
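For a rough sense of scale, here is a back-of-envelope sketch in Python. Every number in it is my own assumption, not a measurement: 68 tracked landmarks (a count used by common face-landmark models), 2 coordinates each at 16 bits, 30 frames per second, and 1 Mbit/s as a stand-in for an ordinary video call. It ignores the one-off keyframe, audio, and any delta or entropy coding, which would likely shrink the landmark stream further.

```python
def landmark_stream_bps(num_landmarks=68, coords_per_landmark=2,
                        bytes_per_coord=2, fps=30):
    """Raw bitrate of an uncompressed stream of facial landmark positions.

    All defaults are illustrative assumptions, not values from any real codec.
    """
    bytes_per_frame = num_landmarks * coords_per_landmark * bytes_per_coord
    return bytes_per_frame * fps * 8  # bits per second


# Assumed baseline for a conventional video call (hypothetical figure).
ASSUMED_VIDEO_CALL_BPS = 1_000_000

landmark_bps = landmark_stream_bps()
ratio = ASSUMED_VIDEO_CALL_BPS / landmark_bps

print(f"landmark stream: {landmark_bps / 1000:.1f} kbit/s")
print(f"assumed video call is about {ratio:.1f}x larger")
```

Under those assumptions the landmark stream comes to about 65 kbit/s before any compression, so the saving depends almost entirely on how well the receiver can re-render a convincing face from the keyframe.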
This is a plot device in Vernor Vinge's "A Fire Upon The Deep". Something seems off in communications from an apparently friendly spaceship. Use of cached facial models means that whoever/whatever controls the ship now is just puppeting the models of the crew over a deliberately low bandwidth link.