One thing I'd love to see added here is a blurb about the importance of the location of the video metadata in the file: specifically, that you need the metadata at the start of the file rather than at the end (which is the default) for low-latency playback on the web.
Explained: a freshly recorded MP4 (and many other container types) typically saves the header at the end of the file. That's the logical place to store header information after recording, since you just append to the end of the file you've been writing. It's the default.
Unfortunately, having the header at the end of the file is terrible for web playback: a user must download the entire video before playback starts. You absolutely need to reprocess the video with faststart set (a remux is enough; no full re-encode is required).
The header location is the number one mistake I've seen websites and developers make. If your site's videos show a spinner for several seconds before playback starts, check the encoding; specifically, check that faststart is set.
I've seen companies with a perfectly reasonable static site behind a CDN spend a fortune hosting their videos with a third party to fix the latency issues they were seeing. The expensive third party fixed the issue simply because it re-encoded the videos with faststart set; the existing CDN-backed setup would have worked just as well if the videos had been encoded correctly.
This is what the MP4 format calls the "moov atom".
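In practice, ffmpeg can relocate the moov atom without re-encoding via `-c copy -movflags +faststart`; only the container is rewritten. To make the "header location" idea concrete, here's a minimal sketch (not tied to any particular tool, and the synthetic test data is made up) of a top-level MP4 box parser that reports whether `moov` comes before `mdat`:

```python
import struct

def top_level_boxes(data: bytes):
    """Yield (box_type, offset, size) for each top-level MP4 box."""
    offset = 0
    while offset + 8 <= len(data):
        size, = struct.unpack(">I", data[offset:offset + 4])
        box_type = data[offset + 4:offset + 8].decode("ascii", "replace")
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            size, = struct.unpack(">Q", data[offset + 8:offset + 16])
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = len(data) - offset
        yield box_type, offset, size
        offset += size

def is_faststart(data: bytes) -> bool:
    """True when moov precedes mdat, i.e. playback can begin before
    the whole file has downloaded."""
    order = [t for t, _, _ in top_level_boxes(data) if t in ("moov", "mdat")]
    if "moov" not in order or "mdat" not in order:
        return False
    return order.index("moov") < order.index("mdat")
```

A faststart-optimized file will show `ftyp, moov, mdat` order; a freshly recorded one typically shows `ftyp, mdat, moov`.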
Writing metadata at the end, instead of periodically interleaving it throughout the file, is not only useless for web playback but also for any recording process (e.g. live capture) that intends to call itself robust. Imagine recording an hour-long video, only for some unrelated issue to abruptly stop everything (the application crashes, or the camera suddenly loses power), leaving an unplayable file because the damned metadata never got written at the end...
(luckily there are post-process methods that in some cases are able to restore such broken files, but still, a priori the file will be unplayable)
> This is what the MP4 format calls the "moov atom".
One of the few surviving legacies of Apple technology created during the Jobs interregnum.
Like everything else not from NeXT, the modern Apple has killed QuickTime, but they can’t eliminate its concepts and wacky constants from the MPEG-4 standard.
This is the sort of thing you can't help but run into when looking up ffmpeg incantations over the course of building a solution.
Just look how easily you ran into it when you weren't even looking for it.
For such a complex tool, the internet (like StackOverflow answers) ends up becoming a recipe book of practical solutions that would be hard to derive even if you sat down with the documentation and read it front to back. Nothing wrong with embracing that.
> A user must download the entire video before playback starts.
That was not my experience. If the web server announces the Content-Length of the video file, most browsers will make a range request for the end of the file and then go on from there.
Latency is still a bit higher this way, but nowhere near downloading the entire video file.
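That tail-probe behaviour is easy to demonstrate. Below is a self-contained sketch (stand-in byte data rather than a real MP4, and a deliberately minimal single-range parser) of a server honouring Range requests and a client fetching only the last 512 bytes, the way a browser hunts for a trailing moov atom:

```python
import re
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = bytes(range(256)) * 16  # 4096 bytes standing in for a video file

class RangeHandler(BaseHTTPRequestHandler):
    """Minimal handler that honours a single byte-range request."""

    def do_GET(self):
        m = re.match(r"bytes=(\d*)-(\d*)$", self.headers.get("Range", ""))
        if not m:
            # No Range header: serve the whole "file".
            self.send_response(200)
            self.send_header("Content-Length", str(len(PAYLOAD)))
            self.end_headers()
            self.wfile.write(PAYLOAD)
            return
        start, end = m.groups()
        if start:
            lo = int(start)
            hi = int(end) if end else len(PAYLOAD) - 1
        else:
            # Suffix range, e.g. "bytes=-512" means the last 512 bytes.
            lo = len(PAYLOAD) - int(end)
            hi = len(PAYLOAD) - 1
        chunk = PAYLOAD[lo:hi + 1]
        self.send_response(206)  # Partial Content
        self.send_header("Content-Range", f"bytes {lo}-{hi}/{len(PAYLOAD)}")
        self.send_header("Content-Length", str(len(chunk)))
        self.end_headers()
        self.wfile.write(chunk)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Browser-style probe: ask for only the tail of the file, where a
# non-faststart MP4 keeps its moov atom.
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/video.mp4", headers={"Range": "bytes=-512"}
)
with urllib.request.urlopen(req) as resp:
    status = resp.status
    tail = resp.read()
server.shutdown()
```

Note that Python's built-in `SimpleHTTPRequestHandler` does not implement Range requests, which is why the handler above does its own (simplified) parsing; real servers and CDNs implement the full RFC 7233 semantics.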
Yep, you make a really good point, thanks for the feedback. I think this bit would probably make sense to live in the upcoming "processing" section.
If anyone's interested, this is called the moov atom, and the "web playback" presets in tools like Handbrake typically just move that header to the beginning of the file. Progressive playback of non-optimized files has improved somewhat where the file's host supports range requests, since browsers have gotten smart enough to check the end of the file for that header via range requests.
If you're interested in more along the codec/container line, one of my colleagues gave a talk on the internals of MP4 containers at Demuxed a few years ago[0] and another gave a Streaming Media keynote on the history of codecs and containers, particularly as they relate to online video[1].
I really love all the work the Mux team is doing! They don't just throw APIs over the wall; they put a lot of effort into educating and empowering developers. This is good stuff, not just proprietary knowledge to sell something.
Also check out the video-dev Slack[0] and demuxed. Pion WebRTC and WebRTC for the Curious was motivated by conversations I had with other developers in their Slack.
Thank you so much for the kind words! For what it's worth, we're mods of video-dev, but we don't see that as "our" Slack, it's run by a bunch of folks in the online video community. Demuxed is similar in that we're organizers, but we 100% avoid it being a company conference.
Also, WebRTC for the Curious was on HN a while back, but for those that didn't see it the first time around: https://webrtcforthecurious.com/
Oh, I thought it was about VHS video for some reason. I guess I'm old enough to rant that this should be named 'how video streaming works', not 'how video works' :/
Thank you to the Mux team for putting this together! I've been using Mux as the basis of milk.video's transcoding and video serving from the start, and it's been an absolute pleasure to work with.
This site is an incredible, concise, and comprehensive resource for people trying to better understand how video works.
I attended the Demuxed conference this year and was exposed for the first time to the nitty-gritty of how video works behind the scenes.
Huge props for the content not being a sales pitch, and truly being educational and informative.
For those of you who handle video streaming, which software do you use in production to segment the video?
I've already tinkered with ffmpeg, throwing every movflag I could find at it, and it was never able to stream properly in chunks.
My current working solution is using MP4Box with this command
Perhaps someone here can answer a question I've had for a while: if you stream a video which is actually a static image (think a song on YouTube, the 'video' is just the album cover) is there any way to optimise that? Or must the server stream that same image constantly as though it was a regular video?
That's optimized by the video codec itself: you keep frames with major changes (keyframes), while surrounding frames are encoded as descriptions of how certain pixels move. That's essentially what the bitrate tells us: how many bits are new and how many are copied from a previous frame. This is an oversimplification, but with a static image you need only a very low bitrate after the initial few frames, because nothing changes. And bitrate is a proxy for how much data is transmitted while streaming; it's bits per second.
So, a more concise answer: the codecs YouTube uses for video have the optimization you're thinking of built in. They can say "this pixel didn't change from the last frame" instead of re-sending all the color information for every pixel.
YouTube does not really do variable frame rate, and it's messy for editing, but it's another optimization that is possible and could be useful for the type of video you're describing.
The article describes DASH, where you need to send a full keyframe at the start of every time segment, but within a segment the concept described above still applies. I don't believe YouTube uses DASH for anything outside live streams.
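The keyframe-versus-delta idea above can be sketched with a toy example (made-up pixel data, and zlib standing in for a real codec's far more sophisticated prediction and entropy coding): the first frame must be sent in full, while the delta for an identical following frame compresses to almost nothing.

```python
import zlib

WIDTH, HEIGHT = 320, 180  # arbitrary toy frame dimensions

# A made-up "album cover": one byte per pixel, some fixed pattern.
frame = bytes((x * y) % 256 for y in range(HEIGHT) for x in range(WIDTH))

# Frame 1 has no predecessor, so it is sent in full (a keyframe).
keyframe_size = len(zlib.compress(frame))

# Frame 2 is identical, so its per-pixel delta against frame 1 is all
# zeros, which compresses to almost nothing (a predicted frame).
delta = bytes(a ^ b for a, b in zip(frame, frame))
delta_size = len(zlib.compress(delta))

print(keyframe_size, delta_size)
```

Running this shows the delta costing a tiny fraction of the keyframe, which is why a static-image "video" can stream at a very low average bitrate even though the codec periodically re-sends a full keyframe.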
I don't know the answer, but I would be absolutely shocked if it were streamed as a regular video. To the best of my knowledge, there are many, many optimizations that go into video.
I mean, technically, it would be "streamed as a regular video", it's just that "regular video encoding" is pretty efficient at encoding a single unchanging image nowadays.
The previous answers were speculative, but I tried downloading a random audio-only video that was uploaded through YouTube's media programme (directly by media companies), and the answer is that YouTube does re-send a full video frame every 5 seconds for both H.264 and VP9; both are encoded at 25 frames per second with identical width and height. Here are the file details for both formats:
H.264/AAC:
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L3.2
Format settings : CABAC / 2 Ref Frames
Format settings, CABAC : Yes
Format settings, Reference frames : 2 frames
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 3 min 29 s
Bit rate : 375 kb/s
Width : 1 080 pixels
Height : 1 080 pixels
Display aspect ratio : 1.000
Frame rate mode : Constant
Frame rate : 25.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.013
Stream size : 9.35 MiB (74%)
Writing library : x264 core 155 r2901 7d0ff22
Codec configuration box : avcC
Audio
ID : 2
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : mp4a-40-2
Duration : 3 min 29 s
Bit rate mode : Constant
Bit rate : 128 kb/s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 44.1 kHz
Frame rate : 43.066 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 3.19 MiB (25%)
Default : Yes
Alternate group : 1
VP9/Opus:
Video
ID : 1
Format : VP9
Codec ID : V_VP9
Duration : 3 min 29 s
Width : 1 080 pixels
Height : 1 080 pixels
Display aspect ratio : 1.000
Frame rate mode : Constant
Frame rate : 25.000 FPS
Language : English
Default : Yes
Forced : No
Color range : Limited
Audio
ID : 2
Format : Opus
Codec ID : A_OPUS
Duration : 3 min 29 s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 48.0 kHz
Bit depth : 32 bits
Compression mode : Lossy
Language : English
Default : Yes
Forced : No