How Video Works (howvideo.works)
241 points by circuit_ on Dec 14, 2020 | hide | past | favorite | 25 comments



One thing I'd love to see added here is a blurb about the importance of where the video metadata lives in the file. Specifically, you need the metadata at the start of the file rather than at the end (which is the default) for low-latency playback on the web.

Explained: a freshly recorded MP4 (and most other container formats) typically stores the header at the end of the file. That's the logical place to put it after recording, since you just append to the file you've been writing, so it's the default.

Unfortunately, having the header at the end of the file is terrible for web playback: a user must download the entire video before playback starts. You absolutely need to remux the video with faststart set.

The header location is the number one mistake I've seen websites and developers make. If your site's videos show a spinner for several seconds before playback starts, check the encoding; specifically, check that faststart is set.

I've seen companies with a perfectly reasonable static site behind a CDN spend a fortune hosting their videos with a third party to fix the latency they were seeing. The expensive third party ultimately fixed the issue because they re-encoded the videos with faststart set. The reality is their existing CDN-backed solution would also have worked if they had encoded the videos correctly.
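Whether a file has the faststart layout comes down to whether the moov box precedes the mdat box at the top level of the file. A minimal Python sketch of that check, assuming the standard ISO BMFF box layout (the sample bytes below are synthetic, not a real playable MP4):

```python
import struct

def top_level_boxes(data: bytes):
    """Yield the four-character type of each top-level MP4 box."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        yield box_type.decode("ascii")
        if size < 8:  # size 0/1 need special handling; stop in this sketch
            break
        offset += size

def is_faststart(data: bytes) -> bool:
    """True if moov appears before mdat (playable while still downloading)."""
    order = list(top_level_boxes(data))
    return order.index("moov") < order.index("mdat")

def box(name: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal box: 4-byte big-endian size, 4-byte type, payload."""
    return struct.pack(">I", 8 + len(payload)) + name + payload

# Synthetic file in the "freshly recorded" layout: moov trails the media data.
recorded = box(b"ftyp", b"isom") + box(b"mdat", b"\x00" * 32) + box(b"moov", b"\x00" * 16)
print(is_faststart(recorded))  # False
```

A real checker would also need to handle 64-bit (size == 1) and to-end-of-file (size == 0) boxes, but the box-order logic is the same.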


This is what the MP4 format calls the "moov atom".

Writing the metadata at the end, instead of periodically interleaving it inside the file, is not only useless for web playback but also for any capture process (e.g. live recording) that intends to call itself robust. Imagine recording a one-hour video, only to have some unrelated issue abruptly stop it all (the application crashes, or the camera suddenly loses power), leaving an unplayable file because the damned metadata never got written at the end...

(Luckily there are post-processing methods that can sometimes restore such broken files, but a priori the file will be unplayable.)


> This is what the MP4 format calls the "moov atom".

One of the few surviving legacies of Apple technology created during the Jobs interregnum.

Like everything else not from NeXT, the modern Apple has killed QuickTime, but they can’t eliminate its concepts and wacky constants from the MPEG-4 standard.


And that's the reason fragmented MP4s were invented.


> The reality is their existing solution backed by a CDN would also have worked if they encoded the videos correctly.

For folks who might not realize you can fix this post-encode, you can do this:

  ffmpeg -i in.mp4 -c copy -map 0 -movflags +faststart out.mp4


Aside from source code and hundred page technical specs, is there any good source for this type of information?


This is the sort of thing you can't help but run into when looking up ffmpeg incantations over the course of building a solution.

Just look how easily you ran into it when you weren't even looking for it.

For such a complex tool, the internet (like StackOverflow answers) ends up becoming a recipe book of practical solutions that would be hard to derive even if you sat down with the documentation and read it front to back. Nothing wrong with embracing that.


ffmpeg-fu is pretty googleable actually


> A user must download the entire video before playback starts.

That was not my experience. If the web server announces the Content-Length of the video file, most browsers will make a range request for the end of the file and then go on from there.

Latency is still a bit higher this way, but nowhere near downloading the entire video file.
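The tail fetch described above is an HTTP suffix range request. As a sketch of the arithmetic a server performs for it (the header syntax follows RFC 7233; the function name is made up for illustration):

```python
def resolve_range(range_header: str, file_size: int) -> tuple[int, int]:
    """Resolve an HTTP Range header to inclusive (start, end) byte offsets.
    Handles the two shapes browsers use when probing for trailing metadata:
    'bytes=-N' (the last N bytes) and 'bytes=S-E' / 'bytes=S-'."""
    spec = range_header.removeprefix("bytes=")
    start_s, _, end_s = spec.partition("-")
    if start_s == "":                      # suffix range: the last N bytes
        start = max(file_size - int(end_s), 0)
        end = file_size - 1
    else:
        start = int(start_s)
        end = int(end_s) if end_s else file_size - 1
    return start, end

# A browser probing the final 64 KiB of a 10 MB file for the header:
print(resolve_range("bytes=-65536", 10_000_000))  # (9934464, 9999999)
```

The server answers with a 206 Partial Content response carrying just that slice, so the browser can find the header without pulling the whole file.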


Yep, you make a really good point, thanks for the feedback. I think this bit would probably make sense to live in the upcoming "processing" section.

If anyone's interested, this is called the moov atom, and when you use "web playback" presets in tools like HandBrake, what they typically do is move that header to the beginning of the file. The progressive-playback issues for non-optimized files have improved somewhat where the file's host supports range requests, since browsers have gotten smart enough to check the end of the file for that header via a range request.

If you're interested in more along the codec/container line, one of my colleagues gave a talk on the internals of MP4 containers at Demuxed a few years ago[0] and another gave a Streaming Media keynote on the history of codecs and containers, particularly as they relate to online video[1].

[0] https://www.youtube.com/watch?v=iJAPTY3B7yE [1] https://www.youtube.com/watch?v=9Qo3WfsK4vc


Matroska is really great for streaming. It is a very featureful container, it can even be streamed over a lossy connection!

It is a shame that Chrome only allows it to contain vp8/opus. If it allowed all the codecs in mp4 I think it would see much wider adoption.


I really love all the work the Mux team is doing! They don't just throw APIs over the wall; they put a lot of effort into educating and empowering developers. This is good stuff, not just proprietary knowledge to sell something.

Also check out the video-dev Slack[0] and Demuxed[1]. Pion WebRTC and WebRTC for the Curious were motivated by conversations I had with other developers in their Slack.

[0] https://video-dev.herokuapp.com

[1] https://demuxed.com


Thank you so much for the kind words! For what it's worth, we're mods of video-dev, but we don't see that as "our" Slack, it's run by a bunch of folks in the online video community. Demuxed is similar in that we're organizers, but we 100% avoid it being a company conference.

Also, WebRTC for the Curious was on HN a while back, but for those that didn't see it the first time around: https://webrtcforthecurious.com/


Oh, I thought it was about VHS video for some reason. I guess I'm old enough to rant that this should be named 'how video streaming works', not 'how video works' :/


Thank you to the Mux team for putting this together! I've been using Mux as the basis of milk.video's transcoding and video serving from the start, and it's been an absolute pleasure to work with.

This site is an incredible, concise, and comprehensive resource for people trying to better understand how video works.

I attended the Demuxed conference this year and was exposed for the first time to the nitty-gritty of how video works behind the scenes.

Huge props for the content not being a sales pitch, and truly being educational and informative.


For those who handle video streaming: which software do you use in production to segment the video?

I've already tinkered with ffmpeg, including throwing every movflags option I could find at it, and it was never able to stream properly in chunks.

My current working solution uses MP4Box with this command:

  MP4Box -dash 1000 -rap -frag-rap test.mp4


Perhaps someone here can answer a question I've had for a while: if you stream a video which is actually a static image (think a song on YouTube, the 'video' is just the album cover) is there any way to optimise that? Or must the server stream that same image constantly as though it was a regular video?


That's optimized by the video codec itself: you keep frames with major changes, while surrounding frames are encoded by describing how certain pixels move. That's essentially what the bitrate tells us: how many of the bits are new and how many are copied from a previous frame. This is an oversimplification, but with a static image you only need a very low bitrate after the initial few frames, because nothing changes. And bitrate is a proxy for how much data is transmitted while streaming; it's bits per second.

So, a more concise answer: the codecs YouTube uses for video have the optimization you're thinking about built in. The encoder sends a "pixels didn't change from the last frame" signal rather than re-sending the color information for every pixel.

YouTube does not really do variable frame rate, and it's messy for editing, but it's another possible optimization that could be useful for the type of video you're describing.

The article describes DASH, where you need to send a full frame at the start of every time segment, but within a segment the concept described above still applies. I don't believe YouTube uses DASH for anything outside live streams.
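The "full frame per segment, skip tokens in between" idea can be caricatured in a few lines. This is a toy model (nothing like a real codec such as VP9, and the frame size and interval are made up) just to show why a static album cover costs almost nothing after each keyframe:

```python
def encode(frames: list[bytes], keyframe_interval: int = 5) -> list[bytes]:
    """Toy inter-frame encoder: a full keyframe at each segment start or
    whenever the picture changes, otherwise a 1-byte 'unchanged' token."""
    out: list[bytes] = []
    prev = None
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0 or frame != prev:
            out.append(b"K" + frame)   # keyframe: all pixel data
        else:
            out.append(b"S")           # skip: copy the previous frame
        prev = frame
    return out

album_cover = bytes(1080 * 1080)       # one static grayscale "image", ~1 MB raw
stream = encode([album_cover] * 25)    # one second of video at 25 fps
sent = sum(len(chunk) for chunk in stream)
print(sent)  # 5 keyframes plus 20 one-byte skips, vs 25 full frames raw
```

Real codecs go much further (motion vectors, residuals, entropy coding), but the asymmetry is the same: unchanged regions cost close to zero bits.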


I do not know the answer, but I would be absolutely shocked if it were streamed as a regular video. To the best of my knowledge, there are many, many optimizations that go into video.

This is a pretty good high-level video I came across - https://www.youtube.com/watch?v=r6Rp-uo6HmI


I mean, technically it would be "streamed as a regular video"; it's just that regular video encoding is pretty efficient at encoding a single unchanging image nowadays.


The previous answers were speculative, so I tried downloading a random audio/video that was uploaded through YouTube's media programme (directly by a media company). The answer: YouTube does re-send the video frame every 5 seconds for both H.264 and VP9, encoded at a constant 25 frames per second, with the width and height unchanged. Here are the file details for both formats:

H.264/AAC:

  Video
  ID                                       : 1
  Format                                   : AVC
  Format/Info                              : Advanced Video Codec
  Format profile                           : High@L3.2
  Format settings                          : CABAC / 2 Ref Frames
  Format settings, CABAC                   : Yes
  Format settings, Reference frames        : 2 frames
  Codec ID                                 : avc1
  Codec ID/Info                            : Advanced Video Coding
  Duration                                 : 3 min 29 s
  Bit rate                                 : 375 kb/s
  Width                                    : 1 080 pixels
  Height                                   : 1 080 pixels
  Display aspect ratio                     : 1.000
  Frame rate mode                          : Constant
  Frame rate                               : 25.000 FPS
  Color space                              : YUV
  Chroma subsampling                       : 4:2:0
  Bit depth                                : 8 bits
  Scan type                                : Progressive
  Bits/(Pixel*Frame)                       : 0.013
  Stream size                              : 9.35 MiB (74%)
  Writing library                          : x264 core 155 r2901 7d0ff22
  Codec configuration box                  : avcC

  Audio
  ID                                       : 2
  Format                                   : AAC LC
  Format/Info                              : Advanced Audio Codec Low Complexity
  Codec ID                                 : mp4a-40-2
  Duration                                 : 3 min 29 s
  Bit rate mode                            : Constant
  Bit rate                                 : 128 kb/s
  Channel(s)                               : 2 channels
  Channel layout                           : L R
  Sampling rate                            : 44.1 kHz
  Frame rate                               : 43.066 FPS (1024 SPF)
  Compression mode                         : Lossy
  Stream size                              : 3.19 MiB (25%)
  Default                                  : Yes
  Alternate group                          : 1

VP9/Opus:

  Video
  ID                                       : 1
  Format                                   : VP9
  Codec ID                                 : V_VP9
  Duration                                 : 3 min 29 s
  Width                                    : 1 080 pixels
  Height                                   : 1 080 pixels
  Display aspect ratio                     : 1.000
  Frame rate mode                          : Constant
  Frame rate                               : 25.000 FPS
  Language                                 : English
  Default                                  : Yes
  Forced                                   : No
  Color range                              : Limited

  Audio
  ID                                       : 2
  Format                                   : Opus
  Codec ID                                 : A_OPUS
  Duration                                 : 3 min 29 s
  Channel(s)                               : 2 channels
  Channel layout                           : L R
  Sampling rate                            : 48.0 kHz
  Bit depth                                : 32 bits
  Compression mode                         : Lossy
  Language                                 : English
  Default                                  : Yes
  Forced                                   : No


Thanks, I learned something. Maybe now I can fix the video playback on our website.


I wish I had known all this earlier this year; I'd have written something different as my thesis.

Great explanation though.


Why bring up HLS (Apple only) instead of DASH as an example?


DASH is right there, underneath HLS. You didn't read far enough.



