html metadata - once in a <title> tag, once <meta name=title>, another in a <meta property=og:title>, another in a <meta name="twitter:title">. Probably some duplication for compatibility and different platforms.
2 times in <link rel=alternate> tags, for alternate versions of the site.
Actual displayed title below the video.
Suggested playlist mix - twice: once in an html 'title' attribute and once in the tag's text content.
Once for a title over the top of the video if the player is embedded in an iframe.
Once in a minified blob of javascript.
Basically all of these are ok use cases for duplicating data in the HTML. It's not excessive at all and I would have actually expected much more.
I don't know why the existence of metadata is a shock to the author.
There seems to be some disbelief that it needs to be repeated so many times. I agree though, this seems totally within expectations and I am kind of surprised to see this on HN frontpage. It’d be more interesting if the post did the basic legwork of analyzing each usage, but it does not seem to...
My assumption upon seeing the title but before reading anything else was along the lines of, "request" means "browse to the video page" and "downloads" refers to separate ajax calls. Something like that would certainly qualify as interesting...
Also, we need controls. How many times does the page title appear inside the page in other websites? Is 14 even unusual or is that average? The methodology here is so sloppy, and the result could have been much more interesting.
You can also search the network tab for the title and nothing else shows up (and the network tab does search response bodies - e.g. a search for 'gapless' gets hits), so it really isn't being duplicated more times than it needs to be.
This is the reality of software engineering. Fixing this would probably be a waste of time. Compared to downloading the video content, it doesn't matter at all. And as the article says, repeated content probably compresses extremely well.
An inefficient framework that enables the thousands(?) of engineers at YouTube to reliably design frontends is well worth it.
On the other hand, maybe YouTube's code is a horrible duplicated mess and this is just the result of that. But still, what actual benefit other than aesthetics would you get?
Yeah, trying to fix this would be a clear example of "premature optimization".
Youtube isn't some low data use application, where undisplayed HTML is a significant source of wasted capacity. Even if it were, the right answer would probably be some layer between HTML generation and delivery that gets rid of cruft.
So far the discussion has been about the title text being replicated inside the HTML, presumably coming down in a single request. But I would also not be surprised if redundant service calls occurred to fetch the data; especially when different teams are involved in maintaining a page, coupling their designs adds a lot of burden that is probably more costly than just living with a little redundancy.
It is when it becomes a source of bugs. To keep the analogy rolling, perhaps some had phantom pains in their vestigial tails until their brains adapted to the missing item?
It's rather analogous to physical products that are shipped with the instructions in 8+ languages or Tesla selling cars with their batteries limited in software.
Where I work, people have been trained not to over-apply DRY. Which I actually agree with, but some people take that way too far and duplicate needlessly rather than use even the simplest abstraction: a named function. I am scarred by that.
Some videos have multiple different titles, which I guess reduces duplication :) (not that I think duplicating a string a few times in html matters at all. meta tags exist for a reason)
<title>トーキョーゲットー - Eve MV - YouTube</title>
<meta name="twitter:title" content="Tokyo Ghetto - Eve MV">
The actual title shown below the video is the English one, not the Japanese one, and I think it is sourced from javascript data.
I find it fascinating that youtube translates the title in some places users see it, but not all places, and only on a small subset of videos. It's pretty weird.
That's a nice way to say it. I'd call it "absurdly confusing". I can see the use case for people wanting to discover interesting stuff in a language they don't speak, but it has led me to wonder many times (with excitement quickly followed by disappointment) why someone I follow has apparently suddenly released a video in a language unusual for them.
Google knows the languages I speak; I have them set in the settings. Last time I checked, I hadn't found a way to get rid of this idiocy.
In terms of text inside HTML transferred compressed: yes, it compresses extremely well.
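As a rough illustration (a throwaway Node/TypeScript sketch; the title string and filler content are invented, not measured from a real YouTube page), the marginal cost of the extra copies after gzip is on the order of a few bytes:

    // Compare the gzipped size of a page with 1 vs 14 copies of the title.
    // Everything here is made-up filler, purely for illustration.
    import { gzipSync } from "node:zlib";

    const title = "Some reasonably long example video title - YouTube";
    const filler = "<div>page content</div>".repeat(5_000); // stand-in for the rest of the HTML

    const onceOnly = gzipSync(filler + title);
    const fourteenTimes = gzipSync(filler + (title + " ").repeat(14));

    console.log("1 copy, gzipped:   ", onceOnly.length, "bytes");
    console.log("14 copies, gzipped:", fourteenTimes.length, "bytes");
    console.log("difference:        ", fourteenTimes.length - onceOnly.length, "bytes");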
Reading the title, I guessed the title was coming through in 14 different eventual XHR requests, rather than simply being repeated within the page itself.
Could it avoid being in the actual HTML 14 times? Sure, but you'd drastically hurt the document's usability (by humans, machines, social networks, and more) to save a handful of bytes... as you're about to serve hundreds of megabytes of video.
Since this errant observation somehow made it to the FP, and eventually got flagged, I have a few more thoughts:
* Sorry the HN title ended up more clickbaity than intended. It's not making 14 extra HTTP requests just for the video title. It originally started with "An HTTP request" but that was a few characters too long for HN, and I didn't spend much time rethinking it.
* I agree the extra text isn't a problem (like I said, that'll compress well). I'm more concerned about the underlying complexity it signals. There is more obvious evidence of this complexity (it makes 70 network requests when you load the page, even if you pause the video immediately); this is just a novel one for me.
* I appreciate the copies which are intended to interoperate with other systems like Twitter and OGP.
* I actually appreciate the fact that a JSON blob of all the video metadata is embedded in the HTML. It'll make my scraping task much simpler.
I don't know what all the client is doing with the title, but passing it down as a plain text string 14 times can actually be more efficient than doing so once and having client-side JS update all the various tags/divs which need it.
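For contrast, here's a rough sketch of what the "send it once and fan it out in JS" alternative would look like (the selectors and element id are hypothetical, not YouTube's actual markup):

    // Hypothetical client-side fan-out: ship the title once, then copy it
    // into every tag that needs it after the page loads.
    function applyTitle(title: string): void {
      document.title = `${title} - YouTube`;

      const metaSelectors = [
        'meta[name="title"]',
        'meta[property="og:title"]',
        'meta[name="twitter:title"]',
      ];
      for (const selector of metaSelectors) {
        document.querySelector(selector)?.setAttribute("content", title);
      }

      // hypothetical id for the heading under the player
      const heading = document.querySelector("#video-title");
      if (heading) heading.textContent = title;
    }

Besides the extra script and DOM work, anything that doesn't run JavaScript (link-preview bots, the OGP/Twitter scrapers mentioned above) would never see the title at all, which is a big part of why it gets baked into the served HTML.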
Some tiny text string repetition is not a problem, or even a waste (compression).
What is a real tragedy is browsers caching YT video data despite the YT player never reusing it - rewind more than ~5 minutes and watch a brand new network request, with a new unique URL, fetch the same stuff that's still sitting in the browser's RAM and disk cache. If you watch a lot of YT you are just burning x-xx GB of your SSD's write cycles every day for no particular reason. 1h of 4K can be as big as 6GB.
The YouTube UI is stretching the limits of browser capabilities for almost no benefit whatsoever. YouTube's functionality could be recreated at a fraction of the computational cost. Try resizing YouTube and a DuckDuckGo image search: which one resizes more easily? Which one uses more CPU?
Are you talking about the homepage or a video player?
If you mean the video player page then a much fairer comparison is vimeo I suppose, though it definitely does a bit less in terms of recommendations and ads.
I think something similar is going on with Netflix. I noticed on the AppleTV app, when you click on "More Episodes" it usually loads in a couple of seconds. But if it's with a show that has dozens of seasons (I noticed it when my spouse clicked on "Grey's Anatomy" which has 16 seasons), it takes much longer to load the episode poster frame and description. I think it's loading all of them at once, or something, because it definitely seems to change based on the number of available episodes. You'd think they could load the ones that are visible first, then load the rest in the background.
Encapsulated state. In large enough projects, it's easier to throw a new query into a component than to pass the needed information down from an enclosing component. Now "suggested playlist" owns its own reference to video metadata, rather than receiving it from the page root. I don't like this pattern, and it's a downside of component-style development if developers aren't encouraged to lift state.
GraphQL/Apollo and others are supposed to fix this by ensuring that any queries for a given object ID are cached, so if a bunch of different components request overlapping attributes on an object, they are hitting cache.
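The underlying idea (a minimal hand-rolled sketch of an ID-keyed cache, not Apollo's actual API) is that the first component to ask for a given object pays for the fetch and everyone else shares it:

    // Minimal sketch of an ID-keyed request cache; the endpoint and the
    // VideoMetadata shape are invented for illustration.
    type VideoMetadata = { id: string; title: string; channel: string };

    const cache = new Map<string, Promise<VideoMetadata>>();

    function fetchVideoMetadata(id: string): Promise<VideoMetadata> {
      let entry = cache.get(id);
      if (!entry) {
        // Only the first caller actually hits the network; later callers
        // share the same in-flight or resolved promise.
        entry = fetch(`/api/videos/${id}`).then((res) => res.json());
        cache.set(id, entry);
      }
      return entry;
    }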
Without automated caching like this, I typically try to pass all data from the nearest owning data source. For instance, if I was building a video player and needed an attribute on a video object, I would find the nearest parent component that owned the page's video and pass it down from there. But that's not necessarily the typical/preferred way for software projects where tons of people are working on the same page in tandem. Personally, I see a lot of web apps where different components are loading at different times. It doesn't make for a nice experience.
As for people saying "it's not that much data/duplication", sure, but if the requests aren't cached, it could be increasing latency. In the OP, they said all 14 references are from one request, but there still could be duplicated queries in the server logic that is hydrating the data; I don't know how YouTube handles this and didn't look at how the app works. This is something they've likely made opaque to developers with a data fetching cache between the request handlers and the backing data, but it's no guarantee.
If you don't have a caching layer on the client and/or the server, I say be particular about lifting state and passing it into components. It's fun when you can just throw a React component into a page and it works without any hook-up, but it's also easy to pass a few props into a component.
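A rough sketch of the "lift state and pass props" version (hypothetical component names, nothing to do with YouTube's real code):

    // The page root owns the video object; children receive it as props and
    // never issue their own duplicate fetches. All names are invented.
    import * as React from "react";

    type Video = { id: string; title: string };

    function Player({ video }: { video: Video }) {
      return <h1>{video.title}</h1>;
    }

    function SuggestedPlaylist({ video }: { video: Video }) {
      return <aside>More like "{video.title}"</aside>;
    }

    function WatchPage({ video }: { video: Video }) {
      return (
        <>
          <Player video={video} />
          <SuggestedPlaylist video={video} />
        </>
      );
    }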
Yea. If you're fetching the html for a video page, you're presumably going to stream a video, which dwarfs the cost of "14 times a short string of text".
The article title had me anticipating that 14 separate requests were made, downloading the title multiple times. That would be bad, but this isn't that.
Opening the video and letting it play till the end, I found the following in my browser's debugger:
- 63 XHR requests
- 52500 Kbytes of content in total
- 42020 Kbytes of content in video data
- 1600 Kbytes in Javascript
- 0.36 Kbytes of title (repeated 14 times)
What does this tell me? That the title is 0.000686% of the total transferred content for this video.
I'd be more worried about the 1.6 megabytes of executable code (that's two thirds of the original DOOM) just to basically embed a video player and a list of comments.
Out of all performance issues you can complain about (enough with the useless polyfills to artificially slow down Firefox already, Youtube staff!), repeating the title is not the important part.
It makes a lot of sense. Gzip is ubiquitous, and it is easier to throw compute at the problem than engineering time to ensure all the various little siloed projects that are shoehorned into the rendering of a single page share the same memory for what happens to be a common string.
It's not even that, the author seems to be concerned about the fact that the title is being used as replicated metadata. There's no "compute" other than some additional HTML involved.
It would probably be more work and more brittle to have to refer to some authoritative "title" string somewhere in the HTML rather than keeping it close to where it's needed, because additional HTML attributes are cheaper than JS doing DOM lookups.
This all happens via a single web request. Most likely it’s gzipped, meaning this amounts to tens of extra bytes over the wire. Even at Google scale you’re not going to see much benefit from rewriting your front end to optimize bandwidth here.
Hey, that's not quite fair. For no obvious reason, when YouTube shows you a translated title (and neither reveals it has done this nor gives you a choice in the matter) it only does it for some, but not all, of the titles in the HTML.
They are not. The site belongs to Jamie Zawinski, who is commonly referred to as jwz. Zawinski is a well-known programmer (known for free software contributions) and current owner of the DNA Lounge.
In that case I'm curious how the behavior changed. It was linking me to an image on internet archive, but now it seems to be pointing to the source mentioned by @moonbug.
The behavior is based on your 'Referer' header in the http request that gets the page.
You can inspect the network request your browser is making to see what it's setting the referrer to.
Why it uses certain values is a combination of your browser's code and the w3c standards, which you may look at if your browser's one of the reasonable open source ones.
I'm familiar with referer. I'm saying the behavior changed. The first time I clicked on the link, I was sent to a different location than where I get sent now. From the other comment it appears the site is set up to do that.
How much RAM does this use? I've noticed that if I go down the rabbit hole of discussions on YouTube, in the end it will halt my machine, at least on Chrome. Haven't tested in other browsers. It should be mentioned that I'm a multi-tab guy, though, so that certainly doesn't help the situation lol.