A request to a YouTube video downloads the title 14 times and displays it twice (apitman.com)
81 points by apitman on Sept 29, 2020 | 58 comments



Let's actually look at where it's being used.

HTML metadata: once in a <title> tag, once in a <meta name=title>, once in a <meta property=og:title>, and once in a <meta name="twitter:title">. Probably some duplication for compatibility across different platforms.

2 times in <link rel=alternate> tags, for alternate versions of the site.

Actual displayed title below the video.

Suggested playlist mix - twice, once as an HTML 'title' attribute and once as the tag's text content.

Once for a title over the top of the video if the player is embedded in an iframe.

Once in a minified blob of javascript.

Basically all of these are ok use cases for duplicating data in the HTML. It's not excessive at all and I would have actually expected much more.

I don't know why the existence of metadata is a shock to the author.
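
For anyone who wants to reproduce the count, or compare against other sites as a sibling comment suggests, here is a rough sketch you can paste into the devtools console on any page. It counts the live DOM rather than the raw response body, so the number can differ slightly from what the article measured; the " - YouTube" suffix stripping is an assumption about how the tab title is built.

    // Count how many times the page title appears in the current markup.
    const title = document.title.replace(/ - YouTube$/, "");
    const escaped = title.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    const html = document.documentElement.outerHTML;
    const count = (html.match(new RegExp(escaped, "g")) ?? []).length;
    console.log(`"${title}" appears ${count} times in the current DOM`);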


There seems to be some disbelief that it needs to be repeated so many times. I agree though, this seems totally within expectations and I am kind of surprised to see this on the HN frontpage. It'd be more interesting if the post did the basic legwork of analyzing each usage, but it does not seem to...


My assumption upon seeing the title but before reading anything else was along the lines of, "request" means "browse to the video page" and "downloads" refers to separate ajax calls. Something like that would certainly qualify as interesting...


Exactly. I really thought the blog author was implying there were 14 GET requests rather than the title just appearing as metadata throughout the page.


Also, we need controls. How many times does the page title appear inside the page in other websites? Is 14 even unusual or is that average? The methodology here is so sloppy, and the result could have been much more interesting.


You can also watch the network tab and search for the title, and nothing extra shows up (and the network tab does search response bodies; e.g. searching for 'gapless' finds hits), so it really isn't being duplicated more times than it needs to be.


This is the reality of software engineering. Fixing this would probably be a waste of time. Compared to downloading the video content, it doesn't matter at all. And as the article says, repeated content probably compresses extremely well.

An inefficient framework that enables the thousands(?) of engineers at YouTube to reliably design frontends is well worth it.

On the other hand, maybe YouTube's code is a horrible duplicated mess and this is just the result of that. But still, what actual benefit other than aesthetics would you get?


Yeah, trying to fix this would be a clear example of "premature optimization".

YouTube isn't some low-data-use application where undisplayed HTML is a significant source of wasted capacity. Even if it were, the right answer would probably be some layer between HTML generation and delivery that gets rid of cruft.

Software engineers are really expensive.


So far the discussion has been about the title text being replicated inside the HTML, presumably coming down in a single request. But I would also not be surprised if redundant service calls occurred to fetch the data; especially when different teams are involved in maintaining a page, coupling their designs adds a lot of burden that is probably more costly than just living with a little redundancy.


and this is how you get a vestigial tail


Is that bad?


It is when it becomes a source of bugs. To keep the analogy rolling: perhaps some had phantom pains in their vestigial tails until their brains adapted to the missing item?


It was non-judgemental.


In my experience the answer is "all of the above"

It's rather analogous to physical products that are shipped with the instructions in 8+ languages or Tesla selling cars with their batteries limited in software.


This response is almost certainly compressed at multiple levels. I'd guess the duplicate content adds almost nothing to the actual overhead.
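
A quick way to sanity-check that claim, as a sketch using Node's built-in zlib; the title string and filler markup are made up, not YouTube's actual output:

    // Rough gzip overhead check: how much do 14 copies of a title cost
    // compared to 1 copy, once the document is compressed?
    import { gzipSync } from "node:zlib";

    const title = "Some Fairly Long Video Title That Appears Everywhere";

    const page = (copies: number) => {
      const filler = "x".repeat(2_000);
      // Interleave the title copies with constant chunks of filler so the
      // repeats are spread through the document, roughly like real markup.
      let html = "<html>";
      for (let i = 0; i < 14; i++) {
        html += filler + (i < copies ? `<meta content="${title}">` : "");
      }
      return html + "</html>";
    };

    const one = gzipSync(page(1)).length;
    const fourteen = gzipSync(page(14)).length;
    console.log({ one, fourteen, extraBytesFor13MoreCopies: fourteen - one });
    // The delta is typically a handful of bytes per extra copy, because DEFLATE
    // encodes each repeat as a short back-reference to an earlier occurrence
    // (as long as it falls within the 32 KB window).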


In other words, DRY should be applied based on business need.

Edit: I have a bone to pick with overly obsessive adherents to acronymized maxims and I am acting out because I am scarred. Downvotes deserved.


DRY is about code you write, not the code artifacts you ship to the client.


Thank you, that is a really good distinction.


When DRY makes you commit to bad abstractions, it's a harmful practice.


Where I work, people have been trained not to over-apply DRY, which I actually agree with, but some people take that way too far and duplicate needlessly rather than use even the simplest abstraction... a named function. I am scarred by that.


Yes?


Some videos have multiple different titles, which I guess reduces duplication :) (not that I think duplicating a string a few times in html matters at all. meta tags exist for a reason)

For example, if you go to https://www.youtube.com/watch?v=PvzBWFGEz8M, you'll notice the title in the browser tab differs from the title below the video.

Relevant html snippets are:

    <title>トーキョーゲットー - Eve MV - YouTube</title>

    <meta name="twitter:title" content="Tokyo Ghetto - Eve MV">
The actual title shown below the video is the English one, not the Japanese one, and I think it is sourced from JavaScript data.

I find it fascinating that YouTube translates the title in some of the places users see it, but not all, and only for a small subset of videos. It's pretty weird.
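
A quick way to surface the different variants on a watch page is to list them from the console. The og:title and twitter:title selectors match the tags discussed above; the h1 selector for the displayed title is a guess, not YouTube's actual markup, which changes often:

    // List the different "title" values present in the served page.
    const variants = {
      documentTitle: document.title,
      ogTitle: document.querySelector('meta[property="og:title"]')?.getAttribute("content"),
      twitterTitle: document.querySelector('meta[name="twitter:title"]')?.getAttribute("content"),
      displayed: document.querySelector("h1")?.textContent?.trim(), // assumption: visible title is the first <h1>
    };
    console.table(variants);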


> It's pretty weird.

That's a nice way to say it. I'd call it "absurdly confusing". I can see the use case for people wanting to discover interesting stuff in a language they don't speak, but it has led me to wonder many times (in excitement, with disappointment quickly following) why someone I follow has apparently suddenly released a video in a language unusual for them.

Google knows the languages I speak. I have them set in the settings. Last time I checked, I hadn't found a way to get rid of this idiocy.


There definitely needs to be a way to turn off the localization.


> I'm guessing it compresses well.

In terms of text inside HTML transferred compressed: yes, it compresses extremely well.

Reading the title, I guessed the title was coming through in 14 different XHR requests, not just being displayed multiple times.

Could it avoid being in the actual HTML 14 times? Sure, but you'd drastically hurt the document's usability (for humans, machines, social networks, and more) to save a few bytes... right before serving hundreds of megabytes of video.


Since this errant observation somehow made it to the FP, and eventually got flagged, I have a few more thoughts:

* Sorry the HN title ended up more clickbaity than intended. It's not making 14 extra HTTP requests just for the video title. It originally started with "An HTTP request" but that was a few characters too long for HN, and I didn't spend much time rethinking it.

* I agree the extra text isn't a problem (like I said, that'll compress well). I'm more concerned about the underlying complexity it signals. There is more obvious evidence of this complexity (it makes 70 network requests when you load the page even if you pause the video immediately), this is just a novel one for me.

* I appreciate the copies which are intended to interoperate with other systems like Twitter and OGP.

* I actually appreciate the fact that a JSON blob of all the video metadata is embedded in the HTML. It'll make my scraping task much simpler.


I don't know what all the client is doing with the title, but passing it down as a plain text string 14 times can actually be more efficient than doing so once and having client-side JS update all the various tags/divs which need it.
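
For comparison, a rough sketch of the alternative the parent describes: ship the title once and have client-side script copy it into every tag that needs it. The selectors and the h1 class are hypothetical, not YouTube's actual markup:

    // The "send it once" alternative: one authoritative string plus DOM lookups
    // to fan it out. This trades a few duplicated bytes for extra client work.
    const videoTitle = "Example Video Title"; // would come from a single embedded JSON blob

    document.title = `${videoTitle} - YouTube`;
    for (const selector of ['meta[property="og:title"]', 'meta[name="twitter:title"]']) {
      document.querySelector(selector)?.setAttribute("content", videoTitle);
    }
    document.querySelector("h1.video-title")?.replaceChildren(videoTitle); // hypothetical class name

Crawlers and link unfurlers that don't execute JavaScript would also miss the og:/twitter: values entirely, which is one more reason the duplication stays in the server-rendered HTML.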


There's a lot to complain about with modern web bloat, but this is absolutely not one of those things.


Yeah, I was like: this was worth posting and upvoting here? C'mon.


Some tiny text string repetition is not a problem, or even a waste (compression).

What is a real tragedy is browsers caching YT video data despite the YT player never reusing it: rewind more than ~5 minutes and watch a brand new network request with a new unique URL fetch the same stuff that's still sitting in the browser's RAM and disk cache. If you watch a lot of YT you are just burning x-xx GB of your SSD's write cycles every day for no particular reason. 1h of 4K can be as big as 6GB.


The YouTube UI is stretching the limits of browser capabilities for almost no benefit whatsoever. YouTube's functionality could be recreated at a fraction of the computational cost. Try resizing YouTube and a DuckDuckGo image search: which one resizes more smoothly? Which one uses more CPU?


Are you talking about the homepage or a video player?

If you mean the video player page, then a much fairer comparison is Vimeo, I suppose, though it definitely does a bit less in terms of recommendations and ads.

Definitely faster though.


Invidious (https://invidious.site/watch?v=jzwMjOl8Iyo&dark_mode=true&au...) does about as much in terms of recommendations/features, and much less in terms of ads and anti-features, so I think it is a great comparison.


I think something similar is going on with Netflix. I noticed on the AppleTV app, when you click on "More Episodes" it usually loads in a couple of seconds. But if it's with a show that has dozens of seasons (I noticed it when my spouse clicked on "Grey's Anatomy" which has 16 seasons), it takes much longer to load the episode poster frame and description. I think it's loading all of them at once, or something, because it definitely seems to change based on the number of available episodes. You'd think they could load the ones that are visible first, then load the rest in the background.


Encapsulated state. In large enough projects, it's easier to throw a new query into a component than to pass the needed information down from an enclosing component. Now "suggested playlist" owns its own reference to video metadata, rather than receiving it from the page root. I don't like this pattern, and it's a downside of component-style development if developers aren't encouraged to lift state.

GraphQL/Apollo and others are supposed to fix this by ensuring that any queries for a given object ID are cached, so if a bunch of different components request overlapping attributes on an object, they are hitting cache.

Without automated caching like this, I typically try to pass all data from the nearest owning data source. For instance, if I was building a video player and needed an attribute on a video object, I would find the nearest parent component that owned the page's video and pass it down from there. But that's not necessarily the typical/preferred way for software projects where tons of people are working on the same page in tandem. Personally, I see a lot of web apps where different components are loading at different times. It doesn't make for a nice experience.

As for people saying "it's not that much data/duplication", sure, but if the requests aren't cached, it could be increasing latency. In the OP, they said all 14 references are from one request, but there still could be duplicated queries in the server logic that is hydrating the data; I don't know how YouTube handles this and didn't look at how the app works. This is something they've likely made opaque to developers with a data fetching cache between the request handlers and the backing data, but it's no guarantee.

If you don't have a caching layer on the client and/or the server, I say be particular about lifting state and passing it into components. It's fun when you can just throw a React component into a page and it works without any hook-up, but it's also easy to pass a few props into a component.
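
A minimal sketch of the two patterns in React-flavoured TypeScript; component names, the /api/videos endpoint, and the props are invented for illustration:

    import React from "react";

    type VideoMeta = { id: string; title: string };

    // Pattern A: the component owns its own query. Easy to drop into a page,
    // but without a normalized cache each such component can trigger a
    // duplicate fetch of the same metadata.
    function SuggestedMixSelfFetching({ videoId }: { videoId: string }) {
      const [meta, setMeta] = React.useState<VideoMeta | null>(null);
      React.useEffect(() => {
        fetch(`/api/videos/${videoId}`) // hypothetical endpoint
          .then((res) => res.json())
          .then(setMeta);
      }, [videoId]);
      return <h2>{meta?.title ?? "loading"}</h2>;
    }

    // Pattern B: lift the state. The page fetches once and passes the data down,
    // so the heading, the suggested mix, the player overlay, etc. share one copy.
    function SuggestedMix({ meta }: { meta: VideoMeta }) {
      return <h2>{meta.title}</h2>;
    }

    function WatchPage({ meta }: { meta: VideoMeta }) {
      return (
        <main>
          <h1>{meta.title}</h1>
          <SuggestedMix meta={meta} />
        </main>
      );
    }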


This is less than 1KB before gzip. There are better performance improvements on the page for sure.


Yeah. If you're fetching the HTML for a video page, you're presumably going to stream a video, which dwarfs the cost of "14 times a short string of text".

The article title had me anticipating that 14 separate requests were made to download the title. That would be bad, but this isn't that.


Opening the video and letting it play till the end, I found the following in my browser's debugger:

- 63 XHR requests
- 52,500 Kbytes of content in total
- 42,020 Kbytes of video data
- 1,600 Kbytes of JavaScript
- 0.36 Kbytes of title (repeated 14 times)

What does this tell me? That the title is 0.000686% of the total transferred content for this video.

I'd be more worried about the 1.6 megabytes of executable code (that's two thirds of the original DOOM) just to basically embed a video player and a list of comments.

Out of all the performance issues you could complain about (enough with the useless polyfills artificially slowing down Firefox already, YouTube staff!), repeating the title is not the important one.


It makes a lot of sense. Gzip is ubiquitous, and it is easier to throw compute at the problem than engineering time to ensure all the various little siloed projects that are shoehorned into the rendering of a single page will share the same memory for what happens to be a common string.


It's not even that; the author seems concerned that the title is being used as replicated metadata. There's no "compute" involved, other than some additional HTML.

It would probably be more work and more brittle to have to refer to some authoritative "title" string somewhere in the HTML rather than keeping it close to where it's needed, because additional HTML attributes are cheaper than JS doing DOM lookups.


This all happens via a single web request. Most likely it's gzipped, meaning this amounts to tens of extra bytes over the wire. Even at Google scale you're not going to see much benefit from rewriting your front end to optimize bandwidth here.


Hey, that's not quite fair. For no obvious reason, when YouTube shows you a translated title (and neither reveals it has done this nor gives you a choice in the matter) it only does it for some, but not all, of the titles in the HTML.


a.k.a. taking issue with a non-issue.


Sad that creators don't *demand* that platforms like YouTube have an RSS feed.


YouTube does have RSS feeds for channels, which are properly referenced on desktop (not on mobile). They have the format `https://www.youtube.com/feeds/videos.xml?channel_id={}`. Playlists also have feeds, but they are not referenced on the playlist page. The format of those is `https://www.youtube.com/feeds/videos.xml?playlist_id={}`.
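
A minimal sketch of consuming one of those channel feeds from Node 18+; the channel ID is a placeholder, and the regex-based parsing is only for illustration (a real consumer would use an XML/Atom parser):

    // Fetch a channel's feed and list recent video titles.
    const CHANNEL_ID = "UCxxxxxxxxxxxxxxxxxxxxxx"; // placeholder, substitute a real channel id
    const feedUrl = `https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL_ID}`;

    const xml = await (await fetch(feedUrl)).text();
    // Crude title extraction; the first <title> is the feed's own name, the rest are videos.
    const titles = [...xml.matchAll(/<title>([^<]*)<\/title>/g)].map((m) => m[1]);
    console.log(titles.slice(1, 6));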


Youtube actually does provide RSS feeds for any given channel.


You can download an OPML file of your subscriptions from Youtube. It's a bit buried, but it is useful.


Look at the source of youtubedown to see the sort of bullshit they pull to make downloading hard.

https://www.jwz.org/hacks/youtubedown


Your link has a rather interesting redirect for people referred from HN...


[flagged]


Looks fixed now. Are you the owner of the site?


They are not. The site belongs to Jamie Zawinski, who is commonly referred to as jwz. Zawinski is a well-known programmer (known for free software contributions) and current owner of the DNA Lounge.


In that case I'm curious how the behavior changed. It was linking me to an image on internet archive, but now it seems to be pointing to the source mentioned by @moonbug.


The behavior is based on your 'Referer' header in the http request that gets the page.

You can inspect the network request your browser is making to see what it's setting the referrer to.

Why it uses certain values is a combination of your browser's code and the W3C standards, which you can look at if your browser is one of the reasonable open-source ones.
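
If you want to watch the behavior directly, here is a sketch of probing the server with and without an HN Referer from Node; the specific redirect (if any) is whatever the site owner configured, so nothing here is authoritative about jwz.org:

    // Compare the raw response with and without an HN Referer header.
    // Node's https client does not follow redirects, so any 30x status and
    // its Location header are visible directly.
    import { request } from "node:https";

    const target = "https://www.jwz.org/hacks/youtubedown";

    function probe(referer?: string) {
      const req = request(target, { headers: referer ? { referer } : {} }, (res) => {
        console.log(referer ?? "(no referer)", res.statusCode, res.headers.location);
        res.resume(); // drain the body so the socket is released
      });
      req.end();
    }

    probe();
    probe("https://news.ycombinator.com/");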


I'm familiar with referer. I'm saying the behavior changed. The first time I clicked on the link, I was sent to a different location than where I get sent now. From the other comment it appears the site is set up to do that.


It linked me to the same image, until I visited the site once.

After visiting the site it no longer redirects to the image.


How much RAM does this use? I've noticed that if I go down the rabbit hole of discussions on YouTube, in the end it will halt my machine, at least on Chrome. Haven't tested in other browsers. It should be mentioned that I'm a multi-tab guy, though, so that certainly doesn't help the situation lol.


Possibly a full kilobyte.


Can't be. In long threads it tends to bog down the entire computer. IDK, memory leak?





