A request to a YouTube video downloads the title 14 times and displays it twice (apitman.com)
81 points by apitman on Sept 29, 2020 | 58 comments



Let's actually look at where it's being used.

HTML metadata: once in a <title> tag, once in a <meta name=title>, once in a <meta property=og:title>, and once in a <meta name="twitter:title">. Probably some duplication for compatibility across different platforms.

2 times in <link rel=alternate> tags, for alternate versions of the site.

Actual displayed title below the video.

Suggested playlist mix - twice, once as an HTML 'title' attribute and once as the tag's text content.

Once for a title over the top of the video if the player is embedded in an iframe.

Once in a minified blob of javascript.

Basically all of these are ok use cases for duplicating data in the HTML. It's not excessive at all and I would have actually expected much more.

I don't know why the existence of metadata is a shock to the author.
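
For anyone who wants to reproduce the count, or compare against other sites as a sibling comment suggests, here is a rough sketch you can paste into the devtools console on any page. It counts the live DOM rather than the raw response body, so the number can differ slightly from what the article measured; the " - YouTube" suffix stripping is an assumption about how the tab title is built.

    // Count how many times the page title appears in the current markup.
    const title = document.title.replace(/ - YouTube$/, "");
    const escaped = title.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    const html = document.documentElement.outerHTML;
    const count = (html.match(new RegExp(escaped, "g")) ?? []).length;
    console.log(`"${title}" appears ${count} times in the current DOM`);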


There seems to be some disbelief that it needs to be repeated so many times. I agree though, this seems totally within expectations and I am kind of surprised to see this on the HN frontpage. It'd be more interesting if the post did the basic legwork of analyzing each usage, but it does not seem to...


My assumption upon seeing the title but before reading anything else was along the lines of, "request" means "browse to the video page" and "downloads" refers to separate ajax calls. Something like that would certainly qualify as interesting...


Exactly. I really thought the blog author was implying there were 14 GET requests rather than the title just appearing as metadata throughout the page.


Also, we need controls. How many times does the page title appear inside the page in other websites? Is 14 even unusual or is that average? The methodology here is so sloppy, and the result could have been much more interesting.


You can also watch the network tab and search for the title, and nothing extra shows up (and the network tab does search response bodies; e.g. searching for 'gapless' finds hits), so it really isn't being duplicated more times than it needs to be.


This is the reality of software engineering. Fixing this would probably be a waste of time. Compared to downloading the video content, it doesn't matter at all. And as the article says, repeated content probably compresses extremely well.

An inefficient framework that enables the thousands(?) of engineers at YouTube to reliably design frontends is well worth it.

On the other hand, maybe YouTube's code is a horrible duplicated mess and this is just the result of that. But still, what actual benefit other than aesthetics would you get?


Yeah, trying to fix this would be a clear example of "premature optimization".

YouTube isn't some low-data-use application where undisplayed HTML is a significant source of wasted capacity. Even if it were, the right answer would probably be some layer between HTML generation and delivery that gets rid of cruft.

Software engineers are really expensive.


So far the discussion has been about the title text being replicated inside the HTML, presumably coming down in a single request. But I would also not be surprised if redundant service calls occurred to fetch the data; especially when different teams are involved in maintaining a page, coupling their designs adds a lot of burden that is probably more costly than just living with a little redundancy.


and this is how you get a vestigial tail


Is that bad?


It is when it becomes a source of bugs. To keep the analogy rolling: perhaps some had phantom pains in their vestigial tails until their brains adapted to the missing item?


It was non-judgemental.


In my experience the answer is "all of the above"

It's rather analogous to physical products that are shipped with the instructions in 8+ languages or Tesla selling cars with their batteries limited in software.


This response is almost certainly compressed at multiple levels. I'd guess the duplicate content adds almost nothing to the actual overhead.
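
A quick way to sanity-check that claim, as a sketch using Node's built-in zlib; the title string and filler markup are made up, not YouTube's actual output:

    // Rough gzip overhead check: how much do 14 copies of a title cost
    // compared to 1 copy, once the document is compressed?
    import { gzipSync } from "node:zlib";

    const title = "Some Fairly Long Video Title That Appears Everywhere";

    const page = (copies: number) => {
      const filler = "x".repeat(2_000);
      // Interleave the title copies with constant chunks of filler so the
      // repeats are spread through the document, roughly like real markup.
      let html = "<html>";
      for (let i = 0; i < 14; i++) {
        html += filler + (i < copies ? `<meta content="${title}">` : "");
      }
      return html + "</html>";
    };

    const one = gzipSync(page(1)).length;
    const fourteen = gzipSync(page(14)).length;
    console.log({ one, fourteen, extraBytesFor13MoreCopies: fourteen - one });
    // The delta is typically a handful of bytes per extra copy, because DEFLATE
    // encodes each repeat as a short back-reference to an earlier occurrence
    // (as long as it falls within the 32 KB window).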


In other words, DRY should be applied based on business need.

Edit: I have a bone to pick with overly obsessive adherents to acronymized maxims and I am acting out because I am scarred. Downvotes deserved.


DRY is about code you write, not the code artifacts you ship to the client.


Thank you, that is a really good distinction.


When DRY makes you commit to bad abstractions, it's a harmful practice.


Where I work, people have been trained not to over-apply DRY, which I actually agree with, but some people take that way too far and duplicate needlessly rather than use even the simplest abstraction... a named function. I am scarred by that.


Yes?


Some videos have multiple different titles, which I guess reduces duplication :) (not that I think duplicating a string a few times in html matters at all. meta tags exist for a reason)

For example, if you go to https://www.youtube.com/watch?v=PvzBWFGEz8M, you'll notice the title in the browser tab differs from the title below the video.

Relevant html snippets are:

    <title>トーキョーゲットー - Eve MV - YouTube</title>

    <meta name="twitter:title" content="Tokyo Ghetto - Eve MV">
The actual title shown below the video is the English one, not the Japanese one, and I think it is sourced from JavaScript data.

I find it fascinating that YouTube translates the title in some of the places users see it, but not all, and only for a small subset of videos. It's pretty weird.
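
A quick way to surface the different variants on a watch page is to list them from the console. The og:title and twitter:title selectors match the tags discussed above; the h1 selector for the displayed title is a guess, not YouTube's actual markup, which changes often:

    // List the different "title" values present in the served page.
    const variants = {
      documentTitle: document.title,
      ogTitle: document.querySelector('meta[property="og:title"]')?.getAttribute("content"),
      twitterTitle: document.querySelector('meta[name="twitter:title"]')?.getAttribute("content"),
      displayed: document.querySelector("h1")?.textContent?.trim(), // assumption: visible title is the first <h1>
    };
    console.table(variants);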


> It's pretty weird.

That's a nice way to say it. I'd call it "absurdly confusing". I can see the use case for people wanting to discover interesting stuff in a language they don't speak, but it has led me to wonder many times (in excitement, with disappointment quickly following) why someone I follow has apparently suddenly released a video in a language unusual for them.

Google knows the languages I speak. I have them set in the settings. Last time I checked, I hadn't found a way to get rid of this idiocy.


There definitely needs to be a way to turn off the localization.


> I'm guessing it compresses well.

In terms of text inside HTML transferred compressed: yes, it compresses extremely well.

Reading the title, I guessed the title was coming through in 14 different XHR requests, not just being displayed multiple times.

Could it avoid being in the actual HTML 14 times? Sure, but you'd drastically hurt the document's usability (for humans, machines, social networks, and more) to save a few bytes... right before serving hundreds of megabytes of video.


Since this errant observation somehow made it to the FP, and eventually got flagged, I have a few more thoughts:

* Sorry the HN title ended up more clickbaity than intended. It's not making 14 extra HTTP requests just for the video title. It originally started with "An HTTP request" but that was a few characters too long for HN, and I didn't spend much time rethinking it.

* I agree the extra text isn't a problem (like I said, that'll compress well). I'm more concerned about the underlying complexity it signals. There is more obvious evidence of this complexity (it makes 70 network requests when you load the page even if you pause the video immediately), this is just a novel one for me.

* I appreciate the copies which are intended to interoperate with other systems like Twitter and OGP.

* I actually appreciate the fact that a JSON blob of all the video metadata is embedded in the HTML. It'll make my scraping task much simpler.


I don't know what all the client is doing with the title, but passing it down as a plain text string 14 times can actually be more efficient than doing so once and having client-side JS update all the various tags/divs which need it.
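
For comparison, a rough sketch of the alternative the parent describes: ship the title once and have client-side script copy it into every tag that needs it. The selectors and the h1 class are hypothetical, not YouTube's actual markup:

    // The "send it once" alternative: one authoritative string plus DOM lookups
    // to fan it out. This trades a few duplicated bytes for extra client work.
    const videoTitle = "Example Video Title"; // would come from a single embedded JSON blob

    document.title = `${videoTitle} - YouTube`;
    for (const selector of ['meta[property="og:title"]', 'meta[name="twitter:title"]']) {
      document.querySelector(selector)?.setAttribute("content", videoTitle);
    }
    document.querySelector("h1.video-title")?.replaceChildren(videoTitle); // hypothetical class name

Crawlers and link unfurlers that don't execute JavaScript would also miss the og:/twitter: values entirely, which is one more reason the duplication stays in the server-rendered HTML.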


There's a lot to complain about with modern web bloat, but this is absolutely not one of those things.


Yeah, I was like: this was worth posting and upvoting here? C'mon.


Some tiny text string repetition is not a problem, or even a waste (compression).

What is a real tragedy is browsers caching YT video data despite the YT player never reusing it: rewind more than ~5 minutes and watch a brand new network request with a new unique URL fetch the same stuff that's still sitting in the browser's RAM and disk cache. If you watch a lot of YT you are just burning x-xx GB of your SSD's write cycles every day for no particular reason. 1h of 4K can be as big as 6GB.


The YouTube UI is stretching the limits of browser capabilities for almost no benefit whatsoever. YouTube's functionality could be recreated at a fraction of the computational cost. Try resizing YouTube and a DuckDuckGo image search: which one resizes more smoothly? Which one uses more CPU?


Are you talking about the homepage or a video player?

If you mean the video player page, then a much fairer comparison is Vimeo, I suppose, though it definitely does a bit less in terms of recommendations and ads.

Definitely faster though.


Invidious (https://invidious.site/watch?v=jzwMjOl8Iyo&dark_mode=true&au...) does about as much in terms of recommendations/features, and much less in terms of ads and anti-features, so I think it is a great comparison.


I think something similar is going on with Netflix. I noticed on the AppleTV app, when you click on "More Episodes" it usually loads in a couple of seconds. But if it's with a show that has dozens of seasons (I noticed it when my spouse clicked on "Grey's Anatomy" which has 16 seasons), it takes much longer to load the episode poster frame and description. I think it's loading all of them at once, or something, because it definitely seems to change based on the number of available episodes. You'd think they could load the ones that are visible first, then load the rest in the background.


Encapsulated state. In large enough projects, it's easier to throw a new query into a component than to pass the needed information down from an enclosing component. Now "suggested playlist" owns its own reference to video metadata, rather than receiving it from the page root. I don't like this pattern, and it's a downside of component-style development if developers aren't encouraged to lift state.

GraphQL/Apollo and others are supposed to fix this by ensuring that any queries for a given object ID are cached, so if a bunch of different components request overlapping attributes on an object, they are hitting cache.

Without automated caching like this, I typically try to pass all data from the nearest owning data source. For instance, if I was building a video player and needed an attribute on a video object, I would find the nearest parent component that owned the page's video and pass it down from there. But that's not necessarily the typical/preferred way for software projects where tons of people are working on the same page in tandem. Personally, I see a lot of web apps where different components are loading at different times. It doesn't make for a nice experience.

As for people saying "it's not that much data/duplication", sure, but if the requests aren't cached, it could be increasing latency. In the OP, they said all 14 references are from one request, but there still could be duplicated queries in the server logic that is hydrating the data; I don't know how YouTube handles this and didn't look at how the app works. This is something they've likely made opaque to developers with a data fetching cache between the request handlers and the backing data, but it's no guarantee.

If you don't have a caching layer on the client and/or the server, I say be particular about lifting state and passing it into components. It's fun when you can just throw a React component into a page and it works without any hook-up, but it's also easy to pass a few props into a component.
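
A minimal sketch of the two patterns in React-flavoured TypeScript; component names, the /api/videos endpoint, and the props are invented for illustration:

    import React from "react";

    type VideoMeta = { id: string; title: string };

    // Pattern A: the component owns its own query. Easy to drop into a page,
    // but without a normalized cache each such component can trigger a
    // duplicate fetch of the same metadata.
    function SuggestedMixSelfFetching({ videoId }: { videoId: string }) {
      const [meta, setMeta] = React.useState<VideoMeta | null>(null);
      React.useEffect(() => {
        fetch(`/api/videos/${videoId}`) // hypothetical endpoint
          .then((res) => res.json())
          .then(setMeta);
      }, [videoId]);
      return <h2>{meta?.title ?? "loading"}</h2>;
    }

    // Pattern B: lift the state. The page fetches once and passes the data down,
    // so the heading, the suggested mix, the player overlay, etc. share one copy.
    function SuggestedMix({ meta }: { meta: VideoMeta }) {
      return <h2>{meta.title}</h2>;
    }

    function WatchPage({ meta }: { meta: VideoMeta }) {
      return (
        <main>
          <h1>{meta.title}</h1>
          <SuggestedMix meta={meta} />
        </main>
      );
    }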


This is less than 1KB before gzip. There are better performance improvements on the page for sure.


Yeah. If you're fetching the HTML for a video page, you're presumably going to stream a video, which dwarfs the cost of "14 times a short string of text".

The article title had me anticipating that 14 separate requests were made to download the title. That would be bad, but this isn't that.


Opening the video and letting it play till the end, I found the following in my browser's debugger:

- 63 XHR requests
- 52,500 Kbytes of content in total
- 42,020 Kbytes of video data
- 1,600 Kbytes of JavaScript
- 0.36 Kbytes of title (repeated 14 times)

What does this tell me? That the title is 0.000686% of the total transferred content for this video.

I'd be more worried about the 1.6 megabytes of executable code (that's two thirds of the original DOOM) just to basically embed a video player and a list of comments.

Out of all the performance issues you could complain about (enough with the useless polyfills artificially slowing down Firefox already, YouTube staff!), repeating the title is not the important one.


It makes a lot of sense. Gzip is ubiquitous, and it is easier to throw compute at the problem than engineering time to ensure all the various little siloed projects that are shoehorned into the rendering of a single page will share the same memory for what happens to be a common string.


It's not even that; the author seems concerned that the title is being used as replicated metadata. There's no "compute" involved, other than some additional HTML.

It would probably be more work and more brittle to have to refer to some authoritative "title" string somewhere in the HTML rather than keeping it close to where it's needed, because additional HTML attributes are cheaper than JS doing DOM lookups.


This all happens via a single web request. Most likely it's gzipped, meaning this amounts to tens of extra bytes over the wire. Even at Google scale you're not going to see much benefit from rewriting your front end to optimize bandwidth here.


Hey, that's not quite fair. For no obvious reason, when YouTube shows you a translated title (and neither reveals it has done this nor gives you a choice in the matter) it only does it for some, but not all, of the titles in the HTML.


a.k.a. taking issue with a non-issue.


Sad that creators don't *demand* that platforms like YouTube have an RSS feed.


YouTube does have RSS feeds for channels, which are properly referenced on desktop (not on mobile). They have the format `https://www.youtube.com/feeds/videos.xml?channel_id={}`. Playlists also have feeds, but they are not referenced on the playlist page. The format of those is `https://www.youtube.com/feeds/videos.xml?playlist_id={}`.
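
A minimal sketch of consuming one of those channel feeds from Node 18+; the channel ID is a placeholder, and the regex-based parsing is only for illustration (a real consumer would use an XML/Atom parser):

    // Fetch a channel's feed and list recent video titles.
    const CHANNEL_ID = "UCxxxxxxxxxxxxxxxxxxxxxx"; // placeholder, substitute a real channel id
    const feedUrl = `https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL_ID}`;

    const xml = await (await fetch(feedUrl)).text();
    // Crude title extraction; the first <title> is the feed's own name, the rest are videos.
    const titles = [...xml.matchAll(/<title>([^<]*)<\/title>/g)].map((m) => m[1]);
    console.log(titles.slice(1, 6));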


Youtube actually does provide RSS feeds for any given channel.


You can download an OPML file of your subscriptions from Youtube. It's a bit buried, but it is useful.


Look at the source of youtubedown to see the sort of bullshit they pull to make downloading hard.

https://www.jwz.org/hacks/youtubedown


Your link has a rather interesting redirect for people referred from HN...


[flagged]


Looks fixed now. Are you the owner of the site?


They are not. The site belongs to Jamie Zawinski, who is commonly referred to as jwz. Zawinski is a well-known programmer (known for free software contributions) and current owner of the DNA Lounge.


In that case I'm curious how the behavior changed. It was linking me to an image on internet archive, but now it seems to be pointing to the source mentioned by @moonbug.


The behavior is based on your 'Referer' header in the http request that gets the page.

You can inspect the network request your browser is making to see what it's setting the referrer to.

Why it uses certain values is a combination of your browser's code and the W3C standards, which you can look at if your browser is one of the reasonable open-source ones.
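
If you want to watch the behavior directly, here is a sketch of probing the server with and without an HN Referer from Node; the specific redirect (if any) is whatever the site owner configured, so nothing here is authoritative about jwz.org:

    // Compare the raw response with and without an HN Referer header.
    // Node's https client does not follow redirects, so any 30x status and
    // its Location header are visible directly.
    import { request } from "node:https";

    const target = "https://www.jwz.org/hacks/youtubedown";

    function probe(referer?: string) {
      const req = request(target, { headers: referer ? { referer } : {} }, (res) => {
        console.log(referer ?? "(no referer)", res.statusCode, res.headers.location);
        res.resume(); // drain the body so the socket is released
      });
      req.end();
    }

    probe();
    probe("https://news.ycombinator.com/");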


I'm familiar with referer. I'm saying the behavior changed. The first time I clicked on the link, I was sent to a different location than where I get sent now. From the other comment it appears the site is set up to do that.


It linked me to the same image, until I visited the site once.

After visiting the site it no longer redirects to the image.


How much RAM does this use? I've noticed that if I go down the rabbit hole of discussions on YouTube, in the end it will halt my machine, at least on Chrome. Haven't tested in other browsers. It should be mentioned that I'm a multi-tab guy, though, so that certainly doesn't help the situation lol.


Possibly a full kilobyte.


Can't be. In long threads it tends to bog down the entire computer. IDK, memory leak?





