Improving Performance with HTTP Streaming (medium.com/airbnb-engineering)
87 points by todsacerdoti on May 18, 2023 | 49 comments


This made me feel nostalgic for the early web, where just about everything worked this way, but the author largely missed why we stopped: error handling. Once you start a stream, you can't retroactively say "we just got a database error, so this should be an HTTP 500 rather than the 200 I sent 50ms ago". That makes it hard to build nicer error pages (most web users were familiar with PHP's "headers already sent" message), and it was especially a problem when it wasn't obvious that something had failed partway through: people would assume a page was only supposed to have cursory information when in fact the interactive features had failed and the HTML output had simply halted. The worst case was something like a generated CSV response being truncated without the user realizing they were missing data. Since the response started successfully, you'd see a 200 in your logs and client-side error handling wouldn't trigger, but your users would have a bad experience. Yes, you can set things up to log errors better, but approximately 2% of the PHP world did that.

The main question I had was how that's changed with HTTP/2: can the server send something like RST_STREAM to trip client-side error handlers on e.g. a fetch promise, so the normal failure paths still run?

The other reason things like this became moot for many sites was that fronting caches became pervasive. For things like product pages, the content would be served by an intermediary as quickly as the network could deliver it, so the benefits mentioned about seeing things like script or style tags faster are less significant outside of dynamic or uncommon pages, which people tend to see only after they've cached the core static assets. At that point, the big performance wins would come from shipping less code and making it more efficient, but they're using React so that ship has sailed.


I like Remix's paradigm of waiting to stream until the primary content is rendered and available, and then only streaming secondary content. So for example, on an e-commerce product page, all the markup representing the product would not stream. Secondary content like related products would. This allows you to 500 error if the primary content fails rendering for one reason or another. And if the secondary content fails to render during streaming, you can easily hide the content or render a placeholder.


You can send trailer "headers" after the body, so HTTP itself is fine. Browser support was not, but maybe it's better today.
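
For reference, a minimal sketch of what that looks like with plain Node's http module (the trailer name and value here are made up): the Trailer header is declared up front, and the trailer itself is only emitted on chunked responses.

    import http from "node:http";

    http.createServer((req, res) => {
      // declare which trailers will follow, then stream the body
      res.writeHead(200, { "Content-Type": "text/plain", Trailer: "Server-Timing" });
      res.write("partial body...\n");
      // queued now, emitted after the final chunk when end() is called
      res.addTrailers({ "Server-Timing": "db;dur=53" });
      res.end();
    }).listen(8000);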


Trailers don't let you change the status code though.


That’s really the big one for me: can my server easily do something to cause e.g. curl to fail with a non-zero status or a JavaScript fetch to hit the error path? Satisfying that would make this approach easier to use safely.
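
For the fetch side, a rough sketch under the assumption that the server tears the connection down mid-body (the URL and function name are illustrative): in current browsers that generally surfaces as a rejected read() on the body stream, so truncation can be routed into the normal error path instead of being accepted as a short 200.

    async function download(url: string): Promise<string> {
      const res = await fetch(url);
      if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let text = "";
      try {
        while (true) {
          // read() rejects if the stream terminates abnormally
          const { done, value } = await reader.read();
          if (done) break;
          text += decoder.decode(value, { stream: true });
        }
      } catch (err) {
        throw new Error(`response truncated: ${err}`);
      }
      return text + decoder.decode();
    }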


You can terminate the connection before the response body is fully served (assuming you have a Content-Length or Transfer-Encoding: chunked). curl will exit with an error code in this case. Not sure about JS fetch. You can't carry any additional error codes or messages this way, though, only signal that something went wrong. And you can't do anything if you've already sent the full response. You'd also need access to the socket, which might not be available in all HTTP libraries.


> You can terminate the connection before the response body is fully served (assuming you have a Content-Length or Transfer-Encoding: chunked). curl will exit with an error code in this case.

Those qualifiers were what I was thinking about: in most cases where this is relevant you don't have the content length in advance, and chunked transfer encoding might not trigger this if the fault aligns with chunk boundaries or the chunking is being done by some middle layer.


You need to send a zero-length chunk to indicate the end of the stream. If you omit that chunk and close the connection, curl will fail.
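
A minimal sketch of that behaviour with plain Node http (the CSV generator is made up): a clean end() emits the terminal zero-length chunk, while destroying the socket on error leaves the transfer incomplete and curl exits non-zero (typically with "transfer closed with outstanding read data remaining").

    import http from "node:http";

    async function* rows() {
      yield "id,name\n";
      yield "1,alice\n";
      throw new Error("database went away"); // simulated mid-stream failure
    }

    http.createServer(async (req, res) => {
      // no Content-Length, so Node uses Transfer-Encoding: chunked
      res.writeHead(200, { "Content-Type": "text/csv" });
      try {
        for await (const chunk of rows()) res.write(chunk);
        res.end(); // only a clean end() sends the terminal chunk
      } catch {
        res.destroy(); // tear the connection down so clients see an error, not a short 200
      }
    }).listen(8000);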


Yes, but if middle layers are doing the chunking, the terminal chunk might be sent even in error states. I've had that happen in multiple applications: something like a big CSV file would download "successfully" but be incomplete (or, IIRC, have a Python traceback at the end). Having a simple, unambiguous way to force a failure reduces the odds of an error slipping by unnoticed.


I don't think that we should get performance advice from the Airbnb team. It's a really slow website that has no business being slow.


It’s true that AirBnB isn’t the fastest website in the world. However, I don’t believe it’s fair or fruitful to criticise an article, where an engineer spent time writing up and sharing interesting findings, on the basis of AirBnB’s current performance. If the advice is good, it doesn't matter where it came from.


I just went to check how fast airbnb is: the base site loads fast, but then they show popups for their "new features" as well as cookie popups.

They should get rid of the popups before starting other optimizations.


Part of my job is web performance “optimization”[0]:

Cookie popups and third-party JS are two of the biggest culprits for slow sites, alongside large images/videos and general bloat. Some of these tracking scripts and cookie popups require you to put them in the head. It’s terrible.

I always suggest that my clients simply remove them. Users are retained when the site loads and navigates fast. Spying on them doesn’t help…

Very large sites do get some leverage out of analytics and tracking; however, they should be doing that in-house.

For everyone else it’s simply not worth it, unless that is your business model (which isn’t the case for our clients).

[0] it’s called optimization, but it’s mostly just measuring, cleaning up, moving to standard practices and deleting a lot of code. The times I get to optimize queries and algorithms are fun and impactful, but rare. Most sites just suck because they are bloated and unpolished.


But then the marketing team is out of a job


Have you ever looked into Partytown? Nowadays you can have your cake and eat it too.


I still don’t think the cookie popups are mandatory; just put a toggle for them in the footer and you’ve complied in a perfectly reasonable way, IMO.


No, the user must be aware of it and decide.

The real alternative is to not track at all. It's compliant, easy to implement, performant (the topic of the post) and does the right thing


Or use progressive escalation, like Apple's App Store guidelines: show the consent banner when the user performs an action that requires storing the data (i.e. in a cookie).


That is your opinion. I think you should avoid setting cookies until login anyway, and ask as part of that journey.


Login cookies are not considered tracking cookies; you don't need to ask for user consent for those.


I believe the argument here by the GP is that tracking cookies shouldn't be used at all until the user logs in.

No idea what that would do to any business value Airbnb gets from tracking, but the idea is that the lost analytics data is outweighed by the benefit of removing new user friction.


It's probably only compliant if it's off by default, but I doubt anybody will ever toggle it on anyway.


It's infuriatingly slow when looking for a place, and there are countless complaints about it.

Perhaps the advice is good, but it would be like taking budgeting advice from a guy who's always borrowing money from you.


It's a great write-up indeed. It's just unfortunate that anything that requires people to do more than "switch a toggle" on Nginx configs will probably not see huge adoption without significant advocacy effort.

The truth is that most people do not care (and they're probably right, as they're not heavily dependent on performance).

For the past year or so I've started thinking most OSS maintainers should just pick bolder defaults so we can push the web forward more easily.


> The truth is that most people do not care (and they're probably right, as they're not heavily dependent on performance).

This. For all simple cases it doesn't matter. For many not-so-simple cases the buffering is an advantage – it might impose an imperceptible delay on receiving the first data back from the initial request but make that request overall more efficient without significantly affecting anything else. This is why it has become the default in most cases. Only if your page/app is fairly complex (by necessity or by bad design) do you even need to think about buffering being a problem.

> so we can push the web forward

Not buffering at all is a step backwards in many instances. By all means make this the default in your setup instructions (or docker images etc) for your app/site, but it shouldn't be the default for, say, nginx packages in distros that affect a great many other projects.

What this is doing is selective buffering anyway: the buffer is still there, but it is being explicitly flushed at key points. Sometimes this is defeated by parts further down the chain that you can't control from your code, which is why they change the buffering behaviour in nginx. This could be even further beyond your control, perhaps imposed by some proxy that your users sit behind, so keep in mind that this trick does not universally work.

One trick I've used for internal utilities (that is a bad idea for more public systems) is to throw out a string of difficult-to-compress content in a comment before flushing. This breaks through small buffers elsewhere in the chain without needing to reconfigure the relevant parts, trading a bit of bandwidth for an apparent latency gain. It is useful for utility scripts that return a lot of data that is then presented in a fancy JS table or graph, send the initial page layout with a holding area saying “loading…”, flush, then send the data. This saves the extra request to get the data while letting the initial UI render while you wait.


>It's a great write-up indeed. It's just unfortunate that anything that requires people to do more than "switch a toggle" on Nginx configs will probably not see huge adoption without significant advocacy effort.

My guess is debugging this would be more complicated than other approaches. It's very nice they got around slow backend queries, but I wonder if this same level of effort could be applied to fix that problem too.


I wonder why they didn't consider 103 Early Hints instead, as it solves the same problem in a much more elegant way and doesn't require you to re-architect your application.


103 Early Hints is "experimental technology" (according to MDN) and isn't supported by Safari or Firefox. In Chrome it's only supported when using HTTP/2 or later (Firefox Nightly supports it for HTTP/1).

ref: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/103


Early hints solves a different problem: there you have a list of asset URLs you know your rendered page will need. This is about producing pieces of the rendered page as computation completes on the backend.

Implementing this is a pain in the ass: any new intermediary (haproxy, nginx, ...) introduces a potential new source of buffering, or an IO loop that attempts to helpfully batch up your writes. But once the technique is implemented, it is legitimately an amazing way to reduce latency at the client.

I use this technique in an app where an opening chunk is used to ensure the browser has begun fetching/executing the app JS bundle before a list response is rendered. After the list is rendered (incrementally, as each chunk becomes available), it is used again to execute a slow summary query without having to break out a separate request. All of that fits in a single HTTP response body, so no additional latency is introduced by making separate API requests for the list or for the counts. Because the initial chunk causes the JS to execute, the app can also display certain dialogs (depending on query string parameters) that are immediately ready for user input even before the subsequent responses have finished rendering or downloading. For this specific app, the technique means some dialogs are usable 300ms earlier than they would be otherwise.

Another issue not touched on in the post is how all this newly unbuffered, incrementally-arriving data interacts with visual rendering on the client: there is potential for a lot of flicker and burned CPU from continually re-rendering UI elements.

It sounds like Airbnb are disabling compression to make their implementation work. So long as CPU isn't a problem, you can also implement compression in the backend and flush the compressor each time some useful unit of work is produced (when some task completes, or when some amount of data, e.g. aligned to the size of an Ethernet MTU, is available for writing). Assuming all the stars align, this should produce compressed output in a form the browser can act on every time it manages to read any amount of data from the backend.
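
A rough sketch of that compressor-flushing idea using Node's zlib (this is not Airbnb's implementation; the render helper is made up): everything written so far becomes decodable by the browser at each Z_SYNC_FLUSH.

    import http from "node:http";
    import zlib from "node:zlib";

    http.createServer((req, res) => {
      res.writeHead(200, { "Content-Type": "text/html", "Content-Encoding": "gzip" });
      const gz = zlib.createGzip();
      gz.pipe(res);

      gz.write("<!doctype html><head><link rel=stylesheet href=/app.css></head>");
      gz.flush(zlib.constants.Z_SYNC_FLUSH); // make everything written so far decodable now

      renderSlowBody().then((body) => {
        gz.write(body);
        gz.end(); // end() flushes and finishes the gzip stream
      });
    }).listen(8000);

    function renderSlowBody(): Promise<string> {
      return new Promise((resolve) => setTimeout(() => resolve("<body>rendered late</body>"), 1500));
    }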

All this effort seems like nonsense until you find yourself with a perfectly functional app on a crappy airport wifi network where it was otherwise impossible to even get Google to load.


> Early hints solves a different problem

I think it solves the same main problem, which is to trigger asset downloads early.

However, you're right that streaming the response has a few extra advantages.

I still think getting most of the results with a tiny fraction of the effort is a better solution, but your mileage may vary.


What if the assets are a list of URLs computed incrementally? In my app's case, the specific JS bundle variant to load is itself the result of a database query. It's easy to see many scenarios where the majority of assets (say, photos) aren't known prior to building the response body.


But you can emit as many 103s as you want. So you can start by sending the statically known assets, then emit an extra 103 each time you reach the stage where more assets become known.


Simple solutions don't get software engineers promoted for their architecture skills, and don't make for catchy blog posts.


Can you share more? How are they implemented and how do they help?


See https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/103

As they mention in the article, the main goal of streaming your response is to deliver the <head> tag to the browser sooner, so that it can start downloading assets sooner.

However, it has many downsides, as explained in the article (you can't change the response code or headers once they are sent, etc.), which requires many hacks.

103 Early Hints lets you asynchronously send "provisional" headers, including `Link rel=preload` headers, which the browser can use to start downloading assets, without restricting the final response in any way.

That's basically the same goal as HTTP/2 Server Push (which was deprecated), but much simpler to implement (it works with HTTP/1), and it lets the browser skip downloading resources it already has in cache.
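
For illustration, a minimal sketch of the 103 flow using Node's response.writeEarlyHints (available in recent Node releases; the asset paths and render helper are made up): the interim 103 goes out immediately, and the real 200 follows once rendering is done.

    import http from "node:http";

    http.createServer(async (req, res) => {
      // one or more interim 103 responses carrying Link: rel=preload headers
      res.writeEarlyHints({
        link: [
          "</static/app.css>; rel=preload; as=style",
          "</static/app.js>; rel=preload; as=script",
        ],
      });
      const html = await renderPage(); // the slow part
      res.writeHead(200, { "Content-Type": "text/html" });
      res.end(html);
    }).listen(8000);

    function renderPage(): Promise<string> {
      return new Promise((resolve) => setTimeout(() => resolve("<!doctype html>hello"), 1000));
    }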


I worked on a similar thing, but at the CDN edge and without changing the app. Logic in the CDN cached the header from past responses and sent it immediately, without waiting for the response from the origin server. Once the response from the origin server arrived, we matched it against the header that had already been sent. If there was a mismatch, we checked whether we could fix it by adding a script tag at the beginning of the body; otherwise we reloaded the front end.

I think nowadays the majority of web apps are React apps. The HTML generation logic is on the front end, and the front end only makes REST API calls. So this kind of optimization is not very useful.

We had this patent https://patents.google.com/patent/US20150012614A1


> So this kind of optimization is not very useful.

It's absolutely useful. If you wait for the HTML page to load, then to fetch the script, then to start the React app, and then to have the React app make the REST/GraphQL calls, your perf will be awful.

You want to fetch and stream the response to your query as part of your document response. My coworkers and I gave a talk on this, and it's used on facebook.com: https://www.youtube.com/watch?v=WxPtYJRjLL0

In some situations you even want to stream the response within your query response within your document.


I enjoyed reading this write-up. I used this technique in the past and saw improvements similar to those mentioned in this article. [1] That was before the FCP, LCP and CLS metrics, so it considers onload timings. It would be interesting to know at which percentiles this improvement is seen, and whether it was measured with real user monitoring (RUM) or captured by a synthetic tool such as Lighthouse or WebPageTest.

[1]: https://techlab.bol.com/en/blog/our-ride-to-peak-season-fron...


Here's an old but related post for what Facebook does: https://engineering.fb.com/2010/06/04/web/bigpipe-pipelining...

It's evolved since the full migration to a React app, but the high-level concept hasn't changed.


I did something similar with an Ajax call to a PHP script using the output buffer. The important part is to use Ajax readyState 3. Here is a description (sorry, in German): https://www.ulrischa.de/quasi-realtime-anwendung-mit-jquery-...
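
A rough client-side sketch of the readyState 3 part (the endpoint and element id are made up): in state 3, responseText already contains whatever the server has flushed so far, so each new slice can be processed as it arrives.

    const xhr = new XMLHttpRequest();
    xhr.open("GET", "/progress.php");
    let seen = 0;
    xhr.onreadystatechange = () => {
      if (xhr.readyState >= 3) { // 3 = LOADING: a partial body is available
        const fresh = xhr.responseText.slice(seen);
        seen = xhr.responseText.length;
        const log = document.getElementById("log");
        if (fresh && log) log.textContent = (log.textContent ?? "") + fresh;
      }
    };
    xhr.send();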


PHP streams by default. From the article, it's not entirely clear to me why you buffered the output and flushed it manually (on every iteration?).


With that, I think we took one more step towards the PHPfication of JavaScript, because this is BigPipe all over again.


I use a >10-year-old X3470 as my college PC, which works surprisingly well for most compute-intensive tasks like compiling, virtualization, and running services in the background. However, it lags horribly in web browsing because of its poor single-thread performance.

When I see posts like these, I become even more afraid of what the web is becoming. I don't want to have any website fetching itself piecemeal, freezing my browser window for ~5 seconds while the browser logic scrambles to reconstitute content from a million pieces.


Don’t worry about this: this is how most sites on the web worked 20 years ago because PHP, classic ASP, ColdFusion, etc. didn’t buffer.

The reason people moved away from it is error handling: it was basically a cliché that you'd see an error message halfway through a page which had started out fine, because there's no way to go back and retract the HTTP 200 and the start of the page you had already sent.

The negatives you mention are already happening, but that's due to the widespread JavaScript culture of putting as much as possible in client-side code while not measuring performance except by having developers ask whether it seems fast enough on their M2 MacBook Pros with fiber connections.


> because PHP, classic ASP, ColdFusion, etc. didn’t buffer.

ASP (and I assume the others, my memory of them is more hazy) could buffer but it wasn't the default. You could control when the buffer was flushed if needed, to push initial content while something larger was being produced, heading towards the same compromise documented here (selective buffering) but from the other direction.


PHP could, too, but it required an extension which wasn’t enabled by default and had some caveats about memory use and performance way back then. I know this because I used it to implement gzip compression in PHP 3 (or maybe really early 4?) with a hook to compress each chunk before sending it, which really helped our customer’s product pages & reports with tons of repetitive HTML.


This particular method is unlikely to tax your elderly CPU any extra. It is just allowing other supporting data (CSS and script files) to be downloaded while the server churns creating the rest of the content (perhaps waiting for slow DB replies).

I expect AirBnB is already unpleasant on such a machine; this won't make it any worse!


Streaming HTML documents is basically the default for the majority of sites, because most sites run on PHP.


What framework is used here? I don't see any implementation details.

Blazor has this built in now, as of .NET 8 preview 4: https://devblogs.microsoft.com/dotnet/asp-net-core-updates-i...



