> This guide will be relevant until OS compositors fundamentally change from just dealing with bitmaps. Which is unlikely to happen anytime soon.
There is absolutely no reason why browsers have to use the native compositor for CSS. It's a bad fit, and browsers should stop doing it.
> And doing UI layout on background threads breaks the basic design of pretty much every UI framework, web or native, which are usually single-threaded.
That's why you have a render tree on a separate thread from the DOM. Look at how Servo does it (disclaimer: I work on Servo). We've proven that it works.
> And doing UI layout on background threads breaks the basic design of pretty much every UI framework, web or native, which are usually single-threaded.
By "single-threaded" I meant business logic and non-fixed layout being done on the same thread. "Rendering" (drawing to a bitmap or translating to the OS compositor's object model or direct OpenGL/DirectX/etc.) is done on a separate thread in most frameworks I'm aware of.
> There is absolutely no reason why browsers have to use the native compositor for CSS. It's a bad fit, and browsers should stop doing it.
Here are a few:
- On some platforms, animations on native layers are applied every frame, even if the application that owns the layer is busy. This means fewer opportunities to drop frames.
- Sometimes web rendering engines are embedded in apps. Your app may want to draw web content that filters some other content that's behind it. This content may not be available to the browser engine (it may be in a separate process, for example). A native layer can apply this filter in the compositor process, where the content is available.
- Using the native compositor makes it easier to embed components (like video) that are provided by the system as native layers.
> - On some platforms, animations on native layers are applied every frame, even if the application that owns the layer is busy. This means fewer opportunities to drop frames.
The browser can and should do the same with its CSS compositor. There's no reason why the CSS compositor should run on the main thread (and, in fact, no modern browser works this way).
> - Sometimes web rendering engines are embedded in apps. Your app may want to draw web content that filters some other content that's behind it. This content may not be available to the browser engine (it may be in a separate process, for example). A native layer can apply this filter in the compositor process, where the content is available.
For transparent regions, a browser can do this by simply exporting its entire composited tree as a single transparent layer, where it can be composited over other content. In the case of a single-layer page, this is what the browser is doing anyway.
If you're talking about CSS filters, there's no way that I know of in CSS to say "filter the stuff behind me" in the first place. You can only filter elements' contents.
> - Using the native compositor makes it easier to embed components (like video) that are provided by the system as native layers.
I grant that you have to use the native compositor to get accelerated video on some platforms. But that doesn't mean that a browser should do everything this way. In fact, no browser even tries to export all of CSS to the compositor: this is why you have the various "layerization" hacks which give rise to the sadness in this article. Reducing what needs to be layerized to just video would actually decrease complexity a lot over the status quo. (If you don't believe me, try to read FrameLayerBuilder.cpp in Gecko. It would be way simpler if video were the only thing that generated layers.)
> The browser can and should do the same with its CSS compositor. There's no reason why the CSS compositor should run on the main thread (and, in fact, no modern browser works this way).
You still have to swap buffers from your background thread and then composite the buffer instead of compositing the animating layer directly. It's a small advantage, but it is an advantage.
> If you're talking about CSS filters, there's no way that I know of in CSS to say "filter the stuff behind me" in the first place. You can only filter elements' contents.
There's backdrop-filter: yes, it's an experimental property, but it's the one I was thinking about.
> You still have to swap buffers from your background thread and then composite the buffer instead of compositing the animating layer directly. It's a small advantage, but it is an advantage.
By this I assume you mean that when you have two compositors, you have an extra blit. This is mostly true (though it's not necessarily true if the OS compositor is using the GPU's scanout compositing), but it's by no means worth the enormous downsides of current layerization hacks. Right now, when you as a Web developer fall off the narrow path of stuff that the OS compositor can do, your performance craters. The current status quo is not working: only about 50% of CSS animations in the wild are performed off the main thread.
There's another enormous downside to using the OS compositor: losing all Z-buffer optimizations. Right now, browsers usually throw away 2x or more of their painting performance by painting pixels that are occluded. When using the OS compositor, the browser's painting engine doesn't know which pixels are occluded, because only the OS compositor knows that, so it has to paint the contents of every buffer just in case. But with a smart CSS compositor, the browser can early-Z-reject pixels that are covered up by other elements.
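Here's a rough sketch of what I mean, in toy TypeScript (names made up, and simplified to whole layers rather than pixels): if the compositor owns the layer tree, it can skip painting anything that an opaque layer in front of it completely covers. A real engine would track a covered region, or let the GPU do this per pixel via early-Z rejection, but the principle is the same.

```typescript
// Toy model: a "layer" is an axis-aligned rect that is either opaque or not.
interface LayerRect {
  name: string;
  x: number;
  y: number;
  width: number;
  height: number;
  opaque: boolean;
}

// True if `inner` lies entirely inside `outer`.
function containedIn(inner: LayerRect, outer: LayerRect): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}

// Walk layers front-to-back and return only the ones that actually need
// painting. Deliberately conservative: a layer is skipped only when a single
// opaque layer in front of it covers it completely.
function layersToPaint(frontToBack: LayerRect[]): LayerRect[] {
  const opaqueInFront: LayerRect[] = [];
  const needPaint: LayerRect[] = [];
  for (const layer of frontToBack) {
    const fullyOccluded = opaqueInFront.some((front) => containedIn(layer, front));
    if (!fullyOccluded) {
      needPaint.push(layer);
    }
    if (layer.opaque) {
      opaqueInFront.push(layer);
    }
  }
  return needPaint;
}

// Example: an opaque full-screen modal means the page background underneath
// never has to be painted at all this frame.
const layers: LayerRect[] = [
  { name: "modal", x: 0, y: 0, width: 1280, height: 800, opaque: true },
  { name: "page background", x: 0, y: 0, width: 1280, height: 800, opaque: true },
];
console.log(layersToPaint(layers).map((l) => l.name)); // ["modal"]
```

When every layer is handed to the OS compositor as its own buffer, the browser has to fill in "page background" anyway, because it can't know the modal will cover it.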
> yes, it's an experimental property, but it's the one I was thinking about
Ah, OK, I wasn't aware of that because it's only implemented in Safari right now. Well, using the OS compositor would make it easier to apply backdrop filters, as long as the OS compositor supports everything in the SVG filter spec (a big assumption—I suspect this is only the case on macOS and iOS!) But even with that, I think it results in less complexity to just use the OS compositor for this specific case and fall back on the browser compositor for most everything else, just as with video. CSS really does not map onto OS compositors very well.
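For anyone following along, the distinction in question: `filter` only affects an element's own contents, while `backdrop-filter` affects whatever is painted behind the element. A minimal sketch in TypeScript/DOM terms; the `#panel` element is hypothetical and assumed to have a translucent background so the backdrop shows through.

```typescript
// Hypothetical element; assumed to exist and to have a semi-transparent
// background (e.g. rgba(255, 255, 255, 0.5)).
const panel = document.getElementById("panel") as HTMLElement;

// `filter` blurs the panel's *own* rendering.
panel.style.filter = "blur(4px)";

// `backdrop-filter` blurs whatever is rendered *behind* the panel. At the
// time of this thread it was experimental and only shipped in Safari behind
// a -webkit- prefix; setProperty keeps this compiling regardless of the DOM
// typings in use, and unsupported properties are simply ignored.
panel.style.removeProperty("filter");
panel.style.setProperty("-webkit-backdrop-filter", "blur(4px)");
panel.style.setProperty("backdrop-filter", "blur(4px)");
```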
There's a good reason to, in this case: the OS compositor is not designed to render the full generality of CSS.
The status quo is not working: only about half of CSS animations in the wild run on the compositor. As browser vendors, we need to admit that the attempt to carve out a limited, fast subset of CSS has failed, and we should do the hard work to make all of CSS run fast.
That's not true at all. You don't have most of these problems when dealing with native apps, for example, and the reason is that browser rendering is just extremely slow. Thus the gap between the common path and the fast path is unusually massive on browsers.
Although browsers needing to be highly defensive doesn't help, either. If a native app renders slowly, you generally blame the app, whereas if a website is slow, you blame the browser (regardless of who is actually at fault). This leads to browsers being conservative and defensive in their graphics stacks so that scrolling can be smooth in the face of graphically intense rendering.
OS composition is entirely unrelated here, and really there's no need for it to change away from just dealing with pixmaps.
I'm not as familiar with native Windows apps anymore, but the difference typically is that while there is a faster path on native, it's not as critical that it's taken. As in, the slow path is still generally fast enough.
That certainly used to be true on Win32 (in that an animation of left/top would easily hit 60fps on a desktop computer), but maybe Microsoft's UI toolkits have regressed significantly? I rather doubt it, though, and suspect that it would still work just fine in a native Windows app even though the web equivalent grinds to a halt.
Animating left/top on a fixed layout is indeed still fast. But modern app layouts using responsive design are still going to need to hit the UI thread to modify controls as the size changes.
The same issues also happen on Android, which is why every Android app has separate threads for layout and interaction, rendering, and business logic, and why all complicated rendering ends up on separate GPU layers.
> and why all complicated rendering ends up on separate GPU layers.
No, it doesn't. The app renders to a single surface in a single GPU render pass unless the app uses a SurfaceView, which is generally only for media uses (camera, video, games).
Multiple layers are only used when asked for explicitly (View.setLayerType) or when required for proper blending. They are otherwise avoided, since it's generally slower to use multiple layers.
You can absolutely do the "bad" things in the linked article in a native Android app and still hit 60fps pretty trivially. The accelerated properties, like View.setTranslationX/Y, only bypass what's typically a small amount of work (and don't use a caching layer). It's an incremental improvement, not something absolutely required. Scrolling in a RecyclerView or ListView, for example, doesn't even do that. It just moves the left/top of every view and re-renders, and that's plenty fast to hit 60fps.
This used to be true, but since Android M and N, where a lot more animations were added, a lot of animation now happens on separate GPU layers (and is rendered, if necessary, by separate threads).
This was especially necessary because of the many ripple animations that were introduced.
I think you're confusing the RenderThread with GPU layers. There's only one rendering thread per app, and it handles all rendering work done by that app. It's really no different from pre-M rendering, other than that a chunk of what used to be on the UI thread is now on a different thread. The general flow is the same.
The new part is that some animations (basically just the Ripple Animation) can happen on their own on that thread, but it doesn't use a GPU layer for it nor a different OS composition layer.
> but it doesn't use a GPU layer for it nor a different OS composition layer.
Really? As is so often the case, there was a lot of talk about doing that beforehand, and it wasn't discussed at all later on, so I had assumed it had been done. Interesting that it didn't happen.
What'd be the reason for that? Animating objects on a static background seems like a prime case for GPU layers. Or was it the issue with the framebuffer sizes being too huge again?
Think about what the static background actually is. It's probably either an image (which is already just a static GL texture, no need to cache your bitmap in another bitmap), or it's something like a round rect which can actually be rendered faster than sampling from a texture (since it's a simple quad + a simple pixel shader - no texture fetches slowing things down). In such a scenario a GPU layer just ends up making things slower and uses more RAM.
> This guide will be relevant until OS compositors fundamentally change from just dealing with bitmaps. Which is unlikely to happen anytime soon.
A browser is essentially a small OS with server-side rendering, where clients/web apps send HTML/CSS/JS for their GUI and the compositor/browser engine renders into bitmaps. We probably need something simpler than HTML/CSS/JS, though, if we want it to be reasonably easy to implement a fast rendering engine.
Hmm... Wouldn't it be possible to address the problem at another level? For instance, when it's discovered that layout properties are being animated, see if it would be possible to create an equivalent effect in the compositing stage, and if so, convert the layout position changes to transforms?
I realize that would probably not be easy, but maybe still possible without major architectural changes?
> For instance, when it's discovered that layout properties are being animated, see if it would be possible to create an equivalent effect in the compositing stage, and if so, convert the layout position changes to transforms?
Yes, and we as browser vendors should absolutely do that.
Though a better solution is to just make the compositor able to accelerate the full generality of CSS in the first place.
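To illustrate the conversion from the authoring side (a rough sketch; the `.box` selector is made up, and this is the page-level equivalence rather than how an engine would do it internally): animating `left` dirties layout every frame and stays tied to the main thread, while the visually equivalent `transform` animation is something browsers can run entirely on the compositor.

```typescript
// Hypothetical positioned element (animating `left` requires position:
// relative/absolute/fixed).
const box = document.querySelector(".box") as HTMLElement;

// Layout-property version: every frame changes `left`, which forces layout
// work on the main thread.
box.animate([{ left: "0px" }, { left: "200px" }], {
  duration: 500,
  fill: "forwards",
});

// Visually equivalent compositor-friendly version: layout is untouched and
// only the transform changes. (In practice you'd pick one or the other, not
// run both at once.)
box.animate([{ transform: "translateX(0px)" }, { transform: "translateX(200px)" }], {
  duration: 500,
  fill: "forwards",
});
```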
That's the kind of thing that would cause a rejection: elements are going to be created, so this can't be converted to compositing-stage operations instead.