More

Asm2D · 2026-01-09T08:17:12 1767946632

How JIT kills compatibility if it's only enabled on x86 and aaarch64? You can compile Blend2D without it and it would just work.

So no, it doesn't kill any compatibility - it only shows a different approach.

BTW GPU-only renderers suck, and many renderers that have GPU and CPU engines suck when GPU is not available or have bugs. Strong CPU rendering performance is just necessary for any kind of library if you want true compatibility across various platforms.

I have seen many many times broken rendering on GPU without any ability to switch to CPU. And the biggest problem is that more exotic HW you run it on, less chance that somebody would be able to fix it (talking about GPUs).

Asm2D · 2026-01-07T21:35:50 1767821750

You need to rerun the benchmarks if you want fresh numbers. The post was written when Blend2D didn't have JIT for AArch64, which penalized it a bit. Also on X86_64 the numbers are really good for Blend2D, which beats Blaze in some tests. So it's not black&white.

And please keep in mind that Blend2D is not really in development anymore - it has no funding so the project is basically done.

coffeeaddict1 · 2026-01-07T22:08:46 1767823726

> And please keep in mind that Blend2D is not really in development anymore - it has no funding so the project is basically done.

That's such a shame. Thanks a lot for Blend2D! I wish companies were less greedy and would fund amazing projects like yours. Unfortunately, I do think that everyone is a bit obsessed with GPUs nowadays. For 2D rendering the CPU is great, especially if you want predictable results and avoid having to deal with the countless driver bugs that plague every GPU vendor.

miguel_martin · 2026-01-08T17:50:36 1767894636

That is fair - sorry for spreading mis-information! That's unfortunate to hear about Blend2D.

Asm2D · 2026-01-07T14:31:00 1767796260

You know nothing.

Skia is definitely not a good example at all. Skia started as a CPU renderer, and added GPU rendering later, which heavily relies on caching. Vello, for example, takes a completely different approach compared to Skia.

NV path rendering is a joke. nVidia though that ALL graphics would be rendered on GPU within 2 years after making the presentation, and it took 2 decades and 2D CPU renderers still shine.

nicoburns · 2026-01-07T14:54:22 1767797662

I believe Skia's new Graphite architecture is much more similar to Vello

badlibrarian · 2026-01-07T15:53:07 1767801187

Right. The question is does Skia grows its broad and useful toolkit with an eye toward further GPU optimization? Or does Vello (broadened and perhaps burdened by Rust and the shader-obsessive crowd) grow a broad and useful API?

There's also the issue of just how many billions of line segments you really need to draw every 1/120th of a second at 8K resolution, but I'll leave those discussions to dark-gray Discord forums rendered by Skia in a browser.

coffeeaddict1 · 2026-01-07T16:35:17 1767803717

> There's also the issue of just how many billions of line segments you really need to draw every 1/120th of a second at 8K resolution

IMO, one of biggest benefit of a high performance renderer would be power savings (very important for laptops and phones). If I can run the same work but use half the power, then by all means I'd be happy to deal with the complications that the GPU brings. AFAIK though, no one really cares about that and even efforts like Vello are just targeting fps gains, which do correlate with reduced power consumption but only indirectly.

Asm2D · 2026-01-07T21:48:05 1767822485

Adding a power draw into the mix is pretty interesting. Just because a GPU can render something 2x faster in a particular test doesn't mean you have consumed 50% less power, especially when we talk about dedicated GPUs that can have power draw in hundreds of watts.

Historically 2D rendering on CPU was pretty much single-threaded. Skia is single-threaded, Cairo too, Qt mostly (they offload gradient rendering to threads, but it's painfully slow for small gradients, worse than single-threaded), AGG is single-threaded, etc...

In the end only Blend2D, Blaze, and now Vello can use multiple threads on CPU, so finally CPU vs GPU comparisons can be made more fairy - and power draw is definitely a nice property of a benchmark. BTW Blend2D was probably the first library to offer multi-threaded rendering on CPU (just an option to pass to the rendering context, same API).

As far as I know - nobody did a good benchmarking between CPU and GPU 2D renderers - it's very hard to do completely unbiased comparison, and you would be surprised how good the CPU is in this mix. Modern CPU cores consume maybe few watts and you can render to a 4K framebuffer with that single CPU core. Put rendering text to the mix and the numbers would start to be very interesting. Also GPU memory allocation should be included, because rendering fonts on GPU means to pre-process them as well, etc...

2D is just very hard, on both CPU and GPU you would be solving a little bit different problems, but doing it right is insane amount of work, research, and experimentation.

nicoburns · 2026-01-07T22:19:10 1767824350

It's not a formal benchmark, but my Browser Engine / Webview (https://github.com/DioxusLabs/blitz/) has pluggable rendering backends (via https://github.com/DioxusLabs/anyrender) with Vello (GPU), Vello CPU, Skia (various backends incl. Vulkan, Metal, OpenGL, and CPU) currently implemented

On my Apple M1 Pro, the Vello CPU renderer is competitive with the GPU renderers on simple scenes, but falls behind on more complex ones. And especially seems to struggle with large raster images. This is also without a glyph cache (so re-rasterizing every glyph every time, although there is a hinting cache) which isn't implemented yet. This is dependent on multi-threading being enabled and can consume largish portions of all-core CPU while it runs. Skia raster (CPU) gets similarish numbers, which is quite impressive if that is single-threaded.

Asm2D · 2026-01-08T08:21:29 1767860489

I think Vello CPU would always struggle with raster images, because it does a bounds check for every pixel fetched from a source image. They have at least described this behavior somewhere in Vello PRs.

The obsession for memory safety just doesn't pay off in some cases - if you can batch 64 pixels at once with SIMD it just cannot be compared to a per-pixel processor that has a branch in a path.

badlibrarian · 2026-01-07T17:40:44 1767807644

It's an argument you can make in any performance effort. But I think the "let's save power using GPUs" ship sailed even before Microsoft started buying nuclear reactors to power them.

Asm2D · 2026-01-07T14:28:47 1767796127

Blend2D doesn't benchmark against GPU renderers - the benchmarking page compares CPU renderers. I have seen comparisons in the past, but it's pretty difficult to do a good CPU vs GPU benchmarking.

Asm2D · 2025-07-13T22:47:51 1752446871

I think AsmGrid has a great overview of X86 and AArch64 instructions:

  - https://asmjit.com/asmgrid/

mananaysiempre · 2025-07-14T00:23:51 1752452631

For x86 encodings, there’s also http://ref.x86asm.net/index.html and of course the venerable https://sandpile.org/.

rfl890 · 2025-07-13T23:45:41 1752450341

Felix Cloutier's page has always been my go-to

cylinder714 · 2025-07-14T15:17:45 1752506265

https://www.felixcloutier.com/x86/

__alexander · 2025-07-13T23:26:15 1752449175

I haven’t seen this site before. Thanks for sharing it.

electroglyph · 2025-07-13T23:12:15 1752448335

wow, that's great. thanks for sharing!

Asm2D · 2025-07-12T09:39:07 1752313147

Then paint to a regular buffer and do a memcpy to the framebuffer that has no cache at the end of each frame, possibly only copying a region/tiles you want to update.

All the libraries that exist are designed to work like this.

Asm2D · 2025-06-25T19:01:23 1750878083

It's not, SSA and an optimizing pipeline was never the goal of AsmJit actually. You emit your SIMD code as you want it an no optimizer or other transformations mess with it - that's the goal and it works great for use-cases that don't need an additional optimizing pipeline.

However, it can do the mentioned cross-compilation. AsmJit is not dependent on host architecture in any way - you can generate AArch64 on X86 and vice versa. It's of course more optimized for JIT so it offers many tools to help with creating your own lightweight JIT compilers and running the code you generate.

So, no, it's not an LLVM alternative, but it's also not a trivial assembling engine. It has a unique position as it optimizes for low-latency code generation, which LLVM doesn't.

Rochus · 2025-06-25T21:05:49 1750885549

Interesting, thanks. I've seen that there is a kind of IR; does it support the same IR code to run on different targets? Is SIMD part of this abstraction?

Asm2D · 2025-06-10T19:07:00 1749582420

Blend2D has C-API and no dependencies - it doesn't even need a C++ standard library - so generally it's not an issue to build it and use it anywhere.

There is a different problem though. While many people working on Vello are paid full time, Blend2D lacks funding and what you see today was developed independently. So, the development is super slow and that's the reason that Blend2D will most likely never have the features other libraries have.

Asm2D · 2025-03-12T16:03:04 1741795384

Introduction of a new high performance PNG decoder provided by Blend2D library, which challenges existing decoders written in C++ and other programming languages.

Asm2D · on Feb 20, 2024

Cairo is in a maintenance-only mode. Nobody develops this library anymore and it only has a maintainer or two. Since nobody really worked on Cairo in the past 15 years it's not optimized for modern hardware.

You can see some existing benchmarks here:

  - https://blend2d.com/performance.html

Both the benchmarking tool and Blend2D are open-source projects so anyone can verify the numbers presented are indeed correct, and anyone can review/improve the backend-specific code that is used by the benchmarking tool.

gigatexal · on Feb 21, 2024

That’s crazy. I once lurked in the IRC of the project. I knew the creator. He was a family friend. I was a silly teen kid toying with Linux he was a dev who worked at redhat and lived in the same town as me.

I wonder what he’s up to these days?

Update/edit:

Ahh he moved on to Ampere: https://www.linkedin.com/in/carl-worth

Also I was a bad fan: the library had a co-founder too I thought it was a bespoke creation of Carl’s own making.

I remember building a bunch of stuff from source back in the day and a lot of Linux applications had Cairo as a dependency.

hgs3 · on Feb 20, 2024

> Cairo is in a maintenance-only mode.

That's too bad. Is their a successor planned or is Skia the recommended alternative?

Asm2D · on Feb 20, 2024

I think that when it comes to 2D rendering libraries there is in general not too many options if you want to target CPU or both CPU+GPU. Targeting GPU-only is bad for users that run on a hardware where GPU doesn't perform well or is not available at all due to driver issues or just not present (like servers).

If you consider libraries that offer CPU rendering there are basically:

  - AGG (CPU only)

  - Blend2D (CPU only, GPU planned, but not now)

  - Cairo (CPU only)

  - Qt's QPainter (CPU only, GPU without anti-aliasing / deprecated)

  - Skia (CPU + GPU)

  - Tiny Skia (CPU only, not focused on performance)

  - GPU only libs (there is many in C++ and Rust)

Nobody develops AGG and Cairo anymore and Qt's QPainter hasn't really improved in the past decade (Qt Company's focus is QtQuick, which doesn't use QPainter, so they don't really care about improving the performance of QPainter). So, only 2 libraries from this list have active development - Blend2D and Skia.

As an author of Blend2D I hope that it will be a go-to replacement for both AGG and Cairo users. Architecturally, Blend2D should be fine after a 1.0 release as the plan is to offer a stable ABI with 1.0 - And since Blend2D only exports C-API it should be a great choice for users who want to use every cycle and who want their code to work instead of making changes every time the dependency is updated (hello Skia).

At the moment Blend2D focuses on AGG users though, because AGG is much more widespread in commercial applications due to its licensing model and extensibility. However, AGG is really slow especially when rendering to large images (like 4K) so switching from AGG to Blend2D can offer a great performance benefits while avoiding other architectural changes of the application itself.

BTW Blend2D is still under active development. It started as an experiment and historically it only offered great performance on X86 platforms, but that is changing with a new JIT backend, which provides both X86 and AArch64 support and is almost ready for merge. This is good news as it will enable great performance on Apple hardware and also other AArch64 devices, basically covering 99% of the market.

a_e_k · on Feb 20, 2024

I'm the author of another CPU-only 2D vector graphics library that might be of interest:

- Canvas Ity (https://github.com/a-e-k/canvas_ity)

It's a tiny single-header C++ library in the style of the STB libraries. My aim was to make it dirt simple to be able to drop into almost any project and get high-quality rendering while providing an API comfortable to those used to <canvas>.

I've been checking out Blend2D every now and then. It seems like a very nice option for the bigger, but faster and more fully-featured end of the spectrum.

(Though for what it's worth, while raw performance isn't my priority, my little library still can hit about 70fps rendering the Postscript Tiger to 733x757 res with a single thread on my 7950x. :-)

Asm2D · on Feb 21, 2024

Nice project, thanks for sharing!

BTW for comparison - Blend2D can render SVG tiger in 1.68ms on the same machine (I also have 7950X) so it can provide almost an order of magnitude better performance in this case, which is great I think. But I understand the purpose of your library, sometimes it's nice to have something small :)

c-smile · on Feb 20, 2024

If I am not mistaken, NanoVG actually can render as by CPU (need external path rasterizer) as by GPU (OpenGL and other options).

NanoVG provided Canvas.Context kind of API in plain C.

cztomsik · on Feb 21, 2024

There is also thorvg by Samsung (or the authors work in there and they are also using it, not 100% sure but it's production ready)

https://github.com/thorvg/thorvg#dependencies

AFAIK they have experimental GPU backend but I'm not sure how far they are with it.

rubymamis · on Feb 20, 2024

Qt Quick should support non-GPU rendering[1]. I don't know how good it is, tho.

[1] https://www.toradex.com/blog/running-qt-without-gpu

mfabbri77 · on Feb 21, 2024

Do not forget: https://www.amanithvg.com (I'm one of the authors, 20+ years of active development). Full OpenVG 1.1 API, CPU only, cross-platform and analytical coverage antialiasing (rendering quality) as main feature. The rasterizer is really fast. I swear ;) At Mazatech we are working to a new GPU backend just these days.

AmanithVG is the library on which our SVG renderer: https://www.amanithsvg.com is based. All closed source as now, but things may change in future.

I will do some benchmarks of the current (and next, when the new GPU backend will be ready) version of our libraries against other libraries. Do you know if there are any standard tests (besides the classic post script Tiger)? Maybe we can all agree on a common test set for all vector graphics libs bechmarks?

Asm2D · on Feb 21, 2024

That's right! I didn't consider closed source libraries when writing the list. There would be more options in that case like Direct2D and CoreGraphics. However, my opinion is that nobody should be using closed source libraries to render 2D graphics in 2024 :)

Regarding benchmarks - I think Tiger is not enough. Tiger is a great benchmark to exercise the rasterizer and stroker, but it doesn't provide enough metrics about anything else. Tt's very important how fast a 2D renderer renders small geometries, be it rectangles or paths. Because when you look at screen most stuff is actually small. That's the main reason why Blend2D benchmarking tool scales the size of geometries from 8x8 to 256x256 pixels to make sure small geometries are rendered fast and covered by benchmarks. When you explore the results you will notice how inefficient other libraries actually are when it comes to this.