
While a GPU accelerated terminal emulator sounds great on paper, do keep in mind that GPU acceleration doesn't really give you that much in terms of raw performance. You still have to render glyphs individually, almost always on CPU. So, the reality is probably more that GPU acceleration gives you more flexibility, rather than giving you performance. Anecdotally, I just ran this and it takes roughly twice as long to slog through my dmesg as gnome-terminal.

That said... pretty cool. I don't know if there already existed VT100 emulation written in Go, but it doesn't hurt to have that. Plenty of applications might want to embed a terminal or otherwise have terminal emulation.



There's probably lots of room for optimization, for example currently it sets and resets all sorts of GPU state on each print call, for each character, which is rather expensive:

https://github.com/liamg/aminal/blob/master/glfont/font.go#L...

Speaking of the main loop, for every column and every row:

    cx := uint(gui.terminal.GetLogicalCursorX())
which in turn calls another function -- move that outside of the loop and do it once. In general, buffer values that don't change but are used inside loops, instead of fetching them with function calls on every iteration.
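
A minimal sketch of the hoisting idea, with hypothetical accessor and draw helpers (the real aminal types differ):

    // Fetch loop-invariant values once, before the nested loops, instead of
    // calling back into the terminal for every cell.
    type term interface {
        GetLogicalCursorX() int
        GetLogicalCursorY() int
        GetSize() (cols, rows int)
    }

    func renderGrid(t term, drawCell func(col, row int, isCursor bool)) {
        cx := t.GetLogicalCursorX()
        cy := t.GetLogicalCursorY()
        cols, rows := t.GetSize()
        for row := 0; row < rows; row++ {
            for col := 0; col < cols; col++ {
                drawCell(col, row, col == cx && row == cy)
            }
        }
    }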

Also, while it may not matter much in this case unless there are other sources of GC churn, just because I noticed it: don't create a vertex array for every character only to throw it away and recreate it; allocate one and keep reusing it.
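
Something like this, assuming a quad of 4 vertices with 4 floats each (position plus UV); the names are illustrative:

    // Keep one scratch buffer on the renderer and reuse it for every glyph,
    // instead of allocating a new slice per character.
    type glyphRenderer struct {
        vertices [16]float32 // 4 vertices x 4 floats, reused across calls
    }

    func (r *glyphRenderer) quad(x, y, w, h, u0, v0, u1, v1 float32) []float32 {
        v := r.vertices[:0] // len 0, cap 16: the append below never allocates
        v = append(v,
            x, y, u0, v0,
            x+w, y, u1, v0,
            x, y+h, u0, v1,
            x+w, y+h, u1, v1,
        )
        return v // valid until the next call; upload it before reusing
    }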


If you have good generational GC (with relocation), then creating and destroying short-lived objects should be almost free?

In practice, you will be re-using the same areas of memory.


Go's GC is deliberately non-relocating, which precludes being generational. https://blog.golang.org/ismmkeynote The repeated state in question might be benefiting from escape analysis, but I can't tell from a cursory examination.


Sure, but "almost free" GC is free in the same way that throwing away 3 cents at a time is free: it isn't, once it adds up enough to cause dropped frames. Just like a straw weighs practically nothing, except when it's the straw that causes the camel to miss a frame.

In this case, I also doubt it would make a difference by itself, but I haven't looked at all of the source, so I just mentioned it as something to maybe watch out for. Also, every bit of GC you can super easily avoid "buys" you room for GC that would make the code more complicated to avoid. Waste not, want not :)


Ok, drop the 'almost'. GC can be as 'free' as e.g. stack allocation (which nobody seems to mind too much).

See the comment by kibwen (https://news.ycombinator.com/item?id=18551218) for some good pointers to more concepts.


I've been using alacritty, another new-ish GPU accelerated terminal emulator, and I can confirm that its performance is better than gnome-terminal or similar. So at least there is something to this idea.


Yes, its README explicitly says:

> Alacritty is the fastest terminal emulator in existence. Using the GPU for rendering enables optimizations that simply aren't possible in other emulators.

I wonder, though, why people need fast terminal emulators. I'm using xterm and I haven't found any issues with speed.


I've been using only alacritty for the past year and, at least on macOS, the rendering latency is on another level. Scrolling through text in less or vim feels super fast, switching windows in tmux seems to happen instantly, and even typing feels more immediate compared to iTerm or Terminal.

One thing that I thought might explain this is that it doesn't support scrollback by itself; to get it you're required to rely on tmux or screen.


Alacritty supports scrolling as of version 0.2.0 https://github.com/jwilm/alacritty/releases/tag/v0.2.0


Coincidentally, xterm is the fastest one I know of, in particular if you use bitmap fonts. Gnome-Terminal, for instance, feels rather sluggish.


A while back LWN did a great comparison of terminal emulators, performance was one aspect they looked at: https://lwn.net/Articles/751763/


Always nice to see one's own subjective experiences backed up by hard measurements :-)


In my experience, urxvt is both faster and uses less memory than xterm by a good margin, although there's not such an appreciable difference on a modern system. Gnome Terminal, on the other hand, is beyond slow, far worse than the still slow but acceptable Konsole.


You can read stuff as the text scrolls flying by.

Without acceleration it's a blur.


I sometimes dump megabytes of text to stdout/stderr as a way of monitoring or debugging long-running computations. Terminal speed matters then.


In those cases, I usually redirect the output to a file, so I can search through it.

Another approach is to run "screen", which has several advantages: (1) not all text needs to be written to the terminal, only the text when you actually look, (2) you can open the computation on a different computer later (e.g. perhaps at home to check if everything is ok), and (3) if you accidentally close the terminal the computation keeps running.

In both cases, my terminal emulator does not need to be fast, really.

My biggest issue with speed in the terminal comes from network latency (which is difficult to fix).


tmux is a modern alternative to screen.

Oh, and while you are at it, have a look at mosh as well. Mosh bills itself as the ssh alternative for mobile, intermittent connections, but it takes the idea of 'not all text needs to be written to the terminal, only the text when you actually look' even further.

Mosh also has lots of network latency hiding tricks up its sleeve.


I bet you could implement a fast-enough remote terminal on top of QUIC and keep the idea of a non-permanent connection.


Rendering speed shouldn't matter in this case, because the terminal shouldn't be trying to render every single line that is sent to it. The terminal should just process the stream and its commands, and only paint the real updates to the screen.


If you pre-render the glyphs to a texture, you can draw them to the display as fast as your GPU can go. Changes in font size or face mean you'd have to pre-render them all over again, but that's still not terrible.
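
Roughly, the cache could look like this; the types and fields are illustrative rather than taken from any particular renderer:

    // Rasterize each glyph once on the CPU, pack it into one big texture,
    // and remember the UV rectangle. On a font size/face change the whole
    // atlas is thrown away and rebuilt.
    type glyphInfo struct {
        u0, v0, u1, v1 float32 // texture coordinates inside the atlas
        advance        float32 // horizontal advance in pixels
    }

    type glyphAtlas struct {
        textureID uint32             // GPU texture holding all rasterized glyphs
        glyphs    map[rune]glyphInfo // keyed by code point (see the caveats below)
    }

    func (a *glyphAtlas) lookup(r rune) (glyphInfo, bool) {
        g, ok := a.glyphs[r]
        return g, ok
    }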


With a fixed width font in a single size surely caching glyphs goes a long way? Most of the characters probably come from a small set of characters, and obviously a cache can support full unicode. I guess you couldn't do subpixel hinting but an alpha blended character can be tinted any colour for FG/BG.


You can do caching with any renderer, GPU or not. Almost any text renderer caches glyphs to some degree. Of course, even fixed width fonts can have things like ligatures and composite characters which complicate matters. Rather than caching individual code points, you'd really need to cache something more like individual grapheme clusters.
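
For example, a sketch of a cache keyed by grapheme cluster, using the third-party rivo/uniseg package for segmentation; cachedGlyph and the rasterize/blit callbacks are hypothetical:

    import "github.com/rivo/uniseg" // grapheme cluster segmentation

    // cachedGlyph stands in for whatever the renderer stores per cluster
    // (texture region, advance, ...).
    type cachedGlyph struct{ u0, v0, u1, v1, advance float32 }

    // drawLine keys the cache by grapheme cluster (a string) rather than by
    // individual code point, so "e" + combining acute is cached as one unit.
    func drawLine(line string, cache map[string]cachedGlyph,
        rasterize func(string) cachedGlyph, blit func(cachedGlyph)) {
        g := uniseg.NewGraphemes(line)
        for g.Next() {
            cluster := g.Str()
            glyph, ok := cache[cluster]
            if !ok {
                glyph = rasterize(cluster) // rendered once, on the CPU
                cache[cluster] = glyph
            }
            blit(glyph)
        }
    }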

Subpixel rendering is tricky, but should be possible to do with shaders. You could render to a normal single-channel texture at 3x the horizontal spatial resolution (basically, stretched 3x horizontally), then, when drawing, step one texel across the glyph texture for each subpixel and alpha blend each channel individually.
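
That per-channel blend can be sketched on the CPU like this (a fragment shader would do the same arithmetic per pixel); mask is assumed to be a single-channel coverage bitmap rendered at 3x the horizontal output resolution:

    import "image"

    // Each output pixel takes three adjacent coverage samples as the alpha
    // for its R, G and B channels, blending fg over bg per channel.
    func blendSubpixel(mask []uint8, maskW, maskH int, fg, bg [3]uint8) *image.RGBA {
        outW := maskW / 3
        img := image.NewRGBA(image.Rect(0, 0, outW, maskH))
        for y := 0; y < maskH; y++ {
            for x := 0; x < outW; x++ {
                px := img.PixOffset(x, y)
                for c := 0; c < 3; c++ {
                    a := int(mask[y*maskW+x*3+c])
                    img.Pix[px+c] = uint8((int(fg[c])*a + int(bg[c])*(255-a)) / 255)
                }
                img.Pix[px+3] = 255 // opaque
            }
        }
        return img
    }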


Ah true. I guess I personally do not use a font with ligatures so it didn't really come to mind.

Subpixel rendering with shaders is a neat idea; is it something that has been done before?


I don't know; I assume yes because it seems possible and I doubt I'm the very first person to think about it.


Don't forget, some fonts (particularly emoji) have multicolor glyphs!


> You still have to render glyphs individually, almost always on CPU.

How do you figure? As I imagine it you would stream the buffer to the GPU and render it with a pixel shader. Even if layout or glyph calculation is done on the CPU it should be highly cacheable.


GPU acceleration doesn't change the dynamics of caching glyphs. Existing text renderers already cache aggressively. Also, worth noting that caching the rendering of individual code points won't work effectively even in all cases for a terminal, because of diacritics, ligatures, etc.

Of course it's all pixels. You can do it all in pixel shaders. But of course, it's a lot more complicated than it seems. Supporting RTL requires some pretty advanced layout logic. Supporting OpenType ligatures also requires some pretty complicated, stateful logic. And you probably want to support "wide" glyphs even for a fixed width font, which are present in terminals where you are dealing with, for example, kanji.
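
For the wide-glyph part specifically, Go terminal code often leans on something like the third-party go-runewidth package to decide how many cells a character occupies (shown here as an assumption, not necessarily what this project does):

    import "github.com/mattn/go-runewidth"

    // advanceFor returns the pixel advance for a rune in a grid where each
    // column is cellW pixels wide: kanji and many emoji take two cells.
    func advanceFor(r rune, cellW float32) float32 {
        return float32(runewidth.RuneWidth(r)) * cellW // 0, 1 or 2 cells
    }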

If you want subpixel AA, that's another complicated issue. If you want to be able to do subpixel AA where glyphs are not locked to the pixel grid, you will need to do more work.

If you want to be able to render glyphs on the GPU purely, you'll need to upload all of the geometries in some usable form. Most GPUs don't render curves, so you will probably turn them into triangles or quads. That's a lot of work to do and memory to utilize for an entire font of glyphs.

You also might think you could utilize GPUs to perform anti-aliasing, but the quality will be pretty bad if done naively, as GPUs don't tend to take very many samples when downsampling.

Since a lot of the work is stateful and difficult to parallelize, doing it on CPU will probably be faster, that way you only pay the latency to jump to the GPU once.


> Since a lot of the work is stateful and difficult to parallelize, doing it on CPU will probably be faster, that way you only pay the latency to jump to the GPU once.

You can still easily cache the glyphs post processed, especially if you don’t use subpixel AA. There isn’t that much state to a scrollback buffer post glyph processing.

I don’t get the resistance to this type of rendering when at this point there are at least three major monospace glyph rendering libraries implemented for the GPU, and I bet there are dozens I don’t know about.


No such resistance here; I've written text renderers myself. I'm just pointing out that it's not simple and there aren't trivial performance gains. Like I said, you can't really just cache codepoints. The way this particular terminal emulator does it, it's keeping a cache of individual codepoints. Even forgetting OpenType ligatures, this also won't work for things like diacritics.



