
While a GPU accelerated terminal emulator sounds great on paper, do keep in mind that GPU acceleration doesn't really give you that much in terms of raw performance. You still have to render glyphs individually, almost always on CPU. So, the reality is probably more that GPU acceleration gives you more flexibility, rather than giving you performance. Anecdotally, I just ran this and it takes roughly twice as long to slog through my dmesg as gnome-terminal.

That said... pretty cool. I don't know if there already existed VT100 emulation written in Go, but it doesn't hurt to have that. Plenty of applications might want to embed a terminal or otherwise have terminal emulation.



There's probably lots of room for optimization, for example currently it sets and resets all sorts of GPU state on each print call, for each character, which is rather expensive:

https://github.com/liamg/aminal/blob/master/glfont/font.go#L...

Speaking of the main loop, for every column and every row:

    cx := uint(gui.terminal.GetLogicalCursorX())
which in turn calls another function -- move that outside of the loop and do it once. In general, buffer values that don't change but are used inside loops, instead of fetching them with function calls on every iteration.
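
A minimal sketch of the hoisting idea, with hypothetical accessor and draw helpers (the real aminal types differ):

    // Fetch loop-invariant values once, before the nested loops, instead of
    // calling back into the terminal for every cell.
    type term interface {
        GetLogicalCursorX() int
        GetLogicalCursorY() int
        GetSize() (cols, rows int)
    }

    func renderGrid(t term, drawCell func(col, row int, isCursor bool)) {
        cx := t.GetLogicalCursorX()
        cy := t.GetLogicalCursorY()
        cols, rows := t.GetSize()
        for row := 0; row < rows; row++ {
            for col := 0; col < cols; col++ {
                drawCell(col, row, col == cx && row == cy)
            }
        }
    }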

Also, while it may not matter much in this case unless there are other sources of GC churn, just because I noticed it: don't create a vertex array for every character only to throw it away and recreate it; allocate one and keep reusing it.
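
Something like this, assuming a quad of 4 vertices with 4 floats each (position plus UV); the names are illustrative:

    // Keep one scratch buffer on the renderer and reuse it for every glyph,
    // instead of allocating a new slice per character.
    type glyphRenderer struct {
        vertices [16]float32 // 4 vertices x 4 floats, reused across calls
    }

    func (r *glyphRenderer) quad(x, y, w, h, u0, v0, u1, v1 float32) []float32 {
        v := r.vertices[:0] // len 0, cap 16: the append below never allocates
        v = append(v,
            x, y, u0, v0,
            x+w, y, u1, v0,
            x, y+h, u0, v1,
            x+w, y+h, u1, v1,
        )
        return v // valid until the next call; upload it before reusing
    }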


If you have good generational GC (with relocation), then creating and destroying short-lived objects should be almost free?

In practice, you will be re-using the same areas of memory.


Go's GC is deliberately non-relocating, which precludes being generational. https://blog.golang.org/ismmkeynote The repeated state in question might be benefiting from escape analysis, but I can't tell from a cursory examination.


Sure, but "almost free" GC is free in the same way that throwing away 3 cents at a time is free: it isn't, once it adds up enough to cause dropped frames. Just like a straw weighs practically nothing, except when it's the straw that causes the camel to miss a frame.

In this case, I also doubt it would make a difference by itself, but I haven't looked at all of the source, so I just mentioned it as something to maybe watch out for. Also, every bit of GC you can super easily avoid "buys" you room for GC that would make the code more complicated to avoid. Waste not, want not :)


Ok, drop the 'almost'. GC can be as 'free' as e.g. stack allocation (which nobody seems to mind too much).

See the comment by kibwen (https://news.ycombinator.com/item?id=18551218) for some good pointers to more concepts.


I've been using alacritty, another new-ish GPU accelerated terminal emulator, and I can confirm that its performance is better than gnome-terminal or similar. So at least there is something to this idea.


Yes, its README explicitly says:

> Alacritty is the fastest terminal emulator in existence. Using the GPU for rendering enables optimizations that simply aren't possible in other emulators.

I wonder, though, why people need fast terminal emulators. I'm using xterm and I haven't found any issues with speed.


I've been using only alacritty for the past year and, at least on macOS, the rendering latency is on another level. Scrolling through text in less or vim feels super fast, switching windows in tmux seems to happen instantly, and even typing feels more immediate compared to iTerm or Terminal.

One thing that I thought might explain this is that it doesn't support scrollback by itself; to get it you're required to rely on tmux or screen.


Alacritty supports scrolling as of version 0.2.0 https://github.com/jwilm/alacritty/releases/tag/v0.2.0


Coincidentally, xterm is the fastest one I know of, in particular if you use bitmap fonts. Gnome-Terminal, for instance, feels rather sluggish.


A while back LWN did a great comparison of terminal emulators, performance was one aspect they looked at: https://lwn.net/Articles/751763/


Always nice to see one's own subjective experiences backed up by hard measurements :-)


In my experience, urxvt is both faster and uses less memory than xterm by a good margin, although there's not such an appreciable difference on a modern system. Gnome Terminal, on the other hand, is beyond slow, far worse than the still slow but acceptable Konsole.


You can read stuff as the text scrolls flying by.

Without acceleration it's a blur.


I sometimes dump megabytes of text to stdout/stderr as a way of monitoring or debugging long-running computations. Terminal speed matters then.


In those cases, I usually redirect the output to a file, so I can search through it.

Another approach is to run "screen", which has several advantages: (1) not all text needs to be written to the terminal, only the text when you actually look, (2) you can open the computation on a different computer later (e.g. perhaps at home to check if everything is ok), and (3) if you accidentally close the terminal the computation keeps running.

In both cases, my terminal emulator does not need to be fast, really.

My biggest issue with speed in the terminal comes from network latency (which is difficult to fix).


tmux is a modern alternative to screen.

Oh, and while you are at it, have a look at mosh as well. Mosh bills itself as the ssh alternative for mobile, intermittent connections, but it takes the idea of 'not all text needs to be written to the terminal, only the text when you actually look' even further.

Mosh also has lots of network latency hiding tricks up its sleeve.


I bet you could implement a fast-enough remote terminal on top of QUIC and keep the idea of a non-permanent connection.


Rendering speed shouldn't matter in this case, because the terminal shouldn't be trying to render every single line that is sent to it. The terminal should just process the stream and its commands, and only paint the real updates to the screen.


If you pre-render the glyphs to a texture, you can draw them to the display as fast as your GPU can go. Changes in font size or face mean you'd have to pre-render them all over again, but that's still not terrible.
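
Roughly, the cache could look like this; the types and fields are illustrative rather than taken from any particular renderer:

    // Rasterize each glyph once on the CPU, pack it into one big texture,
    // and remember the UV rectangle. On a font size/face change the whole
    // atlas is thrown away and rebuilt.
    type glyphInfo struct {
        u0, v0, u1, v1 float32 // texture coordinates inside the atlas
        advance        float32 // horizontal advance in pixels
    }

    type glyphAtlas struct {
        textureID uint32             // GPU texture holding all rasterized glyphs
        glyphs    map[rune]glyphInfo // keyed by code point (see the caveats below)
    }

    func (a *glyphAtlas) lookup(r rune) (glyphInfo, bool) {
        g, ok := a.glyphs[r]
        return g, ok
    }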


With a fixed width font in a single size surely caching glyphs goes a long way? Most of the characters probably come from a small set of characters, and obviously a cache can support full unicode. I guess you couldn't do subpixel hinting but an alpha blended character can be tinted any colour for FG/BG.


You can do caching with any renderer, GPU or not. Almost any text renderer caches glyphs to some degree. Of course, even fixed width fonts can have things like ligatures and composite characters which complicate matters. Rather than caching individual code points, you'd really need to cache something more like individual grapheme clusters.
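
For example, a sketch of a cache keyed by grapheme cluster, using the third-party rivo/uniseg package for segmentation; cachedGlyph and the rasterize/blit callbacks are hypothetical:

    import "github.com/rivo/uniseg" // grapheme cluster segmentation

    // cachedGlyph stands in for whatever the renderer stores per cluster
    // (texture region, advance, ...).
    type cachedGlyph struct{ u0, v0, u1, v1, advance float32 }

    // drawLine keys the cache by grapheme cluster (a string) rather than by
    // individual code point, so "e" + combining acute is cached as one unit.
    func drawLine(line string, cache map[string]cachedGlyph,
        rasterize func(string) cachedGlyph, blit func(cachedGlyph)) {
        g := uniseg.NewGraphemes(line)
        for g.Next() {
            cluster := g.Str()
            glyph, ok := cache[cluster]
            if !ok {
                glyph = rasterize(cluster) // rendered once, on the CPU
                cache[cluster] = glyph
            }
            blit(glyph)
        }
    }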

Subpixel rendering is tricky, but should be possible to do with shaders. You could render to a normal single-channel texture at 3x the horizontal spatial resolution (basically, stretched 3x horizontally), then, when drawing, step one texel across the glyph texture for each subpixel and alpha blend each channel individually.
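
That per-channel blend can be sketched on the CPU like this (a fragment shader would do the same arithmetic per pixel); mask is assumed to be a single-channel coverage bitmap rendered at 3x the horizontal output resolution:

    import "image"

    // Each output pixel takes three adjacent coverage samples as the alpha
    // for its R, G and B channels, blending fg over bg per channel.
    func blendSubpixel(mask []uint8, maskW, maskH int, fg, bg [3]uint8) *image.RGBA {
        outW := maskW / 3
        img := image.NewRGBA(image.Rect(0, 0, outW, maskH))
        for y := 0; y < maskH; y++ {
            for x := 0; x < outW; x++ {
                px := img.PixOffset(x, y)
                for c := 0; c < 3; c++ {
                    a := int(mask[y*maskW+x*3+c])
                    img.Pix[px+c] = uint8((int(fg[c])*a + int(bg[c])*(255-a)) / 255)
                }
                img.Pix[px+3] = 255 // opaque
            }
        }
        return img
    }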


Ah true. I guess I personally do not use a font with ligatures so it didn't really come to mind.

Subpixel rendering with shaders is a neat idea; is it something that has been done before?


I don't know; I assume yes because it seems possible and I doubt I'm the very first person to think about it.


Don't forget, some fonts (particularly emoji) have multicolor glyphs!


> You still have to render glyphs individually, almost always on CPU.

How do you figure? As I imagine it you would stream the buffer to the GPU and render it with a pixel shader. Even if layout or glyph calculation is done on the CPU it should be highly cacheable.


GPU acceleration doesn't change the dynamics of caching glyphs. Existing text renderers already cache aggressively. Also, worth noting that caching the rendering of individual code points won't work effectively even in all cases for a terminal, because of diacritics, ligatures, etc.

Of course it's all pixels. You can do it all in pixel shaders. But of course, it's a lot more complicated than it seems. Supporting RTL requires some pretty advanced layout logic. Supporting OpenType ligatures also requires some pretty complicated, stateful logic. And you probably want to support "wide" glyphs even for a fixed width font, which are present in terminals where you are dealing with, for example, kanji.
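
For the wide-glyph part specifically, Go terminal code often leans on something like the third-party go-runewidth package to decide how many cells a character occupies (shown here as an assumption, not necessarily what this project does):

    import "github.com/mattn/go-runewidth"

    // advanceFor returns the pixel advance for a rune in a grid where each
    // column is cellW pixels wide: kanji and many emoji take two cells.
    func advanceFor(r rune, cellW float32) float32 {
        return float32(runewidth.RuneWidth(r)) * cellW // 0, 1 or 2 cells
    }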

If you want subpixel AA, that's another complicated issue. If you want to be able to do subpixel AA where glyphs are not locked to the pixel grid, you will need to do more work.

If you want to be able to render glyphs on the GPU purely, you'll need to upload all of the geometries in some usable form. Most GPUs don't render curves, so you will probably turn them into triangles or quads. That's a lot of work to do and memory to utilize for an entire font of glyphs.

You also might think you could utilize GPUs to perform anti-aliasing, but the quality will be pretty bad if done naively, as GPUs don't tend to take very many samples when downsampling.

Since a lot of the work is stateful and difficult to parallelize, doing it on CPU will probably be faster, that way you only pay the latency to jump to the GPU once.


> Since a lot of the work is stateful and difficult to parallelize, doing it on CPU will probably be faster, that way you only pay the latency to jump to the GPU once.

You can still easily cache the glyphs post processed, especially if you don’t use subpixel AA. There isn’t that much state to a scrollback buffer post glyph processing.

I don’t get the resistance to this type of rendering when at this point there are at least three major monospace glyph rendering libraries implemented for the GPU, and I bet there are dozens I don’t know about.


No such resistance here; I've written text renderers myself. I'm just pointing out that it's not simple and there aren't trivial performance gains. Like I said, you can't really just cache codepoints. The way this particular terminal emulator does it, it's keeping a cache of individual codepoints. Even forgetting OpenType ligatures, this also won't work for things like diacritics.



