Defer Reference Implementation for C

dgellow · on Oct 1, 2020

In case someone wants the same in C++, the Guidelines Support Library comes with the class "final_action" and the function "finally()".

Check the implementation here: https://github.com/microsoft/GSL/blob/master/include/gsl/gsl....

Example from https://docs.microsoft.com/en-us/cpp/code-quality/c26448?vie...:

    void poll(connection_info info)
    {
        connection c = {};
        if (!c.open(info))
            return;

        auto end = gsl::finally([&c] { c.close(); });
        while (c.wait())
        {
            connection::header h{};
            connection::signature s{};
            if (!c.read_header(h))
                return;
            if (!c.read_signature(s))
                return;
            // ...
        }
    }

I love this pattern, it's a very nice way to have a kind of RAII but with more control and flexibility.

rwmj · on Oct 1, 2020

So much complexity. Just standardize __attribute__((cleanup)) which is already being used by a load of software, is available already in GCC and Clang, and does everything that anyone wants.

neostrauss · on Sept 30, 2020

Presumably GCC (and I believe Clang)'s __cleanup__ attribute provides this functionality already in most cases?

Any platform where Clang and GCC aren't supported is a platform where this style of code shouldn't be used, no?

hsaliak · on Oct 1, 2020

It would still help to standardize..

augustk · on Oct 1, 2020

If I'm not mistaken, the first example is equivalent to the following purely structured code:

  void * const p = malloc(25);
  if (p != NULL) {
     void * const q = malloc(25);
     if (q != NULL) {
        if (mtx_lock(&mut) != thrd_error) {
           mtx_unlock(&mut);
        }
        free(q);
     }
     free(p); 
  }

At least to me, this flow is much easier to understand.

avianes · on Oct 1, 2020

To me, your example is not really easier to understand than the defer version.

The complete flow is immediately apparent in your example, but the "effective" flow (the two malloc and the mutex) is harder to identify.

And now imagine the same thing with 10 or more nested conditions.

With defer or goto, it allows you to mentally split the logic in two parts: on one hand the "effective" algorithm; and on the other hand the resource release.

frankjr · on Oct 1, 2020

I think the difference would be more pronounced if there was actually any code that works with the allocated resources. Imagine you wanted to early return because of some other condition. With your code, you cannot just `return` a value, you need to handle the deallocations, and you would be back to GOTOs in no time.

Maybe a personal preference but I also like to keep my code flat. You are already 3 indentation levels deep without any logic in it.

jzelinskie · on Oct 1, 2020

I see this idea posted with some frequency and the responses are almost always "clang and gcc have compiler intrinsics for this". I'm not a regular C programmer, so this begs the question: why is it that nobody seems to know or use them?

rwmj · on Oct 1, 2020

They are fairly widely used in current software - most things that use glib require them, as well as systemd. They just need to be standardized.

jcelerier · on Oct 1, 2020

if you have access to GCC and Clang then you also have access to C++ constructors / destructors... why would you bother with a non-standard attribute ? If you don't have access to gcc / clang because you're developing for some random board supported only by the Keil C compiler... then you don't have the feature anyways

coldtea · on Oct 1, 2020

I know several codebases that use them...

saagarjha · on Oct 1, 2020

They’re nonstandard and not widely known. I suspect these are both correlated.

saagarjha · on Sept 30, 2020

I hate commenting on this usually, but please please please don't touch letter-spacing if you want people to be able to read your text! Doubly so if these are literally headers and using a fairly ugly, squat font…

jart · on Sept 30, 2020

Looks fine to me. Post a screenshot of your desktop. I'd love to learn more about how Courier New spaced -.1em could be rendered illegibly.

saagarjha · on Sept 30, 2020

This is what the page looks like on my computer: https://i.imgur.com/FX3o2EI.png. I wouldn't call it "illegible" but it's certainly unpleasant to read.

jart · on Oct 1, 2020

I'm sorry my friend. Engineers are imperfect creatures when it comes to graphics design.

saagarjha · on Oct 1, 2020

Thankfully reader mode cleaned it up nicely :)

cozzyd · on Sept 30, 2020

Looks fine for me (Firefox on Linux on a 1440p monitor).

ludocode · on Oct 1, 2020

Is this a serious proposal for a new C language feature? Or is this just an experiment from someone's masters thesis or something? The paper is titled "Proposal for C2x", but this can't possibly be seriously considered. I have so many questions.

In section 1.1, the linearization it gives with goto statements is barely longer than the defer example. They claim defer is better just because of the proximity of the cleanup code? Why not just move the "resources acquired" code to a separate function? You wouldn't even need goto in that case, you could just nest if statements to do the cleanup.

The spec claims defer allocates memory. Why? As far as I know __attribute__((cleanup(fn))) doesn't allocate memory. This defer may exhaust memory, and if so, it will immediately terminate execution of the enclosing guard block with a panic() and DEFER_ENOMEM. So like an exception?

This says exit() or panic() will clean up all guarded blocks across all function calls of the same thread. So basically stack unwinding? Apparently you can recover somewhere with a call to recover()? This is just exceptions by another name. This stack unwinding can't possibly interoperate with existing code that expects error return values.

This claims it's robust because any deferred statement is guaranteed to be executed eventually, and it describes in great detail how it runs defer statements on signals. What if I write an infinite loop, or get a SIGKILL, or yank the power cord? Obviously deferred statements won't be executed.

This says defer is implemented with longjmp. Isn't setjmp/longjmp way too slow for exception handling? C++ compilers haven't done exceptions that way for decades. What happens if I longjmp or goto past a defer statement? This says it just doesn't invoke the defer mechanism and may result in memory leaks or other damage. Does that mean it's undefined behaviour? C++ won't compile a goto past constructors for good reason.

All POSIX error and signal codes have an equivalent prefixed with DEFER_, e.g. DEFER_ENOMEM, DEFER_HUP. This is just in case the system doesn't already have ENOMEM? Doesn't the standard already require that ENOMEM exist? If not, why not just make this feature require that ENOMEM exist? Why depend so much on errno for new core language features when it's basically an ugly artifact of ancient C library functions?

> If C will be extended with lamdas (hopefully in a nearer future)

I wouldn't hold my breath.

tom_mellior · on Oct 1, 2020

You're arguing in bad faith, which the HN rules explicitly ask you not to do.

> Or is this just an experiment from someone's masters thesis or something?

The proposal has seven authors, three of which list industry affiliations and three various academic institutions. You're not required to know that some (all?) of the authors are on the C standard committee to tell that this is very probably a more serious proposal than someone's masters thesis.

> Why not just move the "resources acquired" code to a separate function? You wouldn't even need goto in that case, you could just nest if statements to do the cleanup.

That wouldn't work nicely with jumps out of the separate function. Not just with goto, but imagine the guarded block being a loop body and doing break/continue. The function would have to return some special value to indicate "I would like to break/continue here, please". Possible, but why would that be an improvement over goto for something that is clearly a goto use case that the compiler should handle?

> So basically stack unwinding?

You're saying this as if you had puzzled out the "real meaning" hidden inside this proposal. But the proposal doesn't hide that this is, yes, basically stack unwinding.

> This says defer is implemented with longjmp.

This says that this reference implementation, the goal of which is to allow people to test the ergonomics of the feature, is implemented with longjmp. The proposal itself is written to allow such an implementation, but it doesn't require it.

ludocode · on Oct 1, 2020

I don't believe I'm arguing in bad faith. Of course I think this would be wildly inappropriate for C2x, but I am still genuinely curious about this which is why I spent so much time reading it and writing that post. I do actually want to know the answers to my questions.

Stack unwinding is one of the largest, most complicated to implement, and most controversial features of C++. Google, a company with billions of lines of C++, famously disables exception handling in most or all of their code [1]. Not only do many popular C++ projects disable exceptions, but even C++ compilers themselves disable exceptions in their own implementation [2]!

C became popular in large part because of its simplicity of implementation. Some of these projects historically disabled C++ exceptions because they were slow and bloated, a result of the difficulty of implementing them efficiently. Now that they're fast and less bloated, these projects still can't turn them on because code that relies on stack unwinding is incompatible with code that does not. This proposal just repeats all of the same problems as C++.

It is extremely surprising that such a large group of people would propose such a radical feature addition to C, especially one that is so complicated to implement and that effectively makes code that uses it totally incompatible with old code that doesn't. This is interesting when viewed as a feature of a new language based on C, but the idea of adding this to C is frankly absurd.

[1]: https://google.github.io/styleguide/cppguide.html#Exceptions

[2]: https://llvm.org/docs/CodingStandards.html#do-not-use-rtti-o...

tom_mellior · on Oct 1, 2020

I think these questions about stack unwinding are valuable.

If you really do actually want answers, you should probably, in this order, (a) study the actual proposal in detail and not confuse it with an imperfect reference implementation, (b) check comp.std.c for previous discussions on the topic and maybe ask there, (c) see if there are other previous discussions involving the members who proposed this (maybe the standard committee has some semi-public mailing list or something?) and maybe ask there, (d) contact the email address given in the proposal. In all cases, it's probably a very good idea to stay as civil as you were in this post, not as confrontational as you were above.

I'm stressing that you should study the proposal because some of the things you got hung up before were not properties of the proposal but only of the reference implementation. Besides the longjmp issue, the dynamic allocation issue might be in this category as well. The proposal doesn't mention DEFER_ENOMEM, I think a compiler would have enough information in any case to allocate the needed space on the stack.

andrepd · on Oct 1, 2020

Why is this better than RAII with a destructor/drop being called whenever the block is exited? Also, this mechanism is already present in C via __attribute__(cleanup).

dpedu · on Oct 1, 2020

__attribute__ is a nonstandard GNU feature

dnautics · on Sept 30, 2020

thing is, you'll still want errdefer.

nikki93 · on Sept 30, 2020

Seems like `defer_if` is meant for things like the `errdefer` case: https://gustedt.gitlabpages.inria.fr/defer/#org4ae1e19 There's no implicit "error for this stackframe" stuff in C so it needs to be given a condition I guess.

dnautics · on Oct 1, 2020

thanks, I didn't see that!

jart · on Sept 30, 2020

The C language shouldn't need a defer statement keyword, because it's so trivial to implement using an asm() macro that overwrites the return address.

Using a macro is more succinct:

    const char *s = gc(xasprintf("%s/%s", dir, name));

Than what's being proposed:

    char *s = xasprintf("%s/%s", dir, name);
    defer free(s);

See this x86 reference implementation of defer() and gc(). https://gist.github.com/jart/aed0fd7a7fa68385d19e76a63db687f... That should just work with GCC and Clang. That code is originally from the Cosmopolitan C Library (https://github.com/jart/cosmopolitan) so check it out if you like the Gist and want more!

Please note the macro operates at function call boundaries rather than block scoped. I consider that a feature since it behaves sort of like a memory pool. Having side effects at the block scope level requires changing compilers, the language itself, and it would cause several important gcc optimization passes to be disabled in places where it's used.

comex · on Oct 1, 2020

This is a very neat hack, but it breaks in the presence of:

- Inlining: it will work but the defer will be executed at the end of the caller function, which may not be what you expected.

- Tail call optimization: same issue. (You mention in a comment that you use a dummy asm statement to prevent the call to `__defer` itself from being tail-call optimized, but there's still an issue if one defer-using function tail-calls another defer-using function.)

- Function outlining aka hot/cold splitting (currently implemented in LLVM): arbitrary chunks of a function can be split out into their own functions; if one of those does a defer, the cleanup might be run too early, considerably more dangerous than too late.

- Various CFI (control-flow integrity) implementations that are specifically designed to prevent the return address from being overwritten (by exploits).

- Interprocedural register allocation (-fipa-ra in GCC) if __defer gets inlined or analyzed via link-time optimization: The compiler can make assumptions that functions won't modify certain registers that the ABI would normally allow them to modify; this will be violated if it unexpectedly jumps to __defer. This is fixable by marking __defer as __attribute__((noipa)) or reimplementing in assembly.

- Targeting WebAssembly or BPF or other high-level machines that don't support overwriting return addresses.

- Compilers that don't support inline assembly (MSVC).

EDIT: - Targeting ARM if the compiler happens to stash the return address in an unexpected location. You can't fix this by writing to LR like you suggested; the return address needs to be in LR when you execute the ret instruction, but the compiler doesn't need to keep it there for the whole function, and usually won't. Instead, it will usually save it to the stack frame, and load it back before returning, potentially using LR for completely unrelated purposes in between. So usually you can modify the return address by using __builtin_frame_address just like on x86. But that's an implementation detail; it could decide to keep a copy in another register, and move that to LR when returning. Not sure if any compilers actually do that, though I think I might have seen something like that on PowerPC.

Your approach is also relatively slow, since the cleanup code can't be inlined.

(If you're going to use a GNU extension for inline asm, why not just use the GNU extension __attribute__((cleanup))? It's block-scoped, but it doesn't disable any optimization passes or anything since the compiler knows about it, it's portable, and it doesn't have the problems I mentioned.)

jart · on Oct 1, 2020

Thank you for the thorough response.

Inlining, tailcall, hot/cold: None of these are issues. They don't change the fact that memory passed to gc() will be freed. Worst case scenario is the can gets kicked down the road, which is relatively easy to predict. See https://gist.github.com/jart/5aba7fc72c7b6781dadd5949c289a0b... So long as you're not using this technique to unlock mutexes, you'll be fine.

Developers who are required to use CFI need to reach out to their policymakers for authorization to modify return addresses before using the gc() macro. Folks required to use MSVC can use the existence of the gc() macro as compelling evidence for their bosses on the benefits of switching to GCC or Clang.

I'll take you on your word on IPA. I've added a comment to the Gist making sure folks who use it are aware. Thanks for the awesome info on ARM. That's good to know. Also I believe the code is fast.

I like the gc() macro because it can be used in expressions. I find __attribute__((__cleanup__)) unpleasant since it has strong opinions about how variables and cleanup functions need to be declared.

avianes · on Oct 1, 2020

> So long as you're not using this technique to unlock mutexes, you'll be fine.

This does not sound good.

In addition, it is worth mentioning that your hack will probably break the return stack buffer of the x86 superscalar processors.

While a "defer" implementation would have neither of these two flaws.

No doubt it's a pretty neat hack, but I would be sceptical about using it extensively in place of a "defer".

comex · on Oct 1, 2020

> Inlining, tailcall, hot/cold: None of these are issues. They don't change the fact that memory passed to gc() will be freed. Worst case scenario is the can gets kicked down the road, which is relatively easy to predict. [..] So long as you're not using this technique to unlock mutexes, you'll be fine.

That's true for inlining and tailcall, but not hot/cold, since it can make cleanups execute earlier than expected rather than later. This happens if: (1) some chunk of the function is extracted into a separate function; (2) the chunk contains a call to defer; and (3) some code which is not in the extracted chunk, but executes after it, expects the cleanup to not have run yet. See this example:

https://gcc.godbolt.org/z/1ro1z6

To be fair, Clang does not enable this optimization by default, and it might get replaced by a different implementation of hot-cold splitting [1] which happens to not suffer from this issue (because it operates later in the pipeline, essentially splitting blocks at the assembly level rather than the IR level).

On the other hand, GCC does enable a form of hot-cold splitting by default, via -fpartial-inlining, but it's limited to splitting out suffixes of the function (i.e. regions starting somewhere in the function and including everything in the function that can be executed from then on), rather than arbitrary regions. Therefore, it can't run into the problematic case where non-extracted code runs after extracted code. Still, this is just an implementation limitation that could be lifted in the future.

> I like the gc() macro because it can be used in expressions. I find __attribute__((__cleanup__)) unpleasant since it has strong opinions about how variables and cleanup functions need to be declared.

Fair enough; I do agree on that point. (I wish GCC had a way to either use attribute cleanup with C99 compound literals, or somehow declare variables that live within statement expressions as having their lifetime extended to a surrounding block… Maybe what I really want is a better macro system. Or the native `defer` feature proposed here, but I doubt that will ever happen.)

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-August/144012...

jart · on Oct 2, 2020

Please note the intended use case was this:

    #define gc(x) ({ auto y=x; defer(free, y); y; })
    int *buf = gc(malloc(sizeof(int)));
    if (coldpath) {
        ...

It's a bit of stretch to leak memory doing this:

    int *buf = malloc(sizeof(int));
    if (coldpath) {
        defer(fake_free, buf);
        ...

Your point is otherwise valid. It appears a nonstandard nondefault LLVM extension can break the macro under circumstances that are difficult to imagine happening in practice. Here's why LLVM is wrong. It surgically removes a chunk of code from the middle of a function, turn it an external function, and then emits a synthetic call. It should jump to the cold block and jump back. The amount of code I'm seeing it generate just to handle the abi boundary it needlessly created is almost as large as the chunk that's being outlined.

Godbolt's website doesn't show this, but if you pass -S to clang-10 you'll notice other suboptimalities exist in the way this LLVM extension was implemented. For example, it doesn't emit .section directives to relocate the cold code, which is half the point of hot cold pgo style optimizations, since I recall Google building the thing to not have their web-scale static binaries paging hundreds of megs of cold code off disk.

LLVM is doing great work, but still has so much catching up to do compared to GCC on the code generation. GCC 9+ does cold optimizations by default, but only to relocate cold noreturn error handling paths into .text.unlikely. That's not an issue. I can't say for certain what it does if you generate and pass and optimization profile.

littlestymaar · on Oct 1, 2020

This is a great answer, and this issue highlights pretty well the problem that can arise when people reason about C as if it was just ”portable assembly”.

est31 · on Sept 30, 2020

The original goal of C was to allow writing programs in a platform independent way, i.e. without having to write different code for different arches.

Also, there is no inline assembly support in standard C, just in various compilers.

phkahler · on Sept 30, 2020

>> The original goal of C was to allow writing programs in a platform independent way, i.e. without having to write different code for different arches.

I think it was meant to be a portable language, not to let you write portable code. With the size of standard types being machine dependent you couldn't write completely portable code, but you could write C on a lot of hardware.

It's like how 8 bit computers all had BASIC but weren't compatible. If you knew one it was easy to get going on another because at some level it was all BASIC.

saagarjha · on Sept 30, 2020

Interestingly, C++ does have an asm declaration.

est31 · on Sept 30, 2020

It exists, but the meaning is implementation defined.

https://eel.is/c++draft/dcl.asm

saagarjha · on Sept 30, 2020

While interesting, such a construct is obviously unsuitable for inclusion in the C standard nor can it be relied upon when writing portable code.

jart · on Sept 30, 2020

I never proposed that some code I posted to Gist be part of the C standard. I'm flattered however that folks are considering it for that purpose!

mayoff · on Sept 30, 2020

Nobody's proposing your code for the C standard. The original article is proposing an addition to the standard. Your comment argues that programmers don't need the addition to the standard because there is a non-standard, non-portable hack. That is not a good argument against an addition to the standard.

jart · on Sept 30, 2020

The onus is on the proposer. C is simple and should stay that way. If someone is proposing C needs to be altered (which is about the most conservative language there is when it comes to adopting features) then that person should have good arguments as to why there's no other way. Otherwise it belongs in C++. If whipping up a few lines of asm for my local architecture solves the problem, then that weakens the proposal.

Maybe instead the C language committee should be focusing on standardizing the RMS notation for the asm() keyword which makes this epic hack possible. https://gist.github.com/jart/fe8d104ef93149b5ba9b72912820282...

saagarjha · on Sept 30, 2020

The RMS notation is pretty icky, though, and it of course only works for x86 as well. I actually like Rust's asm macro quite a bit: https://doc.rust-lang.org/unstable-book/library-features/asm....

Your Gist is excellent either way, thanks for compiling it. I'll probably reference it whenever I write inline asm now ;)

jart · on Sept 30, 2020

Glad I could help!

Rust asm syntax looks nice, but the syntax is just the tip of the iceberg. asm() is almost a misnomer. Its true power is the constraints system that lets us control GCC/Clang internal algorithms. You may have noticed that the defer() macro uses asm() with an empty string!

It's such a general tool that's become so prevalent as a practice (since Stallman invented it in the early 90's) that I would surely hope it's on the radar of language committees by now. If these definitions and mnemonics can be formalized or at least clarified by standards bodies, then they should be.

arcticbull · on Oct 1, 2020

I think your implementation is really, really cool. Similarly I spent a good half hour bumbling through Cosmopolitan. Love what you've built.

IMO, the fact that asm() is used for a bunch of things unrelated to inline assembly -- but to your point, instead for changing compiler behaviors is more of an argument for exposing additional __attributes__, annotations and so on, rather than adding explicit support for asm() to the standard. This violates the principal of least surprise, and IMO, serves to further confuse rather than bring some predictability to C.

I'd suggest also that C should embrace some movement in the standard rather than agreeing to leave things as is forever and stapling legs onto the octopus that is C++. There was a good writeup here a while ago from someone involved with the committee, that they've gotten to the point where introducing new warnings for obviously broken behavior is off the table because they want to be warnings-compatible from release to release.

What I'm saying is I would rather see exposed intrinsics, primitives and other meaningful source annotations than codifying the spooky action at a distance of an empty string asm() call in the C standard. And if we're going to modify the standard there's a lot of low-hanging fruit I'd love to see cleaned up first.

[edit] I'd also like to add that I agree with your thesis that if this can be built with the tools provided instead of modifying the standard, that the onus is on the proposer. Based on some of the other analyses here it seems like it can't really be done in a universal way, but I'm open minded.

saagarjha · on Oct 1, 2020

Hmm, that’s actually a very interesting viewpoint on inline assembly, and it’s certainly something that I think may be useful to have. That being said, the ergonomics of actually using it are still kind of poor, even putting aside the strange register names and such, since it’s obviously not really designed to do this. And the other concern I would have is that compilers tend to be fairly conservative when seeing such constructs; I think even with operands specified as much as possible there are still substantial gaps in what you can express to the compiler and also how much the compiler actually cares that you marked a particular register set as being clobbered-it might just spill more or be conservative if it doesn’t want to deal with your constraints. I should probably look to see if compilers treat it better today.

jart · on Oct 1, 2020

You're not alone in feeling that way. There's a very short list of people who've ever taken the time to fully grok Richard Stallman's Math 55 assembly notation, and they mostly work on projects like the Linux Kernel and glibc. It's designed to do anything. I've been using it the past few days to retool the standard x86_64 compiler to generate 16-bit boot code for The LISP Challenge. https://github.com/jart/cosmopolitan/blob/b6793d42d5ed6b4f78...

grok22 · on Sept 30, 2020

I think perhaps this is the wrong way to look at it -- having it as a standard way to do things built into the language simplifies this (not everybody needs to know tricks) and also most likely will be more portable across platforms.

neostrauss · on Sept 30, 2020

Looks extremely cool. Presumably __cleanup__ is more robust in a GCC environment though?

warmwaffles · on Sept 30, 2020

That gist looks interesting. You said it is from cosmopolitan, I assume it has the same license as it. Thinking about toying with that defer implementation. Looks fun.

Though I still do it all manually, and am looking for ways to automate it.

jart · on Sept 30, 2020

The Gist is now updated with an ISC license. So it's very permissive. Enjoy! Feel free to contact me anytime too, if you get interested in hacking on this stuff.

coldtea · on Oct 1, 2020

I don't think "C shouldn't implement X" and "because X is trivial as an asm macro" are two sentences that make sense together.

shoo · on Sept 30, 2020

tangential suggestion: if the the first line of the cosmopolitan README after the title was the description "fast portable static native textmode executable containers" then that would help newcomers more quickly understand what the project is about. i skim-read through the README and was still fairly puzzled about what the purpose of a cosmopolitan was before i saw that description hiding in the margin.

jart · on Sept 30, 2020

Thanks for the feedback! The README has now been updated: https://github.com/jart/cosmopolitan/commit/fd22e55b42503093... It could use more work though. See also https://justine.storage.googleapis.com/ape.html and the HN thread about it last month: https://news.ycombinator.com/item?id=24256883

cozzyd · on Sept 30, 2020

Can this be made to work on ARM as well?

jart · on Sept 30, 2020

Yes, absolutely! In the case of Arm, it would need to be a pure macro (i.e. no external __defer function) since ARM ABI uses a register to store the return address. Then the unwind code would need to save the return registers x0 to x7 each time it calls free() or whatever function is being deferred. I'd write it for you, but I don't use ARM.

liuliu · on Sept 30, 2020

You need to disable inlining. __builtin_frame_address can give surprising addresses when inline enabled. The code itself is likely not be wrong, but you won't confined to the lexical scope as if you see in the code (the defer can be triggered when the parent function returned).

Disclaimer: I haven't looked at the code too closely.

jart · on Sept 30, 2020

Inlining gives you the ability to control how "pooled" the free() operations end up being. You don't need to disable inlining. You just have to be mindful of how much power this macro gives you. For example, if a function that calls gc() is being called from within a loop, then it's a good idea to make sure that function isn't static.

warmwaffles · on Sept 30, 2020

ARM implementation would be fun to look at as well.