One exception to the otherwise solid design principles: "If an error requires pa...

lifthrasiir · on Nov 7, 2018

> It is almost never okay for a library to abort the process.

Using Joe Duffy's distinction [1], bugs aren't recoverable errors. Error handling acknowledges the need for error recovery, but you generally don't have such a fine-grained recovery policy for bugs. The dual use of exceptions as a means to also signal bugs is unfortunate, and I believe Mundane's policy clearly follows this distinction.

Also, panicking in Rust can be caught in the same thread [2], so it is actually much more flexible than a process abort: if you need general, umbrella protection against bugs, you have a way to get it.
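A minimal sketch of catching a panic in the same thread with `std::panic::catch_unwind`. The wrapper name `run_guarded` is hypothetical; it turns a panic into an `Err` so the caller decides how to continue. This only works with the default unwinding panic strategy; with `panic = "abort"` the process still terminates.

```rust
use std::panic;

/// Hypothetical wrapper: runs `op` and converts a panic inside it into
/// an `Err` carrying the panic message, instead of unwinding further.
/// Requires the default `panic = "unwind"` strategy.
fn run_guarded<F>(op: F) -> Result<u32, String>
where
    F: FnOnce() -> u32 + panic::UnwindSafe,
{
    panic::catch_unwind(op).map_err(|payload| {
        // Panic payloads are usually a &'static str or a String.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

The default panic hook still prints the message to stderr; `catch_unwind` only stops the unwinding.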

[1] http://joeduffyblog.com/2016/02/07/the-error-model/

[2] https://doc.rust-lang.org/stable/std/panic/fn.catch_unwind.h...

blub · on Nov 7, 2018

That's a thought provoking article, and I see no problem with the proposed error management technique as a whole, for that particular project.

This, however, is a crypto library designed to run on consumer OSs, as part of programs which will likely offer much more functionality than generating random numbers.

In general, a bug is a software fault, which is a passive flaw in the program, introduced by a programmer. Faults can manifest and cause the program to behave in an unintended way, which in turn can lead to failure: the program can no longer perform its function.

Any point on the fault -> error -> failure chain can be an intervention point. The relevant point for this discussion is detecting errors and preventing them from becoming system failures or, where that is not possible, handling those failures gracefully.

Let's agree not to discuss recovery attempts and assume that the system is instead moved directly to a safe state when such a bug/error is detected.

A crash-only safe state is simple and quite easy to implement, but whether it's the best approach for a particular software depends on the dependability requirements of that software. Abandoning the current operation and returning to the top level execution context is an alternative that shouldn't be so easily dismissed.

In the BoringSSL case, there doesn't seem to be a reason to abort. The error condition is known, can be detected and failure can be returned just as easily. Panicking is also fine, if the parent program can react to it.
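A minimal sketch of "return failure instead of aborting", assuming a hypothetical `generate_key` whose entropy source is passed in as a closure. The error propagates with `?`, and a hypothetical top-level `handle_request` abandons only the current operation and reports it, while the program keeps running. None of these names come from BoringSSL or Mundane.

```rust
use std::fmt;

/// Hypothetical error for a crypto operation that could not complete.
#[derive(Debug, PartialEq)]
enum CryptoError {
    RandomnessUnavailable,
}

impl fmt::Display for CryptoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CryptoError::RandomnessUnavailable => write!(f, "randomness unavailable"),
        }
    }
}

impl std::error::Error for CryptoError {}

/// Hypothetical key generation: if the entropy source fails, the error
/// is returned to the caller rather than aborting the process.
fn generate_key<F>(fill: F) -> Result<[u8; 32], CryptoError>
where
    F: FnOnce(&mut [u8]) -> Result<(), CryptoError>,
{
    let mut key = [0u8; 32];
    fill(&mut key)?; // propagate the failure instead of aborting
    Ok(key)
}

/// Hypothetical top-level handler: a failed operation is abandoned and
/// reported, and control returns to the top level.
fn handle_request(fill: impl FnOnce(&mut [u8]) -> Result<(), CryptoError>) -> String {
    match generate_key(fill) {
        Ok(key) => format!("ok: {} byte key", key.len()),
        Err(e) => format!("request failed: {}", e),
    }
}
```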

lifthrasiir · on Nov 7, 2018

> This, however, is a crypto library designed to run on consumer OSs, as part of programs which will likely offer much more functionality than generating random numbers.

Are you referring to getrandom(2)? Unless you are using `/dev/random` (i.e. GRND_RANDOM, which by the way you don't need to use [1]) instead of `/dev/urandom`, the only case in which getrandom(2) blocks or fails is very early in machine startup, before enough entropy has been collected. It is not something you would expect to happen more or less at random.
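For illustration, a std-only sketch that reads the kernel CSPRNG through the `/dev/urandom` device file. This is an assumption-laden stand-in: it is not the getrandom(2) syscall (std does not expose that directly), its early-boot behaviour may differ on some kernels, and `fill_from_urandom` is a hypothetical name. The point is that even this read is fallible in the type system, so a library can return the error instead of aborting.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hypothetical helper: fills `buf` from `/dev/urandom`.
/// Assumes a Unix-like system. Any I/O failure is returned to the caller.
fn fill_from_urandom(buf: &mut [u8]) -> io::Result<()> {
    let mut f = File::open("/dev/urandom")?;
    f.read_exact(buf)
}
```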

[1] https://www.2uo.de/myths-about-urandom

> In general, a bug is a software fault, which is a passive flaw in the program, introduced by a programmer. Faults can manifest and cause the program to behave in an unintended way, which in turn can lead to failure: the program can no longer perform its function.

The OP does explicitly say that it may be justified to make a function's API infallible. They strive to minimize the error cases callers have to handle (e.g. verification failures and other recoverable errors are combined to ease error handling), and they are expected to exercise this right only when no good, reasonable error-handling strategy exists.

By the way, it seems that Mundane actually does not panic but aborts the entire process [2], on the rationale that panic handling in Rust is not trivial. This decision can be problematic on its own, but I found that aborts are only used to guard against generally improbable error cases, e.g. linking against or calling into a different library that happens to provide the same set of symbols as BoringSSL. If you say that this should be caught gracefully, uh, I'd say that you should also guard against an invocation failure due to a dynamic linking failure, for the sake of user experience...
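A hypothetical sketch of the abort-on-improbable-condition style (not Mundane's actual code; `assert_or_abort` and `digest_len_is_sane` are made-up names). The key difference from a panic: `std::process::abort` does not unwind, so `catch_unwind` cannot intercept it and destructors do not run.

```rust
use std::process;

/// Hypothetical guard: if a check that should never fail does fail,
/// print a message and abort the whole process. No unwinding happens,
/// so callers cannot catch this.
fn assert_or_abort(cond: bool, msg: &str) {
    if !cond {
        eprintln!("fatal: {}", msg);
        process::abort();
    }
}

/// Hypothetical sanity check, e.g. that a digest function linked at
/// runtime reports the output length we expect for SHA-256.
fn digest_len_is_sane(len: usize) -> bool {
    len == 32
}
```

Calling `assert_or_abort(digest_len_is_sane(len), "unexpected digest length")` is a no-op in the normal case and only ends the process when the "impossible" mismatch occurs.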

[2] https://github.com/google/mundane/blob/8aaa1c8/src/boringssl...

Asooka · on Nov 7, 2018

You can factor out whatever part of your system uses crypto into a separate process, and then the crash is simply "my crypto process died", which you can handle and recover from. I think it's OK to force people to cleanly separate functionality from their main process when its failure has no meaningful recovery path and a half-assed recovery can lead to catastrophic security problems.
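A minimal sketch of the process-isolation idea using `std::process::Command`, assuming a hypothetical helper binary does the crypto work. If the helper aborts or exits with an error, the parent only sees a failed child and decides what to do. `run_isolated` and `WorkerError` are illustrative names.

```rust
use std::process::{Command, Output};

/// Hypothetical outcome of running crypto work in a helper process.
#[derive(Debug)]
enum WorkerError {
    SpawnFailed(std::io::Error),
    /// The helper exited unsuccessfully. If it was killed by a signal
    /// (e.g. SIGABRT from an abort), `code` is None on Unix.
    Died { code: Option<i32> },
}

/// Runs `cmd` as a separate process and reports a crash as an ordinary
/// error value instead of taking the parent down with it.
fn run_isolated(cmd: &mut Command) -> Result<Output, WorkerError> {
    let out = cmd.output().map_err(WorkerError::SpawnFailed)?;
    if out.status.success() {
        Ok(out)
    } else {
        Err(WorkerError::Died { code: out.status.code() })
    }
}
```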

PudgePacket · on Nov 7, 2018

> It is almost never okay for a library to abort the process.

For a library whose design principle is to be as hard to misuse as possible, it seems like the right decision, and the "exception to the rule".

Infinitely better to crash than to potentially let the process/library run in some kind of degraded state.

If you are aware of this behaviour and are savvy/experienced enough, you'll either 1) catch the panic and take appropriate action, or 2) use a different library.

blub · on Nov 7, 2018

The issue is that the library can't know what state the program is running in, since it's a library... it has a job to perform some crypto functionality and it can only either return a result or an error for that particular operation.

When libraries start calling abort out of the blue it's like the janitor deciding to send everyone home for the day because their mop broke. It's not their call to make :)

Panicking or any other error handling mechanism which permits the main application to decide how to continue is perfectly fine.

adwn · on Nov 7, 2018

> When libraries start calling abort out of the blue it's like the janitor deciding to send everyone home for the day because their mop broke.

A more apt analogy would be the janitor that pulls the fire alarm because they saw smoke coming out of the boiler room. So yes, it's their call to make, and it would be the right call.

Besides, as lifthrasiir points out, you can isolate this behavior in Rust, akin to triggering the fire alarm for a single building, but not for the entire complex.
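One way to confine a panic to "a single building" is to run the work on its own thread: `std::thread::spawn(...).join()` returns `Err` if that thread panicked, and the rest of the program carries on. `run_in_thread` is a hypothetical name, and this again assumes the default unwinding panic strategy.

```rust
use std::thread;

/// Hypothetical helper: runs `job` on a separate thread. A panic inside
/// `job` unwinds only that thread; `join` reports it as `Err`.
fn run_in_thread<F>(job: F) -> Result<u64, String>
where
    F: FnOnce() -> u64 + Send + 'static,
{
    thread::spawn(job)
        .join()
        .map_err(|_| "worker thread panicked".to_string())
}
```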

IshKebab · on Nov 7, 2018

> Infinitely better to crash than to potentially let the process/library run in some kind of degraded state.

Why? Just set the entire library to "failed" mode and have every function return an error or do nothing from that point forward. That is far more sensible than just panicking and bringing down the entire application.
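A minimal sketch of the "failed mode" suggestion: a hypothetical library handle with a sticky `AtomicBool` flag. Once a fault is detected, every public operation returns an error. The XOR `checksum` is a toy stand-in for real crypto work, and all names here are illustrative.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
enum LibError {
    /// A fault was detected earlier; the library refuses further work.
    Failed,
}

/// Hypothetical library handle with a sticky "failed" flag.
struct CryptoLib {
    failed: AtomicBool,
}

impl CryptoLib {
    fn new() -> Self {
        CryptoLib { failed: AtomicBool::new(false) }
    }

    /// Would be called internally when a bug or broken invariant is detected.
    fn enter_failed_mode(&self) {
        self.failed.store(true, Ordering::SeqCst);
    }

    /// Toy operation (XOR of all bytes). Every public operation checks
    /// the flag first and refuses to run once the library has failed.
    fn checksum(&self, data: &[u8]) -> Result<u8, LibError> {
        if self.failed.load(Ordering::SeqCst) {
            return Err(LibError::Failed);
        }
        Ok(data.iter().fold(0, |acc, b| acc ^ b))
    }
}
```

The trade-off is that callers must still handle `Err` on every call, since any operation may start failing after an unrelated fault.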

Imagine if people want to use this in a cash machine or something like that.

heavenlyblue · on Nov 7, 2018

I would much rather the cash machine crash than start communicating with the bank using crypto seeded by an endless run of 00 bytes instead of randomness.

Besides, what exactly do you expect to do with the library in an "exceptional condition"? Do I now need to check the output of every single function for some non-local effects they have on each other?

blub · on Nov 7, 2018

How can the library know that the cash machine will communicate with the bank, or that there even is a cash machine?

The library should just tell the program that it failed to perform its task, not guess at what its parent program could, should or would do.

Note that the discussion started from "abort". Maybe the authors meant something else by abort, but in a system programming context it means calling abort and terminating the program's execution immediately.

If they just meant it as a synonym for panic, we're just having a nice discussion here.