It might be just my impression, but fuzz testing is underrated and rarely treated as mandatory for a given system. Its setup cost is not high, and the benefits are real. Once it's in place, you can focus on functional testing rather than negative scenarios, which are already covered by the fuzzing.
And the tooling is getting better, moving up into the application layer. API spec standards like OpenAPI (and their growing adoption) enable fuzzing of APIs, which was significantly harder to set up before.
> false positives... a 10 MB URL crashes my Python interpreter? I'm ok with that.
Either you're wrong about the "false" part or you're testing the wrong entry point.
Fuzzing has a long warm-up time, but probably the lowest false-positive rate of any testing technique: if it crashes, you're wrong about some invariant, even if that invariant is only "I need to fuzz parseX, not decodeX".
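To make that concrete, here is a minimal Go sketch of the idea (the ParseURL/decodeURL names and the length limit are invented for illustration, echoing the parseX/decodeX distinction): fuzz the public entry point that actually promises to survive arbitrary input, not the internal helper that assumes pre-validated data.

    package urlparse

    import (
        "errors"
        "testing"
    )

    // maxURLLen stands in for whatever limit the system actually promises to honor.
    const maxURLLen = 2048

    // ParseURL is the public entry point: it enforces the documented limit before
    // handing off to the internal decoder, so "never panics on arbitrary input"
    // is a real invariant here.
    func ParseURL(raw string) (string, error) {
        if len(raw) > maxURLLen {
            return "", errors.New("url too long")
        }
        return decodeURL(raw)
    }

    // decodeURL is an internal helper that may assume its caller already validated
    // the input; crashes found by fuzzing it directly mostly tell you that you
    // fuzzed below the validation boundary.
    func decodeURL(raw string) (string, error) {
        return raw, nil // placeholder for real decoding
    }

    // FuzzParseURL asserts the entry point's invariant: errors are fine, panics are bugs.
    func FuzzParseURL(f *testing.F) {
        f.Add("https://example.com/path?q=1")
        f.Fuzz(func(t *testing.T, raw string) {
            _, _ = ParseURL(raw)
        })
    }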
I think it just depends on how critical the software is. If it's just a webpage running in your browser then it's not a big deal if it crashes on some weird input. If it's a database, each crash can cause huge amounts of damage.
If you completely understood your system and how it behaves, you wouldn't need fuzzing. Or stress testing.
It's because we build lousy software with lossy interfaces and strangely undefined interactions that these tools exist. They're used when someone says "we'll never really know what's going on there, just throw some tooling at it and maybe we'll get lucky!"
One would hope we would try to make software more like the former than the latter.
Sorry, no. No matter how good you are, unless you've formally proven your system correct or it's a trivial program, your system likely has bugs. Humans aren't perfect, and a program has a combinatorial explosion of states: each condition in your system multiplies the number of possible states by 2, so thirty independent conditions already give 2^30, over a billion states.
Fuzzing, property testing and mutation testing are attempts to get an inherently complex system under control. Typically, these approaches are also cheaper to develop than formal proofs, which really matters, because engineering isn't just building the thing. It's evolving the thing, maintaining it, and the cost of all of that in manpower is critical to consider. Think Space Shuttle design, where you have to get everything right up front, vs. SpaceX's fail-fast approach.
> If you completely understood your system and how it behaves, you wouldn't need fuzzing. Or stress testing.
Only in the same sense that if people just wrote perfect, bug-free code, we would not have to test. Tragically, people are not omniscient or perfect programmers, so tools remain useful.
Fuzzing is just a measure to check that you didn't miss any areas. You might guard your code using well-defined patterns or behaviours, but you cannot be perfect at it. Fuzzing will make sure you've done this consistently.
I think that sets an unnecessarily low upper bound on the complexity of the system.
Did Feynman completely understand the Space Shuttle when investigating the Challenger disaster? Perhaps, but we can't always rely on having a Nobel laureate at hand.
And even then, unfortunately, this was after the analogous "fuzz test" had already failed horrifically. (Not making a dark joke ... I'm pointing out that even the Nobel laureate wasn't allocated and focused in time to prevent the problem, which furthers your argument).
A side benefit of attempting fuzzing is highlighting code built on an anemic domain model. When the code takes some string-y input and stuffs it into types that amount to little more than aggregations of values, fuzzing can't do much. If programmers haven't at least attempted to define what allowable values are for these types, fuzzing can't help identify any problems.
This Go example[1] demonstrates how fuzzing helps precisely define what a valid string is for the purposes of reversing it.
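For anyone who hasn't seen it, here is a minimal sketch in the spirit of that example (the code at [1] may differ in detail): the fuzzer quickly forces you to state that "valid input" means valid UTF-8, turning a vague string parameter into a defined domain.

    package reverse

    import (
        "errors"
        "testing"
        "unicode/utf8"
    )

    // Reverse reverses s rune by rune. Rejecting invalid UTF-8 up front is the
    // "precise definition of a valid string" that fuzzing pushes you toward:
    // a naive byte-wise reverse passes hand-written tests but breaks on
    // multi-byte runes.
    func Reverse(s string) (string, error) {
        if !utf8.ValidString(s) {
            return "", errors.New("input is not valid UTF-8")
        }
        r := []rune(s)
        for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
            r[i], r[j] = r[j], r[i]
        }
        return string(r), nil
    }

    // FuzzReverse checks two properties: reversing twice round-trips, and
    // reversing valid UTF-8 yields valid UTF-8.
    func FuzzReverse(f *testing.F) {
        for _, seed := range []string{"Hello, world", " ", "!12345"} {
            f.Add(seed)
        }
        f.Fuzz(func(t *testing.T, orig string) {
            rev, err := Reverse(orig)
            if err != nil {
                t.Skip() // input outside the defined domain
            }
            doubleRev, err := Reverse(rev)
            if err != nil {
                t.Fatalf("Reverse produced invalid UTF-8: %q", rev)
            }
            if orig != doubleRev {
                t.Errorf("before: %q, after double reverse: %q", orig, doubleRev)
            }
        })
    }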
There is mention of property-based testing and minimization. That's a very nice technique: the fuzzer may generate various data shapes and types, but once a failure happens, the generators know how to shrink/minimize the input and present the smallest failing test case to the user.
The author correctly notes that for composite generated types, the shrinking behavior composes too. A list of integers might shrink both the integer values, toward a lower bound, and the list size, dropping elements, so you don't need to implement custom shrinking for a list of integers. You can just compose the generators.
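As a sketch of that composition in Go, here is a deliberately failing property written against pgregory.net/rapid (one Go property-testing library with built-in shrinking, used here purely as an example); the counterexample it reports is typically something close to a single-element slice such as [100], with no custom []int shrinker involved.

    package prop_test

    import (
        "testing"

        "pgregory.net/rapid"
    )

    // A deliberately wrong property: "the sum of any slice of non-negative ints
    // up to 1000 stays under 100". When it fails, rapid shrinks the slice length
    // and the element values together, so the reported counterexample is close
    // to minimal without any hand-written shrinker for []int.
    func TestSumStaysSmall(t *testing.T) {
        rapid.Check(t, func(t *rapid.T) {
            xs := rapid.SliceOf(rapid.IntRange(0, 1000)).Draw(t, "xs")
            sum := 0
            for _, x := range xs {
                sum += x
            }
            if sum >= 100 {
                t.Fatalf("sum %d of %v is not under 100", sum, xs)
            }
        })
    }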
There is an excellent book by Fred Hebert about property-based testing. It's Erlang-based, but the general concepts apply to other frameworks: https://propertesting.com/book_shrinking.html
I just realized there is a crossover between fuzzing and reinforcement learning. Both are trying to find a sparse signal in an environment. Both need to find the right balance between exploration and exploitation. Both need to perform better than random/brute force.