Hacker News
Random Fuzzy Thoughts (tigerbeetle.com)
74 points by todsacerdoti on March 28, 2023 | 18 comments


It might be just my impression, but fuzz testing is underrated and rarely treated as mandatory for a given system. Its setup cost is not high, and the benefits are worth it. Once you have it in place, you can focus on functional testing rather than negative scenarios, which fuzzing already covers. The tooling is also getting better and moving up to the application layer: API spec standards like OpenAPI (and their adoption) enable fuzzing of APIs, which used to be hard to set up.


Fuzzing is expensive.

It's slow to run, and it needs a long configuration and calibration period before it stops producing false positives.

Also, many bugs found by fuzzing are in the "it's ok to have them" range.

A 10 MB URL crashes my Python interpreter? I'm ok with that.

Not to say fuzzing is not useful, but it's not cheap, and the benefits really shine for mature software.


> false positives... A 10 MB URL crashes my Python interpreter? I'm ok with that.

Either you're wrong about the "false" part or you're testing the wrong entry point.

Fuzzing has a long warmup time but probably the lowest false positive rate of any testing technique - if it crashes you're wrong about some invariant, even if that's only "I need to fuzz parseX, not decodeX".
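
A minimal sketch of that point, assuming a hypothetical `parse_signed` whose contract is "return an int or raise ValueError" (the parser, the `fuzz` helper, and the alphabet are all made up for illustration). The harness ignores documented rejections and records everything else, so whatever it reports is a broken assumption rather than noise:

```python
import random


def parse_signed(text: str) -> int:
    """Hypothetical parser: documented to return an int or raise ValueError."""
    sign = -1 if text[0] == "-" else 1  # hidden assumption: text is never empty
    digits = text[1:] if text[0] in "+-" else text
    return sign * int(digits)  # int() raises ValueError on malformed digits


def fuzz(target, runs=2000, seed=0):
    """Feed random short strings to target; collect undocumented failures."""
    rng = random.Random(seed)  # fixed seed so a finding can be replayed
    alphabet = "+-0123456789 a"
    findings = []
    for _ in range(runs):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 6)))
        try:
            target(text)
        except ValueError:
            pass  # documented rejection, not a finding
        except Exception as exc:  # anything else breaks the stated contract
            findings.append((text, type(exc).__name__))
    return findings


findings = fuzz(parse_signed)
print(set(findings))
```

The only thing this reports is the empty string raising IndexError. Whether you fix the parser or decide empty input can never reach it (and fuzz a different entry point instead), the crash told you something true about the code.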


For one entry point, fuzzing may find 30 ways to crash it, none of which you care about.

Something is only a bug if it causes a problem in a use case your users actually have.


I think it just depends on how critical the software is. If it's just a webpage running in your browser then it's not a big deal if it crashes on some weird input. If it's a database, each crash can cause huge amounts of damage.


I think a lot of these types of tools have a high initial investment, but also a high initial return on investment.

The first couple of runs everything breaks, then code is added to NOT catastrophically fail, then you are pretty good.

(or move into lots of false failures)

Sort of like -Werror or linting/static analysis type tools, or code coverage, or similar...


If you completely understood your system and how it behaved, you wouldn't need fuzzing. Or stress testing.

It's because we build lousy software with lossy interfaces and strangely-undefined interactions that these tools exist. They're used when someone says "we'll never really know what's going on there, just throw some tooling at it and maybe we'll get lucky!"

One would hope we would try to make software more like the former than the latter.


Sorry, no. No matter how good you are, unless you’ve formally proven your system correct or it’s a trivial program, your system likely has bugs. Humans aren’t perfect, and a program has a combinatorial explosion of states: each independent condition in your system multiplies the number of possible states by 2.
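
To put rough numbers on the doubling, here is a back-of-the-envelope sketch. It assumes the conditions are fully independent, so it is an upper bound rather than a count of states any real program reaches, and the function name is just for illustration:

```python
def max_condition_combinations(independent_conditions: int) -> int:
    """Upper bound on outcomes of n independent true/false conditions."""
    return 2 ** independent_conditions


for n in (10, 20, 40):
    print(f"{n} conditions: up to {max_condition_combinations(n):,} combinations")
```

Ten conditions already allow 1,024 combinations, and forty allow over a trillion, which is far more than anyone will enumerate by hand.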

Fuzzing, property testing and mutation testing are attempts to get an inherently complex system under control. Typically, these approaches are also cheaper to develop than formal proofs, which is really important because engineering isn’t just building the thing. It’s evolving the thing and maintaining it, and the cost of all of this in manpower is absolutely critical to consider. Think Space Shuttle design, where you have to get everything right up front, vs SpaceX’s fail-fast approach.


> If you completely understood your system and how it behaved, you wouldn't need fuzzing. Or stress testing.

Only in the same sense that if people just wrote perfect, bug-free code, we would not have to test. Tragically, people are neither omniscient nor perfect programmers, so tools remain useful.


Fuzzing is just a measure to monitor that you didn't miss any areas. You might guard your code using well defined patterns or behaviours, but you cannot be perfect at it. Fuzzing will make sure you do this consistently.


Yes, of course. We should also just drive safer and then we won't need seatbelts.


> If you completely understood your system

I think that sets an unnecessarily low upper bound on the complexity of the system.

Did Feynman completely understand the Space Shuttle when investigating the Challenger disaster? Perhaps, but we can't always rely on having a Nobel laureate at hand.


And even then, unfortunately, this was after the analogous "fuzz test" had already failed horrifically. (Not making a dark joke ... I'm pointing out that even the Nobel laureate wasn't allocated and focused in time to prevent the problem, which furthers your argument).


If you were to just run faster than Usain Bolt, you would not need training.


A side benefit of attempting fuzzing is highlighting code built on an anemic domain model. When the code takes some string-y input and stuffs it into types that amount to little more than aggregations of values, fuzzing can't do much. If programmers haven't at least attempted to define what allowable values are for these types, fuzzing can't help identify any problems.

This Go example[1] demonstrates how fuzzing helps precisely define what a valid string is for the purposes of reversing it.

1. https://go.dev/doc/tutorial/fuzz
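
For anyone who doesn't read Go, here is a rough Python analog of the same idea. This is not the tutorial's code: the function names, the alphabet, and the hand-rolled random loop are my own assumptions. The property is "reversing a valid UTF-8 string yields valid UTF-8", and a byte-wise reversal violates it as soon as a multi-byte character shows up:

```python
import random


def reverse_bytes(s: str) -> bytes:
    """Naive reversal of the UTF-8 encoding, one byte at a time."""
    return s.encode("utf-8")[::-1]


def reverse_chars(s: str) -> str:
    """Reversal by code point, which keeps multi-byte characters intact."""
    return s[::-1]


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_counterexample(runs=1000, seed=0):
    """Return a string whose byte-wise reversal is invalid UTF-8, if any."""
    rng = random.Random(seed)
    alphabet = "abc \u00e9\u4e16"  # ASCII plus a 2-byte and a 3-byte character
    for _ in range(runs):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 5)))
        if not is_valid_utf8(reverse_bytes(s)):
            return s
    return None


counterexample = find_counterexample()
print(repr(counterexample))
```

Pure ASCII input never trips the check, so a handful of hand-written tests could easily miss it. The failing input forces you to decide what "a string" means here: bytes, code points, or something stricter.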


There is mention of Property Based Testing and minimization. That's a very nice technique. The fuzzer may generate various data shapes and types, but then once a failure happens the generated types know how to shrink/minimize to produce the smallest failing test case to present to the user.

The author correctly mentions that for composite generated types, the shrinking behavior composes too. A list of integers can shrink both its integer values (toward a lower bound) and its size (by dropping an element), so you don't need to implement custom shrinking for a list of integers. You can just compose the generators.

There is an excellent book by Fred Hebert about property testing. It's Erlang based but the general concept applies to other frameworks: https://propertesting.com/book_shrinking.html
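
A toy sketch of how that composition can look. All names here are hypothetical, and real libraries use smarter strategies than this greedy loop. The list shrinker knows nothing about integers; it just takes an element shrinker as a parameter:

```python
def shrink_int(n):
    """Yield simpler integers than n, moving toward 0."""
    if n == 0:
        return
    yield 0
    yield (abs(n) // 2) * (1 if n > 0 else -1)  # halve the magnitude
    yield n - 1 if n > 0 else n + 1  # step one closer to 0


def shrink_list(xs, shrink_elem):
    """Yield simpler lists: drop one element, or shrink one element."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        for smaller in shrink_elem(x):
            yield xs[:i] + [smaller] + xs[i + 1:]


def minimize(value, fails, shrink):
    """Greedily take the first simpler candidate that still fails."""
    while True:
        for candidate in shrink(value):
            if fails(candidate):
                value = candidate
                break
        else:
            return value  # no simpler failing candidate is left


def sum_too_large(xs):
    """Stand-in for a failing property: the sum should stay below 100."""
    return sum(xs) >= 100


def shrink_int_list(xs):
    return shrink_list(xs, shrink_int)


smallest = minimize([37, 250, -4, 99], sum_too_large, shrink_int_list)
print(smallest)  # [100]
```

Starting from a noisy four-element failure, the loop ends at `[100]`, the kind of minimal case you actually want in a bug report, and nothing list-of-int specific had to be written.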


I built a tool to make fuzzing in Rust as pain-free as possible: https://github.com/srlabs/ziggy

It’s basically a wrapper around afl.rs and the honggfuzz Rust library, and runs both fuzzers in parallel.


I just realized there is a crossover between fuzzing and reinforcement learning. Both are trying to find a sparse signal in an environment. Both need to find the right balance between exploration and exploitation. Both need to perform better than random/brute force.



