Hacker News
57 Small Programs that Crash Compilers (regehr.org)
125 points by hedgehog on April 3, 2012 | 13 comments


Small program #58: http://llvm.org/bugs/show_bug.cgi?id=10604

Unlike the others, this one was actually found in production.


I wonder what he's doing that csmith[1] doesn't do. I quickly skimmed the paper[2], but nothing jumped out at me. In fact it looks like some of the reducers are implemented as plugins on top of csmith. I guess I'll have to read the whole thing later.

1. http://embed.cs.utah.edu/csmith/

2. http://www.cs.utah.edu/~regehr/papers/pldi12-preprint.pdf


> I wonder what he's doing that csmith doesn't do.

The contribution is techniques for reducing the size of test cases, including test cases that might be generated by csmith (you realize the author of the blog post is one of the csmith authors, right?). From the PLDI'12 paper:

Using randomized differential testing, Csmith automates the construction of programs that trigger compiler bugs. These programs are large out of necessity: we found that bug-finding was most effective when random programs’ average size was 81 KB. In this paper, we use 98 bug-inducing programs created by Csmith as the inputs to several automated program reducers...
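
To make "differential testing" concrete, here is a minimal sketch of the idea: run the same program through several compilers and flag whichever one disagrees with the majority. This is not Csmith's code. The three "compilers" below are hypothetical stand-ins that just evaluate an arithmetic expression, and `differential_test` is a name made up for illustration.

```python
from collections import Counter

def differential_test(program, compilers):
    """Return the names of compilers whose result differs from the majority."""
    results = {name: run(program) for name, run in compilers.items()}
    majority, _ = Counter(results.values()).most_common(1)[0]
    return sorted(name for name, out in results.items() if out != majority)

# Hypothetical stand-ins for "compile and run the program".
def _evaluate(src):
    return eval(src, {"__builtins__": {}})

def good_a(src):
    return _evaluate(src)

def good_b(src):
    return _evaluate(src)

def buggy(src):
    # Simulated miscompilation: subtraction is turned into addition.
    return _evaluate(src.replace("-", "+"))

compilers = {"good_a": good_a, "good_b": good_b, "buggy": buggy}
suspects = differential_test("7 - 2", compilers)
```

With real compilers, each entry would compile the program, run the binary, and return something like a checksum of its output.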


Csmith is a random C program generator, which is useful for finding unknown compiler bugs. Creduce is a test case minimiser for reducing a large C file found to trigger some bug to a minimal test case. The input can be either real-world code or randomly generated by Csmith.

Regehr's team is behind both tools.
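
For anyone wondering what "minimising a test case" means mechanically, here is a rough sketch of a greedy line-based reducer. It is not how Creduce works internally (the paper describes far more C-aware transformations); it only shows the basic loop of "delete something, check whether the bug still triggers". The `still_fails` predicate is an assumption: in practice it would invoke the compiler and check for the crash.

```python
def reduce_lines(lines, still_fails):
    """Greedily drop chunks of lines while still_fails(lines) stays True."""
    assert still_fails(lines), "input must trigger the bug to begin with"
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate   # bug survives: keep the smaller program
            else:
                i += chunk          # this chunk is needed: move past it
        chunk //= 2                 # retry with finer-grained removals
    return lines

# Hypothetical bug: the "compiler" crashes when a struct and a goto both appear.
def crashes(lines):
    return any("struct" in l for l in lines) and any("goto" in l for l in lines)

program = ["int a;", "int b;", "struct S { int x }", "int c;", "goto L;", "int d;"]
reduced = reduce_lines(program, crashes)
```

Here the six-line input shrinks to the two lines that matter. A real reducer also has to make sure the reduced program is still valid C, which is a large part of what makes the problem hard.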


Wow... amazing! ;) Only one crash for ICC. Is it that much better, or just less analyzed?


My experience working with compiler-like products is that susceptibility to crashes is related to the choice of data structure for the AST/IR. If you allow "weird" stuff in your AST, some component won't handle it well and will crash. If your AST is strongly typed (but less flexible), this is less of a problem.

As a concrete example, EDG (used by ICC) has a strongly typed AST. GCC's AST consists of a single type (tree), which allows you to build absurd trees. You could represent the equivalent of

    int x = goto struct { while (1); }

because the data type doesn't prohibit a goto target that happens to be a struct. This will probably explode when it gets to some later compiler stage. If you have distinct goto_node and label_node types, the compiler is less likely to accidentally create such monstrosities. You don't usually get such trees directly from the parser, but from some middle transformation pass.
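
A rough sketch of the difference, with made-up node classes (these are not EDG's or GCC's actual types). Python can only enforce the constraint at runtime, so the check lives in `__post_init__`; in a C or C++ frontend with distinct node types, the equivalent mistake would be rejected when the compiler itself is built.

```python
from dataclasses import dataclass, field

@dataclass
class LabelNode:
    name: str

@dataclass
class StructNode:
    members: list = field(default_factory=list)

@dataclass
class GotoNode:
    target: LabelNode

    def __post_init__(self):
        # Typed AST: a goto may only point at a label.
        if not isinstance(self.target, LabelNode):
            raise TypeError("goto target must be a LabelNode")

def can_build_goto(target):
    """Return True if the typed AST accepts `target` as a goto target."""
    try:
        GotoNode(target)
        return True
    except TypeError:
        return False

# Untyped "single tree type": any node can be a child of any other node,
# so the absurd goto-to-struct tree is constructed without complaint.
untyped_goto = {"kind": "goto", "target": {"kind": "struct", "members": []}}
```

The untyped dict goes through silently and only blows up when a later pass assumes the target is a label. The typed version fails at the point where the bad tree is built.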


To quote: the stronger the theory behind it, the stronger the software.


A third option is that it's under less active development. GCC and especially clang these days seem to have some pretty heavy dev going on, which can introduce weird bugs and regressions like these. Conversely, if ICC has been pretty static lately, then these bugs, once found and fixed, probably stay fixed.


This is basically right. First of all, the GCC ones are all from GCC versions that are at least 4 years old.

(The clang versions are from fairly early on in clang's development, but I don't remember how old.)

All that said, ICC uses a frontend from a company called EDG. They produce C and C++ frontends. They are a 5 person company, but produce very thorough, very well tested, and very well documented frontends. It is not surprising that they are difficult to crash. Language frontends are all they do.


About half the bugs I find in commercial compilers get blamed on the EDG frontend.


Blamed by who? Commercial compiler support? I'd blame the part I didn't make too :)

You can usually figure out if this is true by testing it out against Comeau's online C/C++ compiler. Or at least, you could. Comeau seems not to have been updated in a while.

If it crashes, it was probably EDG. If it doesn't, it wasn't.

The ICC example posted does not crash Comeau, so I doubt it was an EDG issue.

(They are also missing a semicolon on line 2, which EDG doesn't crash on. If you fix this, it still doesn't crash EDG.) In fact, all of the programs they posted that contain structs have the same issue. The last member of every struct is missing a semicolon.


Blamed by the compiler vendor. Most of these bugs were fixed when the vendor updated to a newer EDG version.

Different compilers use different EDG versions, so a bug in one might not be present in another.


The only bugs I recall finding in EDG were GCC incompatibilities, where they claimed to support various GCC extensions, but sometimes didn't. If the input had actually been legal C, there wouldn't have been a problem. [We did find a lot of these, as GCC accepted some absurd syntax combinations. At the time, EDG told us we accounted for more than half of their support requests.]



