
Completely agree. People tend to dismiss testing rather than balance the depth of testing that they do. 100% code coverage doesn't mean you've tested every conceivable combination of parameters to a method.

One of the biggest benefits of testing in my mind is improving the design of code. If you have code that is very difficult to test, there is likely something wrong in your design.

Any way you cut it, you need to become knowledgeable about testing to be able to apply it effectively.




At work, for one small subsection of the project I'm on, I wrote the regression test. I'm testing a "program" that consists of 56 processes (what I'm testing) across three machines, plus around four other processes across two machines to stub out services we depend on but that aren't technically part of what I'm testing. It can take up to half an hour to set up (one reason it's not fully automated is that the third-party network stack we rely upon will shut down if there are too many errors), and it takes around four hours to run (except for two test cases that require manual intervention to run properly).

And that's just for the back-end processing (nearly 300k lines of C/C++ code). Unit testing? Okay, for large values of "unit", and most of the "units" being tested require almost as much set up as the entire "program".

Is something wrong with the design? Given the constraints and how the project evolved, I can't see it being any simpler. And I'm somewhat overwhelmed with the thought of testing the frontend (which requires Android phones).


My metric here is always finding the most valuable way to use my time long term. In the short term, test automation always seems wasteful. But in the long term, it's great. Solid product, little debugging, and minimal manual QA.

You're in a situation with a lot of legacy code. Testing shapes design, but it sounds like you're trying to retrofit testability onto an existing mess. People cut corners for years, and now it's your problem. That sucks.

In your shoes I'd either start improving it or find a new job. I think life's too short to spend my time doing something a computer could and should be doing.


Heh ... it's actually a new project. Yes, the majority of the code is third-party software. And while we do have a bit of "legacy code" in the project (in the form of the third-party proprietary network stack that is literally in pure maintenance mode), we're mostly working in a legacy system (the telephony network) that requires very high degrees of redundancy (hence the number of processes across that many machines).

And for the most part, I was able to get the regression test for the backend process(es) to run unattended once started (thankfully---I (along with two others) did it once manually and it was horrible). I have no idea how to do that for the frontend Android cell phone client. Sure, we can run tests on an emulator, but there are issues with the Android emulator (it exhibits different buggy behavior than the physical hardware), so that only gets you so far. It's an interesting (if somewhat overwhelming) problem.


That sounds painful. And it's a regrettably common experience working with older code, code not written with testability in mind.

Regarding Android, Netflix has a nice article about what they do to test everything: http://techblog.netflix.com/2012/03/testing-netflix-on-andro...

Personally, most of my testing code is unit tests. I try to isolate my code from third-party services in a variety of ways. I do have some integration tests that check that it all works together, but those are never as useful or as maintainable as I'd like.


Would you consider this testing 'unit testing', though? Sounds more like higher-level integration tests with a lot of dependencies. Honest question.


I'm not sure. Yes, we can (to some degree) independently test parts, but like I said, each part requires a significant portion of the environment to be up (or simulated). And "unit" testing (that is, testing an individual routine or module) doesn't really make sense given how the code is written: receive a message via SS7 (the network stack I mentioned) and convert it to an IP-based message. To test the portion that talks to the telephony network requires a telephony network (very hard to mock out---lord knows I would love to) and another major unit we wrote (which is another part I test) to even be testable.

And to test that other part? Well, it requires I mock out the previous unit (or run it), plus three other parts (one including a cell phone---which is really a simple script at this point). And again, it doesn't really make sense to test individual routines, because this part takes the translated IP packets from the SS7 module and makes several queries to other IP-based services. So a lot of what's going on is just simple translations (in a multithreaded/multiprocessor environment---more fun!).


I went to an Agile class where the lecturer compared unit tests to double-entry bookkeeping. An accountant doesn't say "oh, I don't need to add up both columns here, I know it's just trivial addition".

Once I got in the habit of writing tests for even the simplest transformations, the code complexity and my test complexity grew at the same rate, making it much harder to end up with a giant untestable mass.


I once spent over a month tracking down a bug (in a different project than the one I mentioned above) that I have a hard time seeing how unit testing would have caught. The program: a simple process (no threads, no multiprocessing) that, depending on which system it ran on, would crash with a seg fault. The resulting core files were useless, as each crash was in a different location.

It turned out I was calling a non-re-entrant function (indirectly) in a signal handler (so technically it was multithreaded) and the crash really depended on one function being interrupted at just the right location by the right signal. That's why it took a month of staring at the code before I found the issue. Individually, every function worked exactly as designed. Together, they all worked except in one odd-ball edge case that varied from system to system (on my development system, the program could run for days before a crash; on the production system it would crash after a few hours). The fix was straightforward once the bug was found, but finding it was a pain.
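The general shape of the bug, boiled down to a toy example (this is not the real code, just the pattern: a handler that indirectly calls something non-async-signal-safe, so whether it crashes depends entirely on where the signal lands):

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Looks harmless, but malloc(), snprintf(), printf() and free() are not
       async-signal-safe, so this must never run from a signal handler. */
    static void log_event(const char *msg)
    {
        char *copy = (char *)malloc(64);
        snprintf(copy, 64, "event: %s", msg);
        printf("%s\n", copy);
        free(copy);
    }

    static void on_alarm(int)
    {
        log_event("timer fired");   /* the indirect non-reentrant call */
        alarm(1);                   /* re-arm; alarm() itself is safe here */
    }

    int main()
    {
        signal(SIGALRM, on_alarm);
        alarm(1);
        for (;;) {
            /* If SIGALRM lands while the allocator's internal state is
               half-updated, the handler re-enters malloc() and corrupts the
               heap; the crash shows up later, somewhere unrelated. */
            void *p = malloc(128);
            free(p);
        }
    }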

So please, I would love to know how unit tests would have helped find that bug. Yes, it's possible to write code to hopefully trigger that situation (run the process, and run another process that continuously sends it the signals it handles), but how long do I run the test for? And how do I know it passed?


No, unit testing won't tell you whether your constructs are safely composable, so it will pretty much never find a threading bug, a concurrency bug, a reentrancy bug, etc.

I only know three ways to detect this sort of bug, and they all suck: 1) get smart people to stare at all of your code; 2) brute force as many combinations as possible; 3) move the problem into the type system of your language so you can do static analysis of the code.
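For option 2, about the best you can do is a harness along these lines (a rough sketch only; the binary name, the signal, and the one-minute budget are placeholders, and "it didn't crash within the budget" is the only pass criterion you get):

    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    int main()
    {
        pid_t child = fork();
        if (child == 0) {
            /* Placeholder name for whatever binary is under test. */
            execl("./program_under_test", "program_under_test", (char *)NULL);
            _exit(127);                          /* exec failed */
        }

        const time_t deadline = time(NULL) + 60; /* arbitrary time budget */
        int status = 0;
        while (time(NULL) < deadline) {
            kill(child, SIGUSR1);                /* whatever signal it handles */
            usleep(500);                         /* jitter the timing a little */
            if (waitpid(child, &status, WNOHANG) == child) {
                if (WIFSIGNALED(status))
                    fprintf(stderr, "FAIL: killed by signal %d\n",
                            WTERMSIG(status));
                return 1;                        /* it died: bug reproduced */
            }
        }
        kill(child, SIGTERM);                    /* survived: call it a pass */
        waitpid(child, &status, 0);
        return 0;
    }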


I don't think even the most hardcore TDD zealots would come anywhere close to claiming that testing is a silver bullet. There will always be cases where you didn't think of a particular edge case, or where some environment-based issue makes covering something in a test impossible. That doesn't negate its benefits in preventing the 99% of bugs that aren't an insanely rare edge case.


I don't think you should expect every bug to be caught by unit testing. But where it helps with a problem like that is eliminating a lot of other possible causes of bugs. Debugging something like this is often a needle-in-a-haystack problem, but it's nice if you can rule out most of the hay from the beginning.

In this case, once I discovered the cause of the bug I would have written a unit test that exposed it, probably a very focused one. Then I would have gone hunting for other missed opportunities to test for this, and I imagine my team would have come up with some sort of general rule for testing signal handlers.


Heh, I wonder if we have the same SS7 stack. Your description sounds disturbingly similar to our experience. Does your stack also wait several minutes before reporting that starting it up went OK? (To the logfile, of course. It is too good to actually report that to the console, or to have service scripts that can be trusted.) That wait is very popular with our testers.

Oh well, at least in our case our signals originate in IP, and we only have to check against an HLR, for which we do have a semi-decent mockup.


From what I understand, there are only two commercially available SS7 stacks, and the one we use is the better of the two (which I find a frightening thought). So there's a 50/50 chance. I don't know enough about the stack to start (or restart) it, so I can't say for sure whether that's how our stack works.

And in our case, our signals originate in SS7 ...


I don't agree. I obviously don't know the specifics of your project, and I certainly don't always unit test code either (even though I know better - though I do unit test actually important code, just not my own experimental or prototype code), but your comment sounds to me like you're trying to rationalize not testing your code (or you're frustrated by the amount of third-party code that's making it hard to test). Maybe it would be too expensive to test...

> receive a message via SS7 and convert it to an IP-based message. To test the portion that talks to the telephony network requires a telephony network

I worked on an SMS anti spam/fraud system for a few years and we unit tested and simulated everything.

For unit testing we mocked all the network/hardware stuff so that each part of our code could be tested in isolation. I firmly believe that there is no code which cannot be unit tested[1], though obviously some code is easier to unit test than other code.
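To be concrete, the isolation boiled down to something like this (a toy sketch, not our real code; the interface and names below are made up for illustration):

    #include <cassert>
    #include <string>
    #include <vector>

    // Hypothetical seam: the business logic only ever talks to this
    // interface, never to the real SS7/SIGTRAN stack.
    struct MessageSink {
        virtual ~MessageSink() = default;
        virtual void send(const std::string &pdu) = 0;
    };

    // Test double: records what would have gone out on the wire.
    class FakeSink : public MessageSink {
    public:
        void send(const std::string &pdu) override { sent.push_back(pdu); }
        std::vector<std::string> sent;
    };

    // Stand-in for the unit under test: translate and forward a message.
    void handle_message(const std::string &incoming, MessageSink &out)
    {
        out.send("ip:" + incoming);
    }

    int main()
    {
        FakeSink fake;
        handle_message("ss7:hello", fake);
        assert(fake.sent.size() == 1);
        assert(fake.sent[0] == "ip:ss7:hello");
        return 0;
    }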

For more end-to-end simulation, we wrote a test suite that would simulate the SS7 network and allow us to test our system under all kinds of message flows - testing not just that the system worked for each variant of the message flows, but also stress testing and performance testing our system. It worked with raw SS7 messages received from a number of commercial gateways and also with SIGTRAN messages (which are almost the same thing anyway). This worked pretty well for us.

> just simple translations

That should be the easiest type of code to test! Pure functional translation is ideal for testing: if I put in X, I expect to get Y back (for a bunch of X/Y pairs).
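Something as boring as a table of input/expected-output pairs goes a long way (translate() and the pairs below are placeholders, not your actual message format):

    #include <cassert>
    #include <string>
    #include <utility>
    #include <vector>

    // Placeholder for the real mapping from an SS7-side field to its
    // IP-side representation.
    std::string translate(const std::string &in)
    {
        return "ip:" + in;
    }

    int main()
    {
        // X/Y pairs: input on the left, expected output on the right.
        const std::vector<std::pair<std::string, std::string>> cases = {
            {"msisdn=123", "ip:msisdn=123"},
            {"imsi=456",   "ip:imsi=456"},
        };
        for (const auto &c : cases)
            assert(translate(c.first) == c.second);
        return 0;
    }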

You mention multiple machines and multithreading - obviously this makes testing pretty damn hard (though unit testing should generally not be too affected), but possibly also more critical since multiprocessing is hard anyway. Anyway, like I said, I don't know your system.

> most of the "units" being tested require almost as much set up as the entire "program"

It sounds to me that the design isn't modular enough (by design or by evolution), or the units are much much too large. Each unit should be fairly simple and reasonably self-contained.

[1] Nowadays I do some embedded systems stuff, which at first I considered really hard to unit test, but changed my mind after reading this book: http://pragprog.com/book/jgade/test-driven-development-for-e... If you can abstract away microcontrollers and other hardware for the purpose of testing in an embedded scenario, you can abstract pretty much anything away.



