Tests that use random input data are much more difficult to write correctly. Your test needs to know the expected result not just for one case, but for every possible case. That vastly increases the number of bugs in your test code, leading to a seemingly endless stream of false positives.
The worst part is that the feedback is late. The test will randomly fail long after it was written. There's a lot of needless overhead in relearning the context of the failing code just so you can fix a test that is not general enough for the data it was provided.
There are ways to effectively use randomly-generated test data, but it's harder than you'd think to do it right.
Tests with random inputs can be much easier to write. For example, here's a test for a single hard-coded example:
public boolean canSetEmail() {
    // Fixed, hand-picked test data
    User u = new User(
        "John",
        "Smith",
        LocalDate.of(2000, Month.JANUARY, 1),
        new Email("john@example.com"),
        Password.hash("password123", new Password.Salt("abc"))
    );
    Email newEmail = new Email("smith@example.com");
    u.setEmail(newEmail);
    return u.getEmail().equals(newEmail);
}
Phew! Here's a randomised alternative:
public boolean canSetEmail(User u, Email newEmail) {
    u.setEmail(newEmail);
    return u.getEmail().equals(newEmail);
}
Not only is the test logic simpler and clearer, it's much more general. As an added bonus, when we write the data generators for User, Email, etc. we can include a whole load of nasty edge cases and they'll be used by all of our tests. I've not used Java for a while, but in ScalaCheck I'd do something like this (a sketch from memory; the exact combinators may be off):
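import org.scalacheck.{Arbitrary, Gen}
import org.scalacheck.Arbitrary.arbitrary

// Sketch only: Email is the domain class from above; 'user',
// 'domain' and 'tlds' are the components discussed below.
val nonEmpty: Gen[String] = arbitrary[String].suchThat(_.nonEmpty)

implicit val arbEmail: Arbitrary[Email] = Arbitrary(for {
  user   <- nonEmpty
  domain <- nonEmpty
  tlds   <- Gen.listOf(nonEmpty)  // an empty list gives e.g. 'root@localhost'
} yield new Email(user + "@" + (domain +: tlds).mkString(".")))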
The other advantage is that automated shrinking does a pretty good job of homing in on bugs. For example, if our test breaks when there's no top-level domain (e.g. 'root@localhost') then (a) that will be found pretty quickly, since 'tlds' will begin with a high probability of being empty, and (b) the other components will be shrunk as far as possible, e.g. 'user' and 'domain' will shrink down to a single null byte (the "smallest" String which satisfies our 'nonEmpty' test); hence we'll be told that this test fails for \0@\0 (the simplest counterexample). We'll also be given the random seeds used by the generators.
That generator function illustrates exactly the problem I'm talking about. The maximum length of a String in Java is 2^31-1 chars (UTF-16 code units). If 'user' is an arbitrary string, it can be up to 2^31-1 chars long; if 'domain' is also an arbitrary string, it can be too. Concatenate them and you can exceed the maximum String length, causing a failure in the test code itself rather than in the code under test.
There are almost always constraints on the test data, but they're hard to express properly, so they go unspecified. Then one day, the generator violates those unstated constraints, causing the test to fail.
> one day, the generator violates those unstated constraints, causing the test to fail
Good, that's exactly the sort of assumption I'd like to have exposed. As a bonus, we only need to fix this in the generators, and all the tests will benefit. I've hit exactly this sort of issue with overflow before, where I made the mistaken assumption that 'n.abs' would be non-negative.
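Concretely, in Scala:

// Int.MinValue has no positive counterpart in 32-bit two's
// complement, so .abs overflows and the result is still negative.
assert(Int.MinValue.abs == Int.MinValue)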
In this case ScalaCheck will actually start off generating small/empty strings, and try longer and longer strings up to length 100 (the default maximum size).
This is because 'arbitrary[String]' uses 'Gen.stringOf', roughly like this (paraphrased; the exact definition varies by version):
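implicit lazy val arbString: Arbitrary[String] =
  Arbitrary(Gen.stringOf(arbitrary[Char]))

// and stringOf itself is (paraphrased) a sized list of Chars joined
// together, so the string's length is bounded by the current "size":
def stringOf(gc: Gen[Char]): Gen[String] =
  Gen.listOf(gc).map(_.mkString)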
The "size" of a generator starts at 'minSize' and grows to 'maxSize' as tests are performed (this ensures we check "small" values first, although generators are free to ignore the size if they like):
> Tests that use random input data are much more difficult to write correctly.
Interestingly, I personally find them easier to write. I actually find classic unit tests hard to write, probably because I am painfully aware of the lack of coverage.
With property-based testing, on the other hand, I start from the assumption I have about what the code should do. The test then verifies this assumption on random inputs.
Writing a unit test around a given input seems backwards to me, like a downgrade: I always start from the assumption I have, and based on it I choose the input. Why not encode the assumption itself, when you already have it in your mind anyway?
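For example, instead of picking one input for a list-reversal function, you encode the assumption itself as a property (a hypothetical ScalaCheck example):

import org.scalacheck.Prop.forAll

// The assumption, checked directly on random inputs:
val reverseTwice = forAll { (xs: List[Int]) =>
  xs.reverse.reverse == xs
}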
Your implementation is necessarily complex. That's why it may have bugs, and why it needs tests.
You have many more tests than implementations; in my experience, roughly 20x more. If your tests had bugs at the same rate as your implementation, about 20 of every 21 bugs would be in test code: you'd spend ~95% of your time fixing test bugs and only ~5% fixing implementation bugs. That's why tests should be simple.
If you're going to be spending that much time on validating assumptions, I think you're better off trying to express them formally.
I think I disagree, but it really depends on what you mean by "test" or "test case". I take a "test case" to mean: for a given input, expect a certain output. A "test" verifies a certain assumption, such as: for a certain class of inputs you get a certain class of outputs.
I believe that you always test two implementations against each other. For example, if I have a test case for a function sin(x), then I'm comparing against the calculator implementation from which I got the expected result. So if the tests are to be comprehensive (and automatically executed), they have to be another implementation of the same program; you can't avoid that, and you can't avoid (potentially) having bugs in it.
Now, the advantage is that the test implementation can be simpler (in certain cases), or it can be less complete, which means fewer bugs, but also (in the latter case) less comprehensive testing.
In any case, you're validating the assumptions. The assumptions come from how the test implementation works (sometimes it lives only in your head). And expressing them formally is, of course, the whole point.
For example, if you're given an implementation of sin(x) to test against, you can formally express the assumption that your function should give a similar result.
By formalizing this assumption, you can then let the computer create the individual test cases; that is a superior technique to writing test cases by hand.
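A hypothetical sketch, where 'mySin' is the implementation under test and math.sin stands in for the reference implementation:

import org.scalacheck.{Gen, Prop}

// Compare the implementation under test against the reference,
// within a tolerance, over a bounded range of inputs.
def sinAgrees(mySin: Double => Double): Prop =
  Prop.forAll(Gen.choose(-1e6, 1e6)) { x =>
    (mySin(x) - math.sin(x)).abs < 1e-9
  }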