Tests that use random input data are much more difficult to write correctly. Your test needs to know the expected result not just for one case, but for every possible case. That vastly increases the number of bugs in your test code, leading to a seemingly endless stream of false positives.
The worst part is that the feedback is late. The test will randomly fail long after it was written. There's a lot of needless overhead in relearning the context of the failing code just so you can fix a test that is not general enough for the data it was provided.
There are ways to effectively use randomly-generated test data, but it's harder than you'd think to do it right.
Tests with random inputs can be much easier to write. For example, here's a test for a single hard-coded example:
public boolean canSetEmail() {
    // Fixed, hand-picked test data
    User u = new User(
        "John",
        "Smith",
        LocalDate.of(2000, Month.JANUARY, 1),
        new Email("john@example.com"),
        Password.hash("password123", new Password.Salt("abc"))
    );
    Email newEmail = new Email("smith@example.com");
    u.setEmail(newEmail);
    return u.getEmail().equals(newEmail);
}
Phew! Here's a randomised alternative:
public boolean canSetEmail(User u, Email newEmail) {
    u.setEmail(newEmail);
    return u.getEmail().equals(newEmail);
}
Not only is the test logic simpler and clearer, it's much more general. As an added bonus, when we write the data generators for User, Email, etc. we can include a whole load of nasty edge cases and they'll be used by all of our tests. I've not used Java for a while, but in ScalaCheck I'd do something like this (a sketch from memory; the exact combinators may be off):
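import org.scalacheck.{Arbitrary, Gen}
import org.scalacheck.Arbitrary.arbitrary

// Sketch only: Email is the domain class from above; 'user',
// 'domain' and 'tlds' are the components discussed below.
val nonEmpty: Gen[String] = arbitrary[String].suchThat(_.nonEmpty)

implicit val arbEmail: Arbitrary[Email] = Arbitrary(for {
  user   <- nonEmpty
  domain <- nonEmpty
  tlds   <- Gen.listOf(nonEmpty)  // an empty list gives e.g. 'root@localhost'
} yield new Email(user + "@" + (domain +: tlds).mkString(".")))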
The other advantage is that automated shrinking does a pretty good job of homing in on bugs. For example, if our test breaks when there's no top-level domain (e.g. 'root@localhost') then (a) that will be found pretty quickly, since 'tlds' will begin with a high probability of being empty, and (b) the other components will be shrunk as far as possible, e.g. 'user' and 'domain' will shrink down to a single null byte (the "smallest" String which satisfies our 'nonEmpty' test); hence we'll be told that this test fails for \0@\0 (the simplest counterexample). We'll also be given the random seeds used by the generators.
That generator function illustrates exactly the problem I'm talking about. The maximum length of a String in Java is 2^31-1 chars (UTF-16 code units). If 'user' is an arbitrary string, it can be up to 2^31-1 chars long; if 'domain' is also an arbitrary string, it can be too. Concatenate them and you can exceed the maximum String length, causing a failure in the test code itself rather than in the code under test.
There are almost always constraints on the test data, but they're hard to express properly, so they go unspecified. Then one day, the generator violates those unstated constraints, causing the test to fail.
> one day, the generator violates those unstated constraints, causing the test to fail
Good, that's exactly the sort of assumption I'd like to have exposed. As a bonus, we only need to fix this in the generators, and all the tests will benefit. I've hit exactly this sort of issue with overflow before, where I made the mistaken assumption that 'n.abs' would be non-negative.
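Concretely, in Scala:

// Int.MinValue has no positive counterpart in 32-bit two's
// complement, so .abs overflows and the result is still negative.
assert(Int.MinValue.abs == Int.MinValue)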
In this case ScalaCheck will actually start off generating small/empty strings, and try longer and longer strings up to length 100 (the default maximum size).
This is because 'arbitrary[String]' uses 'Gen.stringOf', roughly like this (paraphrased; the exact definition varies by version):
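implicit lazy val arbString: Arbitrary[String] =
  Arbitrary(Gen.stringOf(arbitrary[Char]))

// and stringOf itself is (paraphrased) a sized list of Chars joined
// together, so the string's length is bounded by the current "size":
def stringOf(gc: Gen[Char]): Gen[String] =
  Gen.listOf(gc).map(_.mkString)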
The "size" of a generator starts at 'minSize' and grows to 'maxSize' as tests are performed (this ensures we check "small" values first, although generators are free to ignore the size if they like):
> Tests that use random input data are much more difficult to write correctly.
Interestingly, I personally find them easier to write. I actually find classic unit tests hard to write, probably because I am painfully aware of the lack of coverage.
With property-based testing, on the other hand, I start from the assumption I have about what the code should do. The test then verifies this assumption on random inputs.
Writing a unit test around a given input seems backwards to me, like a downgrade: I always start from the assumption I have, and based on it I choose the input. Why not encode the assumption itself, when you already have it in your mind anyway?
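For example, instead of picking one input for a list-reversal function, you encode the assumption itself as a property (a hypothetical ScalaCheck example):

import org.scalacheck.Prop.forAll

// The assumption, checked directly on random inputs:
val reverseTwice = forAll { (xs: List[Int]) =>
  xs.reverse.reverse == xs
}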
Your implementation is necessarily complex. That's why it may have bugs, and why it needs tests.
You have many more tests than implementations; in my experience, roughly 20x more. If your tests had bugs at the same rate as your implementation, about 20 of every 21 bugs would be in test code: you'd spend ~95% of your time fixing test bugs and only ~5% fixing implementation bugs. That's why tests should be simple.
If you're going to be spending that much time on validating assumptions, I think you're better off trying to express them formally.
I think I disagree, but it really depends on what you mean by "test" or "test case". I take a "test case" to mean: for a given input, expect a certain output. A "test" verifies a certain assumption, such as: for a certain class of inputs you get a certain class of outputs.
I believe that you always test two implementations against each other. For example, if I have a test case for a function sin(x), then I'm comparing against the calculator implementation from which I got the expected result. So if the tests are to be comprehensive (and automatically executed), they have to be another implementation of the same program; you can't avoid that, and you can't avoid (potentially) having bugs in it.
Now, the advantage is that the test implementation can be simpler (in certain cases), or it can be less complete, which means fewer bugs, but also (in the latter case) less comprehensive testing.
In any case, you're validating the assumptions. The assumptions come from how the test implementation works (sometimes it lives only in your head). And expressing them formally is, of course, the whole point.
For example, if you're given an implementation of sin(x) to test against, you can formally express the assumption that your function should give a similar result.
By formalizing this assumption, you can then let the computer create the individual test cases; that is a superior technique to writing test cases by hand.
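A hypothetical sketch, where 'mySin' is the implementation under test and math.sin stands in for the reference implementation:

import org.scalacheck.{Gen, Prop}

// Compare the implementation under test against the reference,
// within a tolerance, over a bounded range of inputs.
def sinAgrees(mySin: Double => Double): Prop =
  Prop.forAll(Gen.choose(-1e6, 1e6)) { x =>
    (mySin(x) - math.sin(x)).abs < 1e-9
  }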