
Yeah, I firmly support testing, but if the code you are working on isn't explicitly written to make writing tests easier, then odds are (in my experience) that writing the tests will take longer than writing the code.


I have been known to write shitty code, but am always looking to make the next set of code less shitty, or even redo something I'm working on if time allows. I'm getting better, but have a long way to go. I have a feeling that my code would be bad to write tests against. What would making code easier to test against entail? Small bits of code, meant to be included, that do one specific thing? Wrapping that included code into functions and/or objects and then creating a test suite to hit all the methods, etc.?


I'm not a real proponent of TDD, but... try writing some tests first when you develop a new feature. Or at least write them sooner - before your implementation is "done"; write them when you think you've figured out the design. See, tests are a new use of your API/interfaces - if you find it hard to test stuff that's important, maybe you didn't model the problem right?

Rules of thumb:

- Don't test implementation, test business workflows ("functional/component tests" are the most important; unit tests are sometimes/often nice, but don't overdo it! If a simple refactoring breaks lots of tests, you're doing it wrong - testing implementation details, not business logic). E2e tests are good and necessary, but they're often slow, and when they fail they don't always isolate the problem very well.

- Seek a functional coding style; once you get used to it, you'll find it's easier to test and easier to reason about (no state means you just test the logic, and it's easy to unit-test too)

- Largely ignore code coverage (use it as a guideline to see whether there are important parts of your app that you ignored/ whether you forgot to add tests for some business workflows/ corner cases).

- Avoid test hacks like calling private methods via reflection or whatnot. Remember, tests exercise your APIs - you either have the APIs wrong, or you're trying to test irrelevant implementation details.

- Look for invariants, and test those. Things that should always be true. Often times, there are multiple acceptable results - avoid exact-match tests when that happens (e.g. if you make a speech-to-text system, don't test that audio clip X produces exact output Y; often times, a genuine improvement in the algorithm might break some of your tests).
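
To make the last point concrete, here's a minimal sketch (Python, pytest style) of what an invariant/tolerance test for speech-to-text might look like - transcribe() is a hypothetical stand-in for the real recognizer, and the inline word_error_rate() is a simplified metric for illustration only:

    def transcribe(audio_path: str) -> str:
        # Hypothetical recognizer; imagine this calling the real model.
        return "texting is good"

    def word_error_rate(expected: str, actual: str) -> float:
        # Simplified WER: word-level edit distance divided by reference length.
        ref, hyp = expected.split(), actual.split()
        dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dist[i][0] = i
        for j in range(len(hyp) + 1):
            dist[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,
                                 dist[i][j - 1] + 1,
                                 dist[i - 1][j - 1] + cost)
        return dist[len(ref)][len(hyp)] / max(len(ref), 1)

    def test_clip_is_recognized_well_enough():
        text = transcribe("clips/testing_is_good.wav")
        # Tolerance instead of an exact transcript, so a genuinely better
        # model doesn't break the build.
        assert word_error_rate("testing is good", text) < 0.5

A real suite would probably use a library for the metric and run it over a small corpus, but the shape is the same: assert a property, not an exact string.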

TBH I think correct testing is much harder than the industry gives it credit for. Maybe that's why it's so rarely encountered in practice :)


I'll take umbrage with your last point:

"Look for invariants, and test those. Things that should always be true. Often times, there are multiple acceptable results - avoid exact-match tests when that happens (e.g. if you make a speech-to-text system, don't test that audio clip X produces exact output Y; often times, a genuine improvement in the algorithm might break some of your tests)."

No, you should test for exactly the output you have coded to generate. Otherwise, you do not know when you have a behaviour regression. You would expect to have to update the speech-to-text tests when you modify the speech-to-text algorithm. But if you're modifying another algorithm and you start seeing the speech-to-text tests break, you're probably introducing a bug!

A failed test means nothing other than the fact that you have changed behaviour -- and should therefore trigger on any behavioural change. It's your opportunity to vet the expectations of your changes against the actual behaviour of the changed system.
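
For what it's worth, the style being argued for here looks roughly like this minimal sketch (Python, pytest style; render_greeting() is a made-up function, purely for illustration):

    def render_greeting(name: str) -> str:
        # Stand-in implementation, for illustration only.
        return f"<p>Hello, {name}!</p>"

    def test_greeting_output_is_pinned_exactly():
        # Exact match: any behavioural change, intended or not, fails here
        # and has to be vetted before the reference string is updated.
        assert render_greeting("Ada") == "<p>Hello, Ada!</p>"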


I respectfully disagree. A test should fail, ideally, only when behavior change is undesirable (i.e. contracts are broken). Optimizations, new features etc. should not break existing tests, unless old functionality was affected. And then there's the whole thing about separating functional from performance concerns - even degraded performance shouldn't fail the functional tests.

In fact, the example I gave was real-life - a friend from Google changed their speech recognition tests to avoid exact matches and it was a significant improvement in the life & productivity of the development team.

[edit] There's also another damaging aspect of exact-match tests: they often test much more than what's intended. Take, for instance, file conversion software (say from PDF to HTML). You add a feature to support SVG and test it with various SVG shapes - it's easy & tempting to just run the software on an input PDF, check that the output HTML looks right (especially in the relevant SVG parts), and then add an exact-match test. Job done, yay! Except that if you do this a lot, it will slow you down like hell. Because when a test fails in the future, it's very hard to tell why (was the SVG conversion broken? or is it some unrelated thing, like a different but valid way to produce the output HTML?). Do this a lot and you won't be able to trust your tests anymore - any change and 400 of them fail, ain't nobody got time to check in depth what happened, "it's probably just a harmless change, let me take a cursory look and then I'll just update the reference to be the new output".


You're building a bit of a straw man. If you have 400 tests that fail with a single behavioural change, why are you testing the same thing 400 times? And you don't need an in-depth investigation unless you didn't expect the test to break. And if you did expect the test to break, then you ensure that it broke in the correct place. If a cursory glance is all you need in order to confirm that, then that's all you need. Tests are there to tell you exactly what actually changed in behaviour. The only time this should be a surprise is if you don't have a functional mental model of your code, in which case it's doubly important that you be made aware of what your changes are actually doing.

In your Google example, would their tests fail if their algorithm regressed in behaviour? If it doesn't fail on minor improvements, I don't see how they would fail on minor regressions either.


400 is an arbitrary number, but it's what sometimes (often?) happens with exact-match tests; take the second example, with the PDF-to-HTML converter: an exact-match test would test too much, and thus your SVG tests will fail when nothing SVG-specific has changed (maybe the way you render the HTML header changed). Or maybe you changed the order your HTML renderer uses to render child nodes, and it's still a valid order in 99% of your cases, but it breaks 100% of your tests. How do you identify the 1% that are broken? It's very hard if your tests just do exact textual comparison, instead of verifying isolated, relevant properties of interest.

In my Google example, the problem is that functional tests were testing something that should've been a performance aspect. The way you identify minor regressions is by having a suite of performance/ accuracy tests, where you track that accuracy is trending upwards across various classes of input. Those are not functional tests - any individual sample may fail and it's not a big deal if it does. Sometimes a minor regression is actually acceptable (e.g. if the runtime performance/ resource consumption improved a lot).


> It's very hard if your tests just do exact textual comparison, instead of verifying isolated, relevant properties of interest.

I think you have an assumption here, which you never actually specified: that exact-match testing means testing for an exact match on the entire payload. That's a strawman, and yes, with that approach you will have issues exactly like the ones you describe.

If your test is only meant to cover the SVG translation, then you should be isolating the SVG-specific portion of the payload. But then execute an exact match on that isolated translation. Now that test only breaks in two ways: It fails to isolate the SVG, or the SVG translation behaviour changes.
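
A rough sketch of that isolate-then-match idea (Python; convert_pdf_to_html() and the fixture path are hypothetical stand-ins):

    import xml.etree.ElementTree as ET

    SVG_NS = "http://www.w3.org/2000/svg"

    def convert_pdf_to_html(pdf_path: str) -> str:
        # Stand-in for the real converter, for illustration only.
        return ("<html><body><h1>Report</h1>"
                f'<svg xmlns="{SVG_NS}"><rect width="10" height="20"/></svg>'
                "</body></html>")

    def extract_svg(html: str) -> ET.Element:
        # Isolate only the part of the payload this test is about.
        root = ET.fromstring(html)
        svg = root.find(f".//{{{SVG_NS}}}svg")
        assert svg is not None, "no SVG element in output"
        return svg

    def test_rect_conversion_is_pinned():
        svg = extract_svg(convert_pdf_to_html("fixtures/rect.pdf"))
        rects = svg.findall(f"{{{SVG_NS}}}rect")
        # Exact match, but only on the isolated fragment; the rest of the
        # HTML can change freely without breaking this test.
        assert [r.attrib for r in rects] == [{"width": "10", "height": "20"}]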

> In my Google example, the problem is that functional tests were testing something that should've been a performance aspect. The way you identify minor regressions is by having a suite of performance/ accuracy tests, where you track that accuracy is trending upwards across various classes of input. Those are not functional tests - any individual sample may fail and it's not a big deal if it does. Sometimes a minor regression is actually acceptable (e.g. if the runtime performance/ resource consumption improved a lot).

... "Accuracy", aka the output of your functionality is a non-functional test? What?

And I never said regressions aren't acceptable. I said that you should know via your test suite that the regression happened! You are phrasing it as a trade-off, but also apparently advocating an approach where you don't even know about the regression. It's not a trade-off if you are just straight-up unaware that there are downsides.


> That's a strawman

It wasn't intended to be; yes, that's what I meant: don't check the full output, check the relevant sub-section. Plus, don't check for order in the output when order doesn't matter, accept slight variation when it is acceptable (e.g. values resulting from floating-point computations), etc. Don't just blindly compare against a textual reference, unless you actually expect that exact textual reference and nothing else will do.

> "Accuracy", aka the output of your functionality is a non-functional test? What?

Don't act so surprised. Plenty of products have non-100% accuracy; speech recognition is one of them. If the output of your product is not expected to have perfect accuracy, I claim it's not reasonable to test that full output and expect perfect accuracy (as functional tests do). Either test something else that does have perfect accuracy, or make the test a "performance test", where you monitor the accuracy but don't enforce perfection.

> And I never said regressions aren't acceptable.

Maybe you didn't, but I do say they can be acceptable. I'm not advocating that you don't know about the regression at all. Take my example with speech - you made the algorithm run 10x faster, and now 3 results out of 500 are failing. You deem this to be acceptable and want to release to production. What do you do?

A. Go on with a red build?

B. "Fix" the tests so that the build becomes green, even though the sound clip that said "Testing is good" is now producing the textual output "Texting is good"?

I claim both A & B are wrong approaches. "Accuracy" is a performance aspect of your product, and as such, shouldn't be tested as part of the functional tests. Doesn't mean you don't test for accuracy - just like it shouldn't mean that you don't test for other performance regressions. Especially so if they are critical aspects of your product/ part of your marketing strategy!


OK, I'm caught up with you now. Yes, I agree with this approach in such a scenario. I would just caution against throwing out stuff like that as a casual note on testing without any context, like you did. Examples like this should be limited to non-functional testing, aka metrics, which was not called out at all originally. And it is a cool idea to run a bunch of canned data through a system to collect metrics as part of an automated test suite!


There is a single cause of all unit-testing difficulty that I have ever seen: isolation. Functionality must be isolated to be testable.

Examples:

* You don't isolate side-effects, such as file creation. Now you must execute file creation every time you execute that functionality. Even though (I assume) you are not interested in testing the file system, you are now implicitly testing it. (See the sketch after this list.)

* You don't isolate external dependencies, such as requiring a database connection. Now you can't test without standing up a temporary database.

* You don't isolate logic, such as doing several complicated validations in sequence within a single functionality. Now in order to test any of the validations, you must also arrange to pass all the previous validations.
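
A minimal sketch of the first case - pushing the side effect to a thin edge so the logic can be tested without touching the file system (names are made up for illustration):

    from pathlib import Path

    def build_report(records: list[dict]) -> str:
        # Pure logic: trivially testable in isolation.
        total = sum(r["amount"] for r in records)
        return f"{len(records)} records, total {total}"

    def write_report(records: list[dict], path: Path) -> None:
        # Thin shell around the side effect; rarely worth a unit test of its own.
        path.write_text(build_report(records))

    def test_build_report():
        # No temp files, no cleanup - the file system never enters the picture.
        assert build_report([{"amount": 2}, {"amount": 3}]) == "2 records, total 5"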


The issue I run into most often is that a system or set of functions I want to test relies on or resides in a complicated object that requires proper initialization. This means that to run tests I need to manually perform all of this initialization, including spoofing complicated internal data structures and whatnot.

The fix? Better compartmentalization in a lot of cases. Or writing classes with a viable form of default initialization in mind.
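
One way that can look in practice - a sketch with made-up names, where the heavyweight dependency is injectable and has a sensible default:

    from dataclasses import dataclass, field

    @dataclass
    class PricingRules:
        tax_rate: float = 0.0  # in real code this might come from a database

    @dataclass
    class Checkout:
        rules: PricingRules = field(default_factory=PricingRules)

        def total(self, prices: list[float]) -> float:
            return sum(prices) * (1 + self.rules.tax_rate)

    def test_total_with_default_initialization():
        # No database, no config loader, no spoofed internals.
        assert Checkout().total([10.0, 5.0]) == 15.0

    def test_total_with_explicit_rules():
        assert Checkout(PricingRules(tax_rate=0.5)).total([8.0]) == 12.0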


Yes I think this is the problem. The attitudes I mentioned come from people who tried testing and got fed up because of the many legitimate issues they had.



