I used to work in aerospace as a hardware/software engineer, and while my team did use unit tests, that wasn't how the software was tested and qualified.
The process in aerospace is vastly slower; one illustration is that, on average, a programmer in aerospace produces just 1,000 lines of code a year. There are clear reasons why aerospace engineers produce so much less code:
- Documentation - You get a pile as tall as your desk for a few thousand lines of code. The software is designed to the point of knowing precisely what the maximum N going into a function can be and its exact runtime; every function has fixed maximum and minimum values and a runtime budget allocated before anyone writes code. The code is then checked against the documentation.
- Bench testing - We had a complete dev/test environment in which all in-development hardware and software could fly missions, with all the different teams and their avionics devices interacting over a network. You then flew countless "missions" (real cockpit parts, but swivel chair + simulator), testing all the different scenarios and whatever you had just changed.
- Code walkthroughs - The wider team of engineers working on the avionics software would print out the entire codebase, every branch included, and "walk" the whole system from beginning to end with the documentation in hand. Every line is validated by someone from a different team and every decision scrutinised.
- System test - multiple phases of it. Completely separate teams would rigorously test every new release to death: against the client's specs, against the documentation produced, and on their own independent simulator and cockpit setup. On top of that, a second team did the same thing, trying to catch anything the first team missed.
- Then the software spends years being tested in prototype vehicles before finally being signed off as ready.
End to end, it took about 10 years of full-time work to get 60k lines of code released for one avionics device. We had unit tests, but only for testing the hardware, not really for our own software beyond startup tests.
All of that rigour comes out of one principle: every time you find a problem in the code, at any point, it's not the individual's fault, it's the team's fault, and you work out the genuine cause and ensure it can't slip through again. Everything must be testable.
There are quite a lot of parallels with unit testing, and it could indeed be a better way to capture tests for aerospace - potentially a slightly less labour-intensive way to run all the tests every time a new release is produced - but it wouldn't look anything like what your typical commercial company does, because for them it isn't worth reducing the 10 bugs per 1,000 lines of code they currently have down to more or less zero. Unit testing in aerospace would be about efficient repetition of known tests, much as it is in the commercial space, not about driving any form of design process, but I could see test definitions being produced from the function lists (something like the sketch below). We did a lot of specification testing and restricted the language to allow that and to make the code more straightforward.
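To make the "test definitions produced from the function lists" idea concrete, here is a rough, hypothetical Python sketch - the function, bounds, and cases are all invented, and real aerospace specs are far more detailed - of a documented contract (maximum N, value ranges, bounded loop) turned into a table-driven check:

```python
# Hypothetical example: a documented function contract turned into a
# table-driven check. Names, bounds, and cases are invented purely for
# illustration.

MAX_SAMPLES = 64                       # documented maximum N for this function
ALT_MIN, ALT_MAX = -500.0, 60_000.0    # documented value range (feet)

def smooth_altitude(samples):
    """Average a bounded window of altimeter samples."""
    assert 0 < len(samples) <= MAX_SAMPLES, "spec: 1..MAX_SAMPLES inputs"
    total = 0.0
    for s in samples:                  # loop bounded by MAX_SAMPLES
        assert ALT_MIN <= s <= ALT_MAX, "spec: sample out of range"
        total += s
    result = total / len(samples)
    assert ALT_MIN <= result <= ALT_MAX, "spec: result out of range"
    return result

# "Test definitions produced from the function list": each row is
# (inputs, expected), derived straight from the specification document.
SPEC_CASES = [
    ([0.0], 0.0),
    ([100.0, 200.0], 150.0),
    ([ALT_MAX] * MAX_SAMPLES, ALT_MAX),
]

def test_smooth_altitude_against_spec():
    for inputs, expected in SPEC_CASES:
        assert abs(smooth_altitude(inputs) - expected) < 1e-9
```

The point is only that once the spec pins down every bound up front, generating the repeatable checks from it is mostly mechanical.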
If I didn't know anything about software engineering, I'd assume it worked something like what you just described.
Compared to what it is in reality, it's a little disappointing, and it's not surprising that a lot of people question the title "engineer" for the majority of developers. I've written semi-critical parts of large businesses and didn't do half of what you described.
I know the trade-off between getting things done and doing it right, and we wouldn't have our current abundance (in tech) if we always chose the latter, but the security breaches, the lack of care for performance, and the errors, crashes and deaths (more to come as we depend on self-driving cars, IoT, etc.) make it questionable sometimes.
They are very conservative with writing software in aerospace. One of my first jobs out of school was working as a controls engineer for satellites, where I tested the controls and data handling subsystems. For a new satellite, my department determined either we needed to put a single minus sign in our standard code or the structures group needed to design some exotic assembly for some antennas and sensors. They sent me, the new guy, to the systems engineering meeting to discuss this - because it was a given they would NOT try to update the code when there was another solution, however complicated it was.
The only problem with producing 1000 LOC per year is that your brain will rot from being so narrowly focused on this one thing.
It could be a tolerable job with a programming hobby on the side (FOSS projects or whatever) in which you produce another 15,000 lines to get your "coding fix".
The process described is not narrowly focused. It's full system design, implementation, and validation. Engineering is more than just coding and can be much more satisfying.
I'm sorry, but that's such a narrow view of things. Different people are attracted to different kinds of jobs. What they described is essentially my dream job. I hate the haphazard way modern software is written - it's not engineering at all.
> All of that is the rigour coming out of one principle, every time you find a problem in the code at any point its not the individuals fault its the teams fault and you work out the genuine cause and ensure it can't slip through again.
This. In every organization it is the motivation of its parts that drives it towards the goal (or not). It is easy to misalign common goal and individual goals through bad management. In fact, in my experience this is one of the most common mistakes.
Unit tests are not the one and only metric for software quality. Believe it or not, before the term "unit test" was coined, we managed to build reliable software too. A lot of the internet was actually built that way.
But your assumption is very interesting, because it reflects a very common misconception about quality (whatever that means: reliability, maintainability, resilience, etc.):
* "Unit tests imply reliable software". Well, it does not, you'd need reliable unit tests for that, and from what I've seen in the industry, not every unit test out in the wild increases reliability.
* "Without unit tests, you can't have quality software". That is just not true either. First, not all tests are unit tests, end-to-end (integration/functional/whatever) test are extremely useful and raises the quality too. Even without any automated test, the quality of documentation, contract programming, manual testing, etc also help a lot.
Also, note that apart from testing, a lot of different practices contribute to reliability: separation of concerns, the language's level of abstraction, the overall experience and involvement of the developers, and many others.
I find that the people who are the most obsessed about unit testing generally aren't good programmers to begin with.
To me, keeping code simple and easy to reason about is what yields reliable software. If you can't wrap your head around something in the first place then tests are only going to give you the illusion of reliability.
It's as Rich Hickey put it: "What does every bug have in common? They all passed the type checker and the tests!"
> I find that the people who are the most obsessed about unit testing generally aren't good programmers to begin with.
I don't think you can generalize this, either subjectively (my shipping code quality has gone way up since I switched to TDD) or empirically (the evidence seems to indicate you can achieve up to a 90% reduction in production bugs for a 15-35% upfront extra time cost with TDD, which most would consider "worth it"). Secondly, even a very good programmer on a team still has to write tests, if only to prevent others from breaking the behavior expectations of his code down the line. Thirdly, tests validate that behavior if you yourself ever have to go back and change or refactor the code, so unless you are SO good a programmer that you have a perfect memory of the mental models of all your past code (which would put you in the extreme minority of all programmers), they will still come in handy.
Bad programmers are going to make a mess of things no matter how many tests you have. More often than not I see tests as needlessly solidifying the architecture prematurely.
More often than not I see TDD yielding horrible, terrible software architectures, and at that point you do need tests to maintain the resulting mess. TDD somehow assumes emergent architecture, and I know from experience that this is the worst way to architect something.
NPM for example is full of well-tested libraries with poor architecture and bad usability (could be why everything is reinvented every 6 months) but I wouldn't call that a success story.
I'm willing to bet the bug reduction from the extra upfront time doesn't come from doing TDD itself but from actually thinking about the requirements and how they fit the larger picture.
I really like the following story to illustrate this point:
That blog post simply shows that Ron Jeffries failed to understand the core problem of "how to solve Sudoku puzzles" (or lost interest before he succeeded, which explains why he spent so much time on the data representation, aka "bike shedding" or "yak shaving"). It is no indictment of TDD, which is not a magic elixir that lets you "organically" solve every possible problem.
> horrible, terrible software architectures
1) Need an example. 2) There is no evidence that the architecture would not have been even MORE horrible AND terrible without the TDD involvement.
> needlessly solidifying the architecture prematurely
As all code (whether "test code" or "tested code") is wont to do. Perhaps the fallacy here is considering test code as a separate, "optional" entity in your codebase instead of as an integrated proof of it working as advertised.
> willing to bet the bug reduction ... comes from thinking about the requirements
Still not an argument against TDD. If anything, that's an argument FOR it. The difference being at the end of the day, you have a working proof of the satisfaction of your requirements.
Whiteboarding and TDD are not mutually exclusive. I don't think TDD at all implies "not thinking about the problem", nor does using it turn a crappy programmer (who would tend to choose terrible designs over better ones) into a good one. I think you are still constrained by the skill of the programmer whether you use TDD or not; TDD output is just human-produced code, like all code.
NPM could have had lots of fresh out of college monkeys work on it for all I know. It certainly wouldn't surprise me. You're right that TDD won't fix being an inexperienced programmer, but that's not an anti-TDD argument, that's an anti-inexperienced-developer argument.
Lastly, you failed to address the other arguments I made in favor of TDD. (Note: there's some unfortunate conflation here between "unit testing" in general and TDD specifically.)
Let's just take GUIs, then: how can you design those using TDD? Let's say you've got a multi-step form where the user can go back and forth between steps and change data anywhere. How are you going to test ALL possible flows, which number in the millions by now?
Software reliability doesn't come from testing the obvious use-cases and calling it a day. You'd get the same result by just booting the program and having all these functions execute. Bugs are always going to be in the use-cases you didn't think to test, and there's always going to be orders of magnitude more use-cases than tests.
What is incredibly hard to get right are complex sequences of state mutations. This is where most of the bugs are going to be and this is where TDD breaks badly.
I'm not saying unit tests and TDD are bad, I'm saying they're overrated. They're one tiny tool you get to verify the reliability of your software and for a lot of application domains they are completely useless.
More often than not I see pro-TDD people not knowing about concepts like property-based testing, which actually can model complex user interactions. And once they learn about it, their architecture is already crippled: TDD drove the design of the individual components but not of the whole thing, and now there's no sweet spot to hook generative testing into or to reason about the program as a whole.
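For anyone unfamiliar, here's a rough sketch of what that looks like with Python's hypothesis library: a stateful, property-based test that generates long random sequences of "edit a field, go forward, go back" against a toy WizardForm model. The model is invented for illustration; a real test would drive the actual UI state container.

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant

# Hypothetical multi-step form model -- stands in for the real UI layer.
class WizardForm:
    def __init__(self, num_steps=4):
        self.num_steps = num_steps
        self.step = 0
        self.data = {}

    def set_field(self, name, value):
        self.data[name] = value

    def next(self):
        if self.step < self.num_steps - 1:
            self.step += 1

    def back(self):
        if self.step > 0:
            self.step -= 1

class WizardFormMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.form = WizardForm()

    @rule(email=st.emails())
    def edit_email(self, email):
        # The user can change data at any step, in any order.
        self.form.set_field("email", email)

    @rule()
    def go_forward(self):
        self.form.next()

    @rule()
    def go_back(self):
        self.form.back()

    @invariant()
    def step_stays_in_range(self):
        # Checked after every generated action in every generated sequence.
        assert 0 <= self.form.step < self.form.num_steps

# Hypothesis will generate and shrink many random action sequences.
TestWizardForm = WizardFormMachine.TestCase
```

The invariants here are trivial, but the shape of the test is the point: the tool explores sequences of state mutations you would never write by hand.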
These answers point to dead links and reek of Agile.
They mention frameworks and MVC variations; from the start they've already bloated whatever they're building and none of them address the real problem: complex sequences of state mutations.
So not only do they end up with software that's orders of magnitude more complex than it should be, they're also completely avoiding the hard problems.
You won't see a test like "enter a username, don't wait for server validation, enter an invalid email, optimistic updates, fix email, more optimistic updates, server response & state merges, advance to step 2, 3, come back to step 1, change email, advance to step 4, crash.. wat?!"
That's where the bugs are. Not "click the button and the popup opens" because that's trivial. You're also probably not testing that button while a menu is open, or a dialog already opened, or between optimistic updates and server responses, or the combinations of these, or ALL the cases you're going to forget.
You're not going back to test each new feature against every single possible state the app can be in. That's why I say TDD is overrated: the problems it fixes are trivial and it ignores the important ones.
You have state even in functional programs. My original point was about simplifying the program so it is easier to reason about in the first place, rather than write tests for a yet-to-be-overengineered-mess :)
What if you get very tight memory/cpu constraints and FP/immutability isn't available/affordable? What if your programs runs on millions of networked nodes simultaneously?
GUIs are still very stateful even with FP, for one! And if you're going to write property-based tests once you know what the properties of the system are - tests which will cover way, way more cases than you'll ever think of writing by hand - why even bother with the TDD-style tests?
What I do see is people deluding themselves into thinking their software is reliable when it definitely isn't and they're now even less careful when developing because they've got tests to protect them.
I think unit tests are a great tool to motivate code that is simple and easy to reason about. After all, a good unit test is itself simple and easy to reason about. It also forces you to think about what is the simplest unit of software for a given task, helping design the software it is testing.
Also, I've definitely had code I thought was well tested and good to go, but when I rigorously went back and wrote unit tests, I found subtle bugs I wouldn't otherwise have known to fix.
That said, yeah it's not all it takes to get things right. Dogma is bad, but unit tests are an essential part of the toolkit for ensuring code quality.
Yeah, I've found that my heavily unit-tested code is often my best code, simply because designing it with that in mind yields the simplest, smallest units. If I know I'm going to need to write a unit test later, I'm much more inclined to split a long function into more sensible small ones, and I'm not going to try to combine two different behaviors (because that's hell to test).
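As a toy illustration (hypothetical names, Python): the kind of small, single-purpose unit that tends to fall out of designing for testability, plus its equally small test.

```python
# Hypothetical illustration: instead of one long function that parses,
# validates, and saves, the validation is split into a small pure unit
# that is trivial to test on its own.

def is_valid_port(value: str) -> bool:
    """Small, single-purpose unit: is this string a usable TCP port?"""
    return value.isdigit() and 1 <= int(value) <= 65535

def test_is_valid_port():
    assert is_valid_port("8080")
    assert not is_valid_port("0")
    assert not is_valid_port("99999")
    assert not is_valid_port("http")
```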
After three years of working on some software (basically some fancy shell scripting but in Python), when the project had grown dramatically beyond its original scope through a host of hacks and bolted-on features, I remembered that the software had unit tests and ran them.
Surprisingly, all the unit tests passed, and my first thought when I saw that was "Man, I bet these unit tests are awful."
Could you go into more detail about which techniques NASA specifically used to achieve quality software? I'm honestly very interested. I feel like you didn't answer the spirit of the question "How did NASA build highly reliable software?" while pigeonholing on "without unit tests".
Their hardware was essentially fixed, well defined, and limited in scope.
They reviewed every single piece of code many, many times.
They were draconian about memory handling, loops, preprocessor directives, and recursion.
All functions had to check all return types from other functions.
No more than one level of pointer dereferencing.
All code bodies had to fit on a piece of paper.
/edit should add, they used their tools thoroughly: compilers were run on max warning levels and absolutely no warnings were accepted, ever. They also used several static analysis tools on the code (under-appreciated these days!).
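The rules above target C, but purely as a hedged, Python-flavoured analogue of a few of them (a fixed upper bound on every loop, every return value checked, dense assertions), the flavour is roughly this - all names and bounds are invented:

```python
# Hypothetical analogue of a few of the rules above, in Python rather
# than the C the originals target: a fixed upper bound on every loop,
# every callee's result checked, and assertions on entry and exit.

MAX_RETRIES = 3   # fixed, documented upper bound -- no unbounded loops

def read_sensor():
    """Stand-in for a driver call; returns a reading or None on failure."""
    return 42.0

def sample_with_retries():
    assert MAX_RETRIES > 0
    for attempt in range(MAX_RETRIES):          # bounded loop
        reading = read_sensor()
        if reading is None:                     # return value always checked
            continue
        assert -100.0 <= reading <= 100.0, "reading outside physical range"
        return reading
    raise RuntimeError("sensor unavailable after bounded retries")
```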
None of this crunch time bullshit, though I'm sure it happened.
I've been on one of those binges. If another coder hadn't spotted my errors, caused by diminishing sleep, I'd probably have kept banging my head against the wall, just compounding the situation.
Anecdotal generalization: I've seen better code from the 40-hour-a-week guys than the ones that do 80 hours of overtime each week.
That doesn't mean the former finish first, they usually don't. But their code tends to be more thoughtfully constructed. They have a vested interest in not spending their time in the office, and that means minimizing debugging time along with coding time.
When you're in a hot field with lots of competition, maybe those sprints make sense. But in the field of NASA-type systems, there may be a deadline, but it's rarely urgent. This gives the developers much more time to step back, breathe, and think.
I can't count how many times I stayed late at the office working on a problem for 6 hours straight, and then after a good night's sleep found and fixed the issue after 20 minutes at work the next day. Sleep is king.
I have latent dyslexia which seems to emerge after too many hours at a computer screen. I've come back from a good night's rest to find variable names that are just slightly misspelled, recursive loops that are unreachable, and functions that are incomplete because I got distracted by a bug hunt.
I think we all do this to some degree. When I'm working on a difficult problem, it feels like Christmas Eve, because I know I'll wake up and have the solution coded up before my coffee is gone.
An article recently featured here said functions were limited to 60 lines. That seems like way too loose a restriction considering the possible consequences.
One page of text at standard size, with standard style (one statement or declaration per line). Which is roughly 60 lines.
If you follow their other guidelines (no recursion, fixed sized loops, etc.), then it can take you several lines to do much of anything, so 60 isn't very long. Those things combined means you're basically saying that each function should only be attempting to do 2-3 significant things, which seems about right to me, as most people can't fully understand more than 3 things simultaneously at one go.
Much of that is covered in "The Capability Maturity Model: Guidelines for Improving the Software Process". The book is out of print now but available fairly cheap and it's well worth reading.
Unit testing is not a method of achieving reliability but of accelerating development. Unit tests do not prove that a system functions correctly; that is more the domain of systems and integration testing.
Also, testing is only one small part of building reliable systems. Instead, reliability requires actual engineering practice, which is non-existent in the software world except in a few life-critical domains.
NASA and their contractor's software engineering practices have been covered extensively elsewhere. Here are some links.
My takeaway from reading the fastcompany.com article is that it's not process improvement per se that produces quality software, it's money. NASA has invested a great deal more time, and therefore money, in developing software than a typical commercial software company can afford. We would need quite a lot more programmers, necessarily paid a lot less, if we wanted commercially viable software production within spitting distance of NASA's quality expectations.
That's just it. If a bug in software can cause death you're not going to use agile. You're going to use a waterfall-like process, and you're going to have stringent quality gates at every single transition - from envisioning, through requirements analysis, design, development, test, release, and into operation. It's the only way. I know because I built a military guaranteed messaging system. Tech aspect was small, but it took 24 months.
Then again, agile has some relation to lean (which of course is a production method, not a design method) - and cars probably have a much higher chance of killing people than spacecraft do.
I think it is more a case of "what you can get away with" in terms of domain knowledge and risk: if there's only some money on the line, taking a hit from time to time from bugs, bad design, etc. will often be cheaper than slowing down the design and production process - especially if a system with (some) bugs can start earning money on day X, while a "perfect" system would need some N > X days more before it starts earning.
Waterfall is a perfectly valid process - but for it to work, it puts a very high demand on the domain knowledge of those involved, as well as information management. And in reality I think very, very few large projects get away with no iterations for sub-modules.
I was working with a crew that was refurbishing deep-sea drilling platforms originally built in the 70s. One particular task was moving anchor winch engines, along with the ventilation (the vent hole had to be displaced because the engine(s) moved). On the design the crew got from engineering, the hole was just moved some 10 meters. But the engineer was clearly working from outdated drawings - moving the hole as indicated would have led the team to cut through half of a fuse board, some computer controls for another subsystem, and an interior supporting strut. In the end the solution turned out to be prototyping a longer shaft with PVC pipe and then welding the thing in place, in pieces. The one-day job turned into a two-week job, so the cost of that single mistake on a drawing was enormous (but of course there were many such mistakes, so who knows how much earlier the rig could have left dock anyway...). On the other hand, it got fixed.
I absolutely agree that you need more than "just agile", though. You need a (probably multi-stage) verification process, for example. In the example above, the welding and angle of the pipe were inspected first by the crew leader and later by external inspectors - that particular system didn't directly affect life or death, but as it was venting from the propulsion engines (I think), an error leading to rust or suspension of that system would have been very costly.
In addition to everything else mentioned here, they spent a TON of money[1]. For example, they spent what would be 3.6 billion 2016 dollars on Guidance and Navigation. If your current project could drop a few hundred million on QC, I bet quality would go way up, automated testing or not.
The "few hundred million for quality control" number still stands. In addition they were able to draw on everyone from the people designing the transistors through to the programers to debug problems. Not something many modern projects can do.
Your submission falsely implies that unit tests create reliable software. But that aside...
Avionics software is subjected to multiple rounds of peer review (often by different companies), integration testing, and in some cases multiple different implementations are developed and black box tested against each other for different outputs given the same inputs.
Back in the times when we counted in bytes or kilobytes, programs were simpler to manage. One person could oversee all the code. Now we have programs with millions of lines. Nobody can oversee that.
"One person could overview all code." Not sure that's true. Imagine reviewing the code of the Apollo Guidance Computer, just to take an example from the 60's. The stack of paper containing the source code printout is more than five feet tall: http://skeptics.stackexchange.com/questions/31602/is-this-a-...
Reviewing it would be non-trivial, but in terms of absolute code size, it's not much larger than a moderately-popular open-source project today. It's significantly smaller than React, for example:
Unit tests aren't really about testing (in the validating sense) software; in fact they are quite deficient in that role: no integration tests, no end-to-end tests, no fuzz testing, etc.
Unit tests are foremost a software development tool. To be unit-testable, code is forced into separation of concerns, definition of interfaces (i.e. the "contract" with the function's users), and so on. Mechanically, they also let you validate that internal refactoring has not violated that contract.
As big a fan as I am of unit tests, I find their value somewhat dubious. They take a long time to write, and when they fail, it often just means the unit test needs to be updated. Mocking interactions among objects takes a long time.
What I find has a better ROI is using the same unit testing tools but writing what unit test purists would call integration tests. Basically, using unit test tools where most of the program is instantiated but I/O is mocked is much more useful. When these tests fail, it usually indicates a bug rather than a unit test that needs updating.
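A minimal sketch of that style in Python, assuming a hypothetical fetch/average pipeline and using unittest.mock to stand in for the I/O boundary; the names and URL are invented for illustration:

```python
# Hypothetical sketch of the "unit test tools, integration-style" idea:
# the whole pipeline runs for real, only the I/O boundary is replaced.
from unittest import mock

def fetch_prices(http_get):
    """I/O boundary: http_get is injected so tests can replace it."""
    raw = http_get("https://example.invalid/prices")   # placeholder URL
    return [float(x) for x in raw.split(",")]

def average_price(http_get):
    prices = fetch_prices(http_get)
    return sum(prices) / len(prices)

def test_average_price_end_to_end_with_mocked_io():
    fake_get = mock.Mock(return_value="10.0,20.0,30.0")
    assert average_price(fake_get) == 20.0
    fake_get.assert_called_once()
```

When a test like this fails, it's usually because the real behaviour changed, not because an internal mock's call signature drifted.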
This is why/how they are a development tool. "Mocking interactions among objects", tests often breaking, tests taking too long to write (I expect to spend roughly 30% of dev time on unit tests, 30% on code, and the rest on thinking, planning, etc.) - those are all indicators that your code/architecture is bad: too complex, too coupled, too many concerns mixed together, not layered into clean interfaces, etc.
A while back someone posted some programming guidelines issued by NASA. They required assert statements at least every 10 LOC. They covered many other things as well, but that's the one that stuck in my head.
This is something I do myself. As I build up the code, I write tons of self-checking asserts so it fails quickly at the first sign of something wrong. At some point, I'll remove the less likely asserts, as they just clutter up the code.
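For illustration only, a small hypothetical Python sketch of that habit: an internal invariant check run after every state mutation, so corruption surfaces immediately rather than several calls later. The class and its rules are invented, not taken from any NASA guideline.

```python
# Hypothetical sketch of "fail at the first sign of something wrong":
# an internal invariant check called after every state mutation.

class RingBuffer:
    def __init__(self, capacity):
        assert capacity > 0
        self.capacity = capacity
        self.items = []

    def _check(self):
        # Self-checking assert: catches corruption as soon as it happens.
        assert 0 <= len(self.items) <= self.capacity

    def push(self, item):
        self.items.append(item)
        if len(self.items) > self.capacity:
            self.items.pop(0)       # drop the oldest entry
        self._check()

    def pop(self):
        assert self.items, "pop from empty buffer"
        item = self.items.pop(0)
        self._check()
        return item
```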
For whatever it's worth, assert statements were also heavily used in embedded digital TV software. The philosophy is that things either work or they don't. With enough QA and field testing, those asserts will show up if state goes awry.
From http://link.springer.com/article/10.1007/BF01845743#page-1
1) Integration testing, 2) systems testing, 3) load testing and 4) user acceptance testing. This would have been done with simulation tools alongside manual testing. Think about the way hardware testing is done and extrapolate those techniques to software.
The test engineers as well as the implementation engineers also have a lot of experience within their specialized areas. This provides a lot of insight into expected results when running these different types of tests and scenarios.
Also, these tests are long duration. Multiple iterations of simulations can take years to complete.
Cacti had a nice summary. Some of their methods are also detailed in NASA's Software Safety Guidebook that was recently posted. My comment below tells you which pages have good stuff plus, as usual, has a link to the PDF itself.
Here's a recent comment with so-called correct-by-construction methods for software development that knock out tons of defects. They usually don't involve unit testing. The cost ranged from lower (e.g. some Cleanroom projects, thanks to less debugging) to around 50% higher (e.g. the typical Altran/Praxis number). Time-to-market isn't significantly affected for some but is sacrificed for others. So you don't need several hundred million in QA, as some have suggested.
- I think they applied reliability methodologies that worked for hardware to software (e.g. redundancy, automatic fault detection/tracing/correction)
- Also, in the early days, software was so tied to the hardware, and the hardware was faaar less complex, so they probably could understand everything that was happening in the system at any given time
I think it's important to remember that the kind of software written for any flight control or guidance system is very, very, very far from the kind of dynamic software normally written today.
There were more likely to be hardware errors than software errors; that's how close to the hardware they wrote.
Really? Because how could quality assurance, software testing, and formal verification have existed before some hipster buzzword for a particular approach to it came about?
This is an article about the team writing code for the Space Shuttle; it is not so much about how they tested but about the (insane) amount of time and energy spent making sure everything worked.
Margaret Hamilton, leader of the team that developed the flight software for the agency's Apollo missions, has been granted a NASA Exceptional Space Act Award for her scientific and technical contributions.
"The Apollo flight software Ms. Hamilton and her team developed was truly a pioneering effort," said NASA Administrator Sean O'Keefe. "The concepts she and her team created became the building blocks for modern 'software engineering.' It's an honor to recognize Ms. Hamilton for her extraordinary contributions to NASA," he said.
Dr. Paul Curto, senior technologist for NASA's Inventions and Contributions Board nominated Hamilton for the award. Curto said, "I was surprised to discover she was never formally recognized for her groundbreaking work. Her concepts of asynchronous software, priority scheduling, end-to-end testing, and man-in-the-loop decision capability, such as priority displays, became the foundation for ultra-reliable software design."
One example of the value of Hamilton's software work occurred during the Apollo 11 mission. Approximately three minutes before Eagle's touchdown on the moon, the software overrode a command to switch the flight computer's priority processing to a radar system whose 'on' switch had been manually activated due to a faulty written operations script provided to the crew. The action by the software permitted the mission to safely continue.
The lessons-learned-from-Apollo article and the 001 tool's features should show how badass they were. Note that recent and upcoming model-driven tooling aims for what hers already did two or more decades ago. It was one of the first high-assurance toolchains.
The funny thing is that it's not just "NASA way back when". Even now, the most critical code (e.g. for implantable heart defibrillators) likely has fewer "unit tests" than the advert code that posts junk to your Facebook wall.
Waterfall is still the more sensible approach in some industries.
To create really reliable software, you either use formal methods (the seL4 kernel, CompCert, etc.) or you spend a huge amount of money and work really slowly. NASA spends a lot of money and works really slowly. In other words, they are not in any way state of the art - just brute force.