
There are some things in software development that are obviously wrong to most people.

There are some things in software development that are obviously wrong to a few people.

And there are some things people have a hunch we are doing wrong, but that nobody can quite crystallize.

Removing redundancy in code is great most of the time, but it's not a panacea. NASA had to contend with physical failures of memory, and catastrophic costs of failures in 'production'. They solved this problem by consensus pools of three, on physically separate hardware and in some cases using multiple manufacturers. Inability to reach consensus would invoke failsafes.
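To make the "consensus pool of three" idea concrete, here is a minimal sketch of majority voting with a failsafe. It is not NASA's implementation; the names `vote`, `act`, and the `"FAILSAFE"` marker are hypothetical, and real systems vote on hardware signals rather than Python values.

```python
from collections import Counter

def vote(readings, quorum=2):
    """Return the value that at least `quorum` channels agree on, else None.

    `readings` holds one output per redundant channel (hypothetical setup:
    three independent computers answering the same question).
    """
    value, count = Counter(readings).most_common(1)[0]
    return value if count >= quorum else None

def act(readings):
    """Use the agreed value, or fall back to a failsafe when there is no consensus."""
    result = vote(readings)
    return "FAILSAFE" if result is None else result
```

With two channels agreeing, `act(["open", "open", "close"])` yields the majority value; with three different answers there is no quorum and the failsafe path is taken.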

I have a vague suspicion about how we condense the most critical parts of our software down to the fewest bits of data and instructions. This may ultimately be a practice we reject. One bad bit and you can end up taking the opposite action of the one you should have performed.
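A rough illustration of the "one bad bit" worry, under assumptions of my own: a command stored as a single bit turns into its opposite after one flip, while a command stored as one of two widely separated codewords turns into something recognizably invalid. The codewords `OPEN` and `CLOSE` and the helper names are hypothetical, chosen only for this sketch.

```python
# Hypothetical 8-bit codewords that differ in every bit position.
OPEN, CLOSE = 0x00, 0xFF

def flip_bit(word, bit):
    """Simulate a single-bit upset by toggling one bit of `word`."""
    return word ^ (1 << bit)

def decode_single_bit(word):
    """Dense encoding: only the lowest bit carries the command."""
    return "open" if word & 1 == 0 else "close"

def decode_codeword(word):
    """Sparse encoding: accept only exact codewords, reject anything else."""
    if word == OPEN:
        return "open"
    if word == CLOSE:
        return "close"
    return None  # corrupted value, so refuse to act
```

Flipping bit 0 of an "open" command makes `decode_single_bit` report "close", whereas `decode_codeword` returns `None` for any single flipped bit, so the caller can refuse to act instead of doing the opposite thing. The cost is exactly the redundancy we usually try to squeeze out.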



One thing I've often wondered about NASA's triple redundancy: what system calculates or determines the consensus? Is it also a programmable computer, just smaller?


Usually the driven element itself. From [0]:

"One reason why the redundancy management software was able to be kept to a minimum is that NASA decided to move voting to the actuators, rather than to do it before commands are sent on buses. Each actuator is quadruple redundant. If a single computer fails, it continues to send commands to an actuator until the crew takes it out of the redundant set. Since the Shuttle's other three computers are sending apparently correct commands to their actuators, the failed computer's commands are physically out-voted. Theoretically, the only serious possibility is that three computers would fail simultaneously, thus negating the effects of the voting. If that occurs, and if the proper warnings are given, the crew can then engage the backup system simply by pressing a button located on each of the forward rotational hand controllers."

[0]: https://history.nasa.gov/computers/Ch4-4.html#:~:text=Its%20....


Just as a follow-up, SpaceX has written a bit about their systems, which follow the same "actuator is the judge" approach: https://space.stackexchange.com/a/9446


> One bad bit and you can end up taking the opposite action of the one you should have performed.

And bit flips do happen:

https://dropbox.tech/infrastructure/-broccoli--syncing-faste...

https://lobste.rs/s/310xjb/broccoli_syncing_faster_by_syncin...


Even more so in space, where there's no atmosphere to shield hardware from cosmic rays.



