I'm also a current developer. This isn't a slight against developers but against the prevailing mindset, and that mindset is little more than 'move fast and keep rebooting things'.
Please, where do you work that you're given adequate time to debug deep OS-level issues? Where it's acceptable to write code that handles the one-in-10^12 type of error?