
I've always thought that the ideal is somewhere in between the two.

1. Catch the panic/exception.

2. Track the rate of these panics or exceptions. If it is too high, some data structure has probably been corrupted or some lock has been poisoned. If a lot of requests are failing, abort.

3. Ideally, signal that you are in a degraded state so that some external process can gracefully drain your traffic and restart you. Very few people have this level of self-healing infrastructure set up, though.
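For what it's worth, here is a rough Rust sketch of points 1 and 2, assuming the binary is built with the default unwinding panic strategy (`catch_unwind` catches nothing under `panic = "abort"`). `PanicTracker`, `Outcome`, and `run_guarded` are names I made up for illustration, and the window and threshold are arbitrary.

```rust
use std::collections::VecDeque;
use std::panic::{self, AssertUnwindSafe};
use std::time::{Duration, Instant};

/// Sliding-window record of recent panics (hypothetical helper).
struct PanicTracker {
    window: Duration,
    max_panics: usize,
    recent: VecDeque<Instant>,
}

impl PanicTracker {
    fn new(window: Duration, max_panics: usize) -> Self {
        Self { window, max_panics, recent: VecDeque::new() }
    }

    /// Records a panic at `now`; returns true once more than `max_panics`
    /// have happened within `window`.
    fn record(&mut self, now: Instant) -> bool {
        self.recent.push_back(now);
        // Drop panics that have aged out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.duration_since(oldest) > self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        self.recent.len() > self.max_panics
    }
}

/// What happened to a single request (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    Ok(String),
    /// The handler panicked, but the panic rate is still tolerable.
    Failed,
    /// The panic rate is too high; the caller should abort the process.
    ShouldAbort,
}

/// Step 1: catch the panic. Step 2: feed it into the rate tracker.
fn run_guarded<F>(tracker: &mut PanicTracker, handler: F) -> Outcome
where
    F: FnOnce() -> String,
{
    match panic::catch_unwind(AssertUnwindSafe(handler)) {
        Ok(body) => Outcome::Ok(body),
        Err(_) => {
            if tracker.record(Instant::now()) {
                Outcome::ShouldAbort
            } else {
                Outcome::Failed
            }
        }
    }
}
```

The caller would call `std::process::abort()` on `Outcome::ShouldAbort`, or flip a health-check flag first if it has the infrastructure for point 3. Returning the decision instead of aborting inside `run_guarded` just keeps the policy in one place.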



I wrote a web server that handled a lot of requests, and my solution to point 2 was to have it notify me via our alert chat channel every time it panicked. Panics were rare enough that I could investigate each one individually, and if they had become overwhelming (they never did) I could have chosen to do something fancier.
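A minimal sketch of that kind of notification in Rust, not the actual code from that server: a panic hook forwards each panic message to a channel, and some other thread would drain the channel and post to chat. `install_alert_hook` is a made-up name, and the chat posting itself is left out because it depends on whatever service you use.

```rust
use std::panic;
use std::sync::mpsc::{self, Receiver};
use std::sync::Mutex;

/// Installs a process-wide panic hook that forwards every panic message
/// to the returned channel (hypothetical helper).
fn install_alert_hook() -> Receiver<String> {
    let (tx, rx) = mpsc::channel::<String>();
    // The Mutex keeps the hook usable from any thread.
    let tx = Mutex::new(tx);
    // Keep the previous hook so the usual stderr output is preserved.
    let previous_hook = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        if let Ok(tx) = tx.lock() {
            // Ignore send errors: the receiver may already be gone.
            let _ = tx.send(format!("server panicked: {info}"));
        }
        previous_hook(info);
    }));
    rx
}
```

A background thread can then loop over `rx.recv()` and send each message to the alert channel, so the request path never blocks on the chat service.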



