We use Nagios + CheckMK, but I believe neither that combination nor Zabbix solves the alerting part.
Our solution is a custom notification broker that decides whom to alert and then waits for an acknowledgement. It uses different backends including our company chat.
Not complicated at all, just 100 lines of Python code that contain the business logic.
Anything that relies on a single medium is unsuitable for anything but unimportant alerts. What if Slack goes down for 2 hours? Unlikely, but definitely possible.
The broker ensures that every important alert is explicitly acknowledged by someone, and that unimportant alerts can be quickly forgotten without anyone wondering whether they were handled.
We have different applications sending alerts, not just Nagios (because Nagios sucks at processing events as opposed to states), and it would quickly become unmanageable without some sort of middleware.
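To give a rough idea of the shape of such a broker, here is a minimal sketch. It is not our actual code: `Alert`, `NotificationBroker`, the `routes` table and the `wait_for_ack` callback are all hypothetical names made up for illustration. The assumptions are that each backend is a function returning True on successful delivery, and that acknowledgement is checked through a callback supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Alert:
    source: str                  # e.g. "nagios" or any other application
    message: str
    important: bool = True
    acknowledged_by: Optional[str] = None


# Hypothetical signatures: a backend sends to one recipient and reports success;
# the ack callback blocks (with its own timeout) and reports whether they acked.
SendFn = Callable[[str, Alert], bool]
AckFn = Callable[[str, Alert], bool]


class NotificationBroker:
    def __init__(self,
                 backends: List[Tuple[str, SendFn]],
                 routes: Dict[str, List[str]],
                 wait_for_ack: AckFn) -> None:
        self.backends = backends          # tried in order (chat, SMS, ...)
        self.routes = routes              # source -> recipients in escalation order
        self.wait_for_ack = wait_for_ack

    def deliver(self, recipient: str, alert: Alert) -> Optional[str]:
        """Try each backend in turn; return the name of the first that worked."""
        for name, send in self.backends:
            try:
                if send(recipient, alert):
                    return name
            except Exception:
                continue                  # medium is down, fall through to the next
        return None

    def dispatch(self, alert: Alert) -> Optional[str]:
        """Return whoever acknowledged the alert, or None if nobody did."""
        recipients = self.routes.get(alert.source) or self.routes["default"]
        if not alert.important:
            # Unimportant alerts: notify once, do not wait for an ack.
            self.deliver(recipients[0], alert)
            return None
        for recipient in recipients:      # escalate until someone acknowledges
            if self.deliver(recipient, alert) and self.wait_for_ack(recipient, alert):
                alert.acknowledged_by = recipient
                return recipient
        return None
```

The point of the design is that the routing and escalation rules live in one place, while the senders (Nagios and everything else) only have to hand over an alert, and no single medium being down can swallow an important one.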
Is it something where the business logic part could be easily abstracted / separated? It sounds like an interesting and useful yet simple tool. The open source community can always use more of those.
Edit: or maybe something like a blog post to describe the structural details.