> In particular, I'd love to know if there's anything major that generic RDBMSs could do better here.
Well, everybody with experience outsources monitoring now since it's a non-core cost center, unless there's a compelling scale or secrecy issue.
If RAM and CPU were free, I'd use MySQL or Postgres w/partitions because of their mgmt. features, tested replication and SQL.
But Prometheus or ClickHouse are 10-25x more efficient in terms of space, and often have much faster queries. The tradeoffs are bizarre HA gaps, a lack of trained people, and ops groups that get stuck supporting them.
I would never recommend monitoring with anything based on HDFS (OpenTSDB), written in Java (Cassandra), or in-memory for large clusters (InfluxDB).
For monitoring under 200 nodes, anything will work.
If you only have a day to do something, just install Nagios and you'll get 99% of what you really need.
> Well, everybody with experience outsources monitoring now
That has not been my experience. Quite the opposite: every place I've been that outsourced monitoring ended up bringing back significant portions of its observability stack, either because it needed more control or different feature sets, or because the outsourced solution was cost prohibitive.
I think it is true that lots of teams continue to outsource storage of metrics data, but the outsourced vendors are not incentivized to make it easy to do retention/filtering/aggregation well.
Source: Have run observability stacks across a variety of domains.
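To make the retention/aggregation point a bit more concrete, here is a minimal sketch of the kind of rollup that is easy to run in-house: keep recent samples raw and collapse older ones into fixed-width buckets. The function names (`downsample`, `apply_retention`), the `(timestamp, value)` sample shape, and the window sizes are all hypothetical, chosen only for illustration; no particular vendor or TSDB works exactly this way.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds):
    """Roll raw (timestamp, value) samples up into fixed-width buckets.

    Hypothetical helper: each bucket keeps min/max/avg/count so that
    coarse queries still work after the raw points are dropped.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        {"start": start, "min": min(vs), "max": max(vs),
         "avg": mean(vs), "count": len(vs)}
        for start, vs in sorted(buckets.items())
    ]


def apply_retention(samples, now, raw_window_seconds, bucket_seconds):
    """Keep samples newer than the raw window as-is; roll up the rest."""
    cutoff = now - raw_window_seconds
    recent = [(ts, v) for ts, v in samples if ts >= cutoff]
    old = [(ts, v) for ts, v in samples if ts < cutoff]
    return recent, downsample(old, bucket_seconds)


# Illustrative data only: timestamps in seconds, arbitrary values.
samples = [(0, 1.0), (30, 3.0), (60, 5.0), (600, 7.0)]
recent, rollups = apply_retention(
    samples, now=660, raw_window_seconds=120, bucket_seconds=60
)
```

With the sample data above, the three oldest points collapse into two one-minute buckets and only the newest point stays raw. Choosing those windows per metric is the sort of control that tends to be awkward when someone else owns the storage.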
> Well, everybody with experience outsources monitoring now since it's a non-core cost center, unless there's a compelling scale or secrecy issue.
In this scenario I find it odd that there are so many open source projects with traction and success (one name above all, Prometheus) if they are aimed only at specialized companies. What you say applies to small companies or to very big ones with a lot of money to spend. Mid-sized companies, in my experience, prefer to spend money on in-house solutions because at their volumes outsourcing is really costly. But that's just my small experience.
Monitoring how your platform behaves/performs is a cost center? I think it's quite an important feature in every tech company, and depending on the size of the company it may be better to outsource it or do it internally.
Source: DBA.