Is that 20x cost... actually bad though? (I mean, I know Datadog is bad. I used to use it and I hated its cost structure.)
But maybe it's worth it. Or at least, the good ones would be worth it. I can imagine great metadata (and platforms to query and explore it) saving more engineering time than it costs in server time. So to me this ratio isn't that material, even though it looks a little weird.
The trouble is that o11y costs developer time too. I've seen both traps:
Trap 1: "We MUST have PERFECT information about EVERY request and how it was serviced, in REALTIME!"
This is bad because it ends up being hella expensive, both in engineering time and in actual server (or vendor) bills. Yes, this is what we'd want if cost were no object, but it sometimes actually is an object, even for very important or profitable systems.
Trap 2: "We can give customer support our pager number so they can call us if somebody complains."
This is bad because you're letting your users suffer errors that you could have easily caught and fixed for relatively cheap.
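The usual middle ground between those traps is to keep detailed telemetry for only a fraction of requests. As a hedged sketch (not anything the original comments prescribe), here's what that looks like with the OpenTelemetry Python SDK's head-based sampling; the 1% ratio and the "checkout"/"handle_request" names are arbitrary examples:

```python
# Trace ~1% of requests instead of all of them.
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample ~1% of new traces; child spans follow their parent's decision,
# so a sampled request stays fully traced across services.
sampler = ParentBased(root=TraceIdRatioBased(0.01))

provider = TracerProvider(sampler=sampler)
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout")

def handle_request():
    # Only ~1 in 100 of these spans is actually recorded and exported.
    with tracer.start_as_current_span("handle_request"):
        pass  # real work goes here
```

You give up "PERFECT information about EVERY request," but the bill scales with the sample rate instead of with traffic.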
There are diminishing returns with this stuff, and a lot of the calculus depends on the nature of your application, your relationship with its consumers, your business model, and a million other factors.
A family member in pharma had a good counter-question for scoping this rationally:
"What are we going to do with this, if we store it?"
A surprising amount of the time, no one has a plausible answer to that.
Sure, sometimes you throw away something that would have been useful, but that posture also saves you from storing 10x as many things that should never have been stored, because they never would have been used.
And for the things you wish you'd stored... you can re-enable that after you start looking closely at a specific subsystem.
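That "re-enable it when you start looking closely" posture can be as simple as a per-subsystem verbosity switch. A minimal sketch using Python's standard logging module, assuming a made-up DEBUG_SUBSYSTEMS environment variable and hypothetical subsystem names:

```python
import logging
import os

# Cheap default for everything: warnings and errors only.
logging.basicConfig(level=logging.WARNING)

# Turn up one subsystem when it's under investigation, e.g.
#   DEBUG_SUBSYSTEMS=billing,export-worker ./run.sh
for name in filter(None, os.environ.get("DEBUG_SUBSYSTEMS", "").split(",")):
    logging.getLogger(name.strip()).setLevel(logging.DEBUG)

log = logging.getLogger("billing")
log.debug("retrying invoice %s", "inv_123")  # emitted only while billing is enabled
```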
I agree that this is the way, but the problem with this math is that you can't prove that the one thing in ten you could have saved but didn't wouldn't have been 100x as valuable as the nine you didn't end up needing. So what if you saved $1000/yr in storage if you also had to throw out a million-dollar feature that you didn't have the data for? There is no way to calculate this stuff, so ultimately you have to go by feel, and if the people writing the checks have a different feel, they will get their way.