
> There are still going to be time and effort costs involved in scaling that infrastructure as your log volume increases.


You have to output a lot of logs before you fill up even a single large consumer-grade hard drive, especially given logs are typically compressed when rotated.
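
For a rough sense of scale, here's a back-of-envelope sketch (all figures are illustrative assumptions, not measurements from any real system):

    # How long until a single large drive fills with logs?
    bytes_per_line = 200        # typical structured log line (assumption)
    lines_per_sec = 5_000       # a fairly busy single service (assumption)
    compression_ratio = 10      # gzip on rotation; text often does better
    drive_bytes = 16e12         # one 16 TB consumer drive

    stored_per_day = bytes_per_line * lines_per_sec * 86_400 / compression_ratio
    print(f"stored/day: {stored_per_day / 1e9:.1f} GB")          # ~8.6 GB
    print(f"days to fill: {drive_bytes / stored_per_day:.0f}")   # ~1850, about 5 years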

It's usually only when you involve ELK or something like it that your logs start to get big. That, in turn, is typically necessitated by over-complicated distributed software design.

If you're at the scale where this actually matters and you're serving millions of requests per second from a worldwide user base, then affording storage for the logs really shouldn't be a problem anymore (idk, with the possible exception of Twitter).


> You have to output a lot of logs before you fill up even a single large consumer-grade hard drive, especially given logs are typically compressed when rotated.

This is a good point - a RAID array of a few HDDs/SSDs scales surprisingly far and is cheaper than many of the cloud services out there, though whether you can use either approach probably depends on compliance requirements and such.
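
To put rough numbers on the cost claim (prices are illustrative assumptions, and this ignores ops labor, hardware refresh, and egress):

    # Four mirrored 16 TB HDDs vs. cloud object storage at list price.
    drive_cost = 300                    # one consumer 16 TB HDD (assumption)
    usable_tb = 16 * 4 / 2              # RAID 10 across four drives -> 32 TB
    cloud_per_gb_month = 0.023          # e.g. S3 Standard list price

    upfront = 4 * drive_cost
    cloud_monthly = usable_tb * 1000 * cloud_per_gb_month
    print(f"RAID upfront: ${upfront}, cloud: ${cloud_monthly:.0f}/month")
    print(f"break-even: {upfront / cloud_monthly:.1f} months")   # ~1.6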

I will also add that logs can compress really well. It's been close to a year since I added Logrotate to a pretty basic setup that didn't have it before, and I haven't even needed to look at how many archives are currently retained, because disk usage has barely changed. And that's for multiple systems that previously filled up their available storage within months.
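
As a toy illustration of how well repetitive log text compresses (real logs vary more line to line, so actual ratios will be lower):

    import gzip

    # 100k near-identical log lines; the line format is made up for illustration.
    line = b"2024-05-01T12:00:00Z INFO request handled path=/api/items status=200\n"
    raw = line * 100_000
    packed = gzip.compress(raw)
    print(f"raw: {len(raw) / 1e6:.1f} MB, gzipped: {len(packed) / 1e3:.1f} KB, "
          f"ratio: {len(raw) / len(packed):.0f}x")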

Of course, my personal gripe is that most of the logging solutions out there are rather complex - something like Graylog feels like one of the simpler self-hostable options while still being fully featured, but in my experience anything that runs Elasticsearch is really resource hungry. Sometimes it feels like MariaDB/PostgreSQL would be good enough for most of the simpler, low-volume logging setups out there - if you don't want to manage logs as files and want to ship them somewhere, but don't want the receiving system to be too complex either.
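
A minimal sketch of that idea, assuming a reachable Postgres instance and the psycopg2 driver (the DSN, table, and columns are made up for illustration):

    import psycopg2

    conn = psycopg2.connect("dbname=logs user=app")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        # One flat table; plenty for a low-volume setup.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS logs (
                ts      timestamptz NOT NULL DEFAULT now(),
                level   text        NOT NULL,
                message text        NOT NULL
            )""")
        cur.execute("INSERT INTO logs (level, message) VALUES (%s, %s)",
                    ("INFO", "request handled in 12ms"))

Querying is then plain SQL, and retention is just a periodic DELETE with a time predicate.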


Except, you know, when you actually want to do something valuable with all those logs. You _should_ be creating logs (signals) to be valuable in some way: diagnostics, alerting, canaries, statistics, etc. If you're just dumping logs into opaque blobs that are never looked at, then sure, write them to blobs to your heart's content and have fun hunting and pecking for reasons your users are already screaming at you. That strategy is fine, but the limitations are clear: it's reactive.


Depends entirely on what and why you are logging.

Is it audit logs for security or due to some regulatory requirement? Then huge blobs are fine. Desirable, even.

Transaction logs meant for machine loading, so you're able to replay an application's state at any given moment in time? Yeah, probably going to end up with huge blobs again.
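
A minimal sketch of that replay pattern (event names and shapes are made up; a real system would use a proper event store):

    import json

    # Append-only log: one JSON event per line; the file is the source of truth.
    with open("events.log", "a") as f:
        for event in [{"type": "deposit", "amount": 100},
                      {"type": "withdraw", "amount": 30}]:
            f.write(json.dumps(event) + "\n")

    # Rebuild state by replaying; replay a prefix to get state at an earlier time.
    balance = 0
    with open("events.log") as f:
        for line in f:
            event = json.loads(line)
            balance += event["amount"] if event["type"] == "deposit" else -event["amount"]
    print(balance)
    # The log only ever grows - hence the huge blobs.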


And what are you going to do when you need a human to read sixteen trillion bytes of compressed logs streamed off a single SATA disk?

Once you face the fact that a single SATA disk's throughput means you can't search the logs in any reasonable time, and that nobody could possibly read that much log data anyway, nobody will actually use it - and you start to see it as a hoarding disorder, not a useful tool.
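
The throughput point is easy to check (the sequential read rate below is a typical HDD figure, assumed here):

    # Time just to *read* 16 TB off a single SATA spinning disk.
    bytes_total = 16e12
    read_mb_per_sec = 150   # typical HDD sequential throughput (assumption)
    hours = bytes_total / (read_mb_per_sec * 1e6) / 3600
    print(f"{hours:.0f} hours per full scan")  # ~30 hours, before any decompression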


It's not unheard of to need to retain years or even decades of logs for regulatory compliance. Nobody is reading them; they just need to exist. In that scenario you'll probably keep the current year or so fresh on a mechanical drive and past years on tape.



