What is the problem with binary logging? Would you prefer the ordinary text file...

withinboredom · on Sept 7, 2021

One problem I can think of with binary logging would be catastrophe. I have had to pull HDs, go through the logs, line by line and piece together what happened in those final moments.

Text logs are easy to get at, I don’t need a special program to read, I just need something to display a raw byte on the screen.

Also, I wonder what the actual performance benefits are. I remember back when most protocols were text and/or binary. Usually binary was slower when messing about with stringy stuff. For example, using memcached to cache strings. Why encode/decode strings in a binary protocol when you can just use strings all the way down? I don’t know if that premise is still true, but for us older people, it makes sense to keep inherently stringy stuff stringy to avoid overhead.

lavabiopsy · on Sept 7, 2021

>Text logs are easy to get at, I don’t need a special program to read

I've seen this type of comment a lot on reddit and I don't understand why people say this. You do not really need a special program to read the systemd logs either, in the event that journalctl doesn't work you can easily use `strings` to read the logs. Feel free to try it right now just for fun, the binary format of journald is actually extremely simple and well-documented and just stores the logs sequentially, it's not "encoding" the strings at all. Why would it need to? It's all fulltext data.

kaba0 · on Sept 7, 2021

Re the failure part: you can just plug in any USB stick with a normal linux distro and it will read it out just fine.

Re performance: Strings are most defintely worse than some fixed size format. It’s not an accident that FPGAs were used to serialize/deserialize XMLs, why Google made a whole new binary protocol, etc. Just think about it, you have to parse a string character by character (yeah some clever vector magic can make some string processing go a few chars at a time), while a binary format with fixed layout will be practically random access. You can query the nth entry, etc.

withinboredom · on Sept 7, 2021

You also have to parse a binary protocol byte by byte, or at least the bytes you care about. But the question is, why are we parsing logs at all?

lavabiopsy · on Sept 7, 2021

Journald doesn't actually parse the log strings. Any parseable entities are stored in separate fields (see `man systemd.journal-fields`)

olau · on Sept 7, 2021

I sort of agree, but I think in the end, your recovery thing will have to include the decoder and that's it.

As I recall, there were two reasons for the binary format - a signing scheme to make it tamper proof and more metadata on each log entry.

nine_k · on Sept 7, 2021

Suppose I don't mind binary logging.

I'd prefer it to be a separate independent component, the way syslogd is. (Yes, I know that in systemd the binary logging can be switched off, or even replaced with a different implementation.)

I care mostly about having an independent implementation with a different internal architecture, narrower scope, and a separate governance.

kaba0 · on Sept 7, 2021

The reason it is not “independent” is that it logs even during boot.

Something which simply was not solved with the previous generation of init systems and is greatly important. During booting the file system may not even be mounted, it could hardly be done by a separate process.

lavabiopsy · on Sept 7, 2021

>I'd prefer it to be a separate independent component [...] with a different internal architecture [...] and a separate governance

Not sure I understand this, do you mean you would use something that had the exact same feature set, if the internal architecture was different and if someone else was maintaining it? What would be the point of that? Are there some performance or security improvements to the architecture that were suggested upstream that they aren't doing?

>narrower scope

Not sure I understand this either, the scope of journald is actually quite small, there are many syslog implementations that are bigger and have a lot more features.

nine_k · on Sept 7, 2021

Not necessarily exact same, and maybe not necessarily having the same problems (such as corruption).

There is a reason that certain things have multiple implementations compatible at the level of key interfaces but different inside. Examples: cron and anacron, more, less, and bat, different NTP implementations, different syslogd implementations, to say nothing of the variety of DNS, SMTP, and HTTP servers.

lavabiopsy · on Sept 7, 2021

I'm not sure what you mean corruption, journald logs should actually fare better in terms of dealing with corruption and tampering versus text logs, if you use the log sealing feature. You could also already easily plug any of the syslog implementations in.

smaudet · on Sept 11, 2021

Perhaps he is talking about modularity - systemd is a bit all or nothing, it is a pluggable system yes but (AFAIK) it is not modular so you can't write arbitrary software and have it adhere to some standard, instead you have to use the systemd libraries and link against systemd calls.

So it is another layer on top of the kernel in that sense.

The old alt was to put the shell in charge and bash or sh talked to a bunch of random programs. Which is in some ways more modular but also a bit of a nightmare...

I'm not certain anything like that exists, you would have to define some data architecture almost, maybe involve hardware like ACPI and UEFI...

But that goes back to nobody basing their career over how a computer boots...overly complex. Maybe better just to improve systemd instead.

rnhmjoj · on Sept 7, 2021

They corrupt easily and there's no repair method built into journalctl (or anywhere else, really). The format also does not have any kind of indexing or cache and this is the reason `systemctl status <service>` can take 15s to show the last 10 log lines if you have a few GBs worth of logs.

kaba0 · on Sept 7, 2021

Well, if you are asking for every log of a given service no matter when was it made, it sure will take time - and I don’t see how indexing or cache would help it (create one specific for each service? That would in practice multiple the size of the logs)

Specify that you only want to last day or so (which is quite likely what you actually want), but I do agree that the default could be something sane like only the last week.

yellowapple · on Sept 7, 2021

> What is the problem with binary logging?

For me, it's less that it's binary and more that it's some bespoke format specific to journald instead of, say, an SQLite database or something else similarly widely-used and well-known.