After the BIOS screen, the screen would go black as it tries to load what I assume is...

exikyut · on June 28, 2021

Ahhhh, I see what you mean now.

First of all, that tiny little detail about the ACPI error is actually incredibly helpful: it's one of the few messages that tend to still leak onto the screen when the system is configured to boot in quiet mode. Thus, Linux was actually partly booting 100% fine.

If the system was then just automatically resetting after a bit, that definitely sounds like a driver fault, and if you were still on the Ubuntu 18 drivers it sounds completely reasonable (for proprietary values of "reasonable" ._.) that you'd encounter a kernel panic or hardware lockup/reset or something like that.

--

I was curious why that ACPI error message leaked onto the screen, and presumed/guesstimated it was because it was being printed with a high log level/priority. I decided to go digging to see if my theory was correct.

Thanks for the verbatim quote; searching for "namespace lookup" found the source of the message immediately: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5.... So this uses acpi_os_printf() (defined at https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5...), a va_args thunk to acpi_os_vprintf() (defined immediately after), which... does a few things. It's honestly going to be shorter to just paste it:

  #ifdef ENABLE_DEBUGGER
    if (acpi_in_debugger) {
      kdb_printf("%s", buffer);
    } else {
      if (printk_get_level(buffer))
        printk("%s", buffer);
      else
        printk(KERN_CONT "%s", buffer);
    }
  #else
    if (acpi_debugger_write_log(buffer) < 0) {
      if (printk_get_level(buffer))
        printk("%s", buffer);
      else
        printk(KERN_CONT "%s", buffer);
    }
  #endif

there we go.

This is weird: it uses different paths depending on whether kdb support is compiled in. If it is, it'll use kdb_printf() while acpi_in_debugger is set and plain printk() functions otherwise; if it's not, it tries calling acpi_debugger_write_log() first and only does printk() things if that returns < 0.
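To make the branching in the pasted snippet easier to eyeball, here's a minimal userspace sketch of the same decision written as a pure function. The enum and function names are made up for illustration (they are not kernel identifiers), and the compile-time #ifdef is modelled as a runtime flag, which is an assumption of the sketch and not how the kernel does it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three places the text can end up. */
enum sketch_sink {
    SKETCH_SINK_KDB,          /* kdb_printf() */
    SKETCH_SINK_DEBUGGER_LOG, /* acpi_debugger_write_log() accepted it */
    SKETCH_SINK_PRINTK        /* printk(), with or without KERN_CONT */
};

/*
 * Mirrors the quoted branches. debugger_compiled_in stands in for
 * ENABLE_DEBUGGER, in_debugger for acpi_in_debugger, and write_log_result
 * for the return value of acpi_debugger_write_log().
 */
enum sketch_sink sketch_pick_sink(bool debugger_compiled_in,
                                  bool in_debugger,
                                  int write_log_result)
{
    if (debugger_compiled_in)
        return in_debugger ? SKETCH_SINK_KDB : SKETCH_SINK_PRINTK;

    /* Without ENABLE_DEBUGGER, printk() is only the fallback path. */
    return write_log_result < 0 ? SKETCH_SINK_PRINTK
                                : SKETCH_SINK_DEBUGGER_LOG;
}

/* Tiny self-check covering one case per sink. */
void sketch_pick_sink_selfcheck(void)
{
    assert(sketch_pick_sink(true, true, 0) == SKETCH_SINK_KDB);
    assert(sketch_pick_sink(false, false, 0) == SKETCH_SINK_DEBUGGER_LOG);
    assert(sketch_pick_sink(false, false, -1) == SKETCH_SINK_PRINTK);
}
```

The takeaway is that in the non-debugger build printk() is only reached when the write_log call reports failure, which is the part that made me go look at where write_log actually ends up.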

The printk_get_level() thing, added in 2016 (https://github.com/torvalds/linux/commit/abc4b9a53ea8153e0e0...), checks whether the buffer already starts with a log-level prefix: if it does, the text is printed as-is, and if it doesn't, it's printed as a continuation (KERN_CONT) of the previous line. (Orthogonally relevant: https://lwn.net/Articles/732420/ coincidentally happened a year later)

I think that acpi_debugger_write_log() (https://github.com/torvalds/linux/blob/master/drivers/acpi/o...) is just a circular buffer sink. It dispatches via a function pointer to acpi_debugger.ops->write_log ("oh no, where does that go"); LXR to the rescue, which cross-references (https://lxr.missinglinkelectronics.com/linux/drivers/acpi/os...) (via the tiny usage link) to https://lxr.missinglinkelectronics.com/linux/drivers/acpi/ac..., which is... just a circular buffer writer (https://github.com/torvalds/linux/blob/master/drivers/acpi/a..., https://github.com/torvalds/linux/blob/master/drivers/acpi/a...). Huh.

If there's something spinning in the background continuously flushing the contents of the ACPI buffer to the screen, I have no idea how I'd surface that. But in terms of this particular call graph, I think the only potentially-interesting area is actually the KERN_CONT mechanism itself. I was fascinated to learn that the message prefix system actually works by writing { 0x01, <character> } into the buffer (https://lxr.missinglinkelectronics.com/linux/include/linux/k...), where continuation lines are marked using "c". Interesting.
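For anyone else poking at this, here's a rough userspace sketch of how that two-byte prefix can be recognised. It's modelled on my reading of the kernel header and is not a copy of it: every SKETCH_/sketch_ name is hypothetical, and the assumption that the level characters are the digits '0' through '7' (with '3' being the error level) comes from my understanding of the KERN_* macros, not from anything quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SOH_ASCII '\001'      /* the 0x01 marker byte */
#define SKETCH_ERR       "\001" "3"  /* assumed error-level prefix */
#define SKETCH_CONT      "\001" "c"  /* continuation-line prefix */

/*
 * Returns the level character ('0'..'7' or 'c') if the buffer starts
 * with a { 0x01, <character> } prefix, or 0 if it has no prefix.
 */
int sketch_get_level(const char *buffer)
{
    assert(buffer != NULL);

    if (buffer[0] == SKETCH_SOH_ASCII && buffer[1] != '\0') {
        char c = buffer[1];

        if ((c >= '0' && c <= '7') || c == 'c')
            return c;
    }
    return 0;
}

/* Returns a pointer to the message text with any prefix skipped. */
const char *sketch_skip_level(const char *buffer)
{
    return sketch_get_level(buffer) ? buffer + 2 : buffer;
}
```

In terms of the pasted ACPI code, a buffer for which this returns 0 is the one that gets sent through printk(KERN_CONT "%s", buffer), and anything with a recognised prefix goes through plain printk("%s", buffer).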

Now I'm wondering, if the last message to be printed to the console had a proper level and all, and the next line was a continuation line... what level does it get? I now see that the chances are this is not the reason why it leaks onto the screen, but that was actually my first thought.

I'm still learning/limping/stumbling through understanding all this, so this was just poking around for fun/practice :)

Practically speaking I do generally prefer to have an Absolutely Blank Screen™ while the system is booting, save for what I put on it :), and to that end the nuclear option is to add "fbcon=map:1", which basically reroutes the console to /dev/fb1 (assuming you don't have an fb1, aka a 2nd screen :) ) - but given that this literally gives you no console at all, yeah, not great for everyday usage (and unfortunately not great for many embedded scenarios where RS232 or network access would be trickier than just switching to a console on a ~VGA display). Hence my interest in seeing if it is in fact possible to squirrel away all the text, but still have functioning CTRL+ALT+F1 et al.

--

Also - when you're in the GRUB menu (which you can usually show by spamming ESC nonstop immediately after POST, if it doesn't automatically sit at the menu for a couple of seconds), hit 'e' to edit the selected item, identify in the wall of text the bit of the 'linux' line (called 'kernel' in older GRUB versions) that says "quiet" and/or "loglevel=...", replace it with "loglevel=9 verbose debug", then hit CTRL+X to boot the modified entry. This can be made permanent by editing /etc/default/grub, and then DON'T FORGET :) to run `sudo update-grub` afterwards.

Generally you can effectively learn how to play with this in a VM, since once you're in GRUB most things are identical to real hardware. The majority of default configurations have like a 5-30 second timeout as well, so it's conveniently less necessary than it used to be to identify the exact nanosecond to start mashing the keyboard...

ackbar03 · on July 1, 2021

It's pretty amazing how you got all that from my simple hint. I also had a more detailed question regarding this on Stack Overflow, but it seems to have been deleted for lack of response. I also have screenshots if you happen to be interested, but by all means don't let me take up your time.

exikyut · on July 2, 2021

Feel free to email them over, sure. (Full disclosure, I'm sometimes terrible with reply latency.)

And FWIW, you actually tripped over a small longer-term cluster of internal grumbling - I like my screen to be completely blank at startup, and I've stared unimpressed at AE_NOT_FOUND-like messages (including that exact string) on my own system before I get to a login prompt.

So what kinda started out as "hmm, that's probably a misclassified priority or something..." ended up as a small wall of GitHub torvalds/linux links. Whoops. (And then I didn't figure it out anyway... hmph)