After the BIOS screen, the screen would go black as it tried to load what I assume is the OS.
Then after a really long time an error message shows up, something like
ACPI Error, namespace lookup failure, AE_NOT_FOUND
Then it goes back to bios and tries again.
What I suspect happened is that removing the NVIDIA driver led to some sort of circular dependency or lock on the system. This was when Ubuntu 20 first came out and the official Ubuntu 20 CUDA and NVIDIA drivers weren't out yet, so I was using the ones for Ubuntu 18. Never figured it out...
First of all, that tiny little detail about the ACPI error is actually incredibly helpful: it's one of the few messages that tend to still leak onto the screen when the system is configured to boot in quiet mode. So Linux was actually getting at least partway through boot just fine.
If the system was then just automatically resetting after a bit, that definitely sounds like a driver fault, and if you were still on the Ubuntu 18 drivers it sounds completely reasonable (for proprietary values of "reasonable" ._.) that you'd encounter a kernel panic or hardware lockup/reset or something like that.
--
I was curious why that ACPI error message leaked onto the screen, and presumed/guesstimated it was because it was being printed with a high log level/priority. I decided to go digging to see if my theory was correct.
#ifdef ENABLE_DEBUGGER
    if (acpi_in_debugger) {
        kdb_printf("%s", buffer);
    } else {
        if (printk_get_level(buffer))
            printk("%s", buffer);
        else
            printk(KERN_CONT "%s", buffer);
    }
#else
    if (acpi_debugger_write_log(buffer) < 0) {
        if (printk_get_level(buffer))
            printk("%s", buffer);
        else
            printk(KERN_CONT "%s", buffer);
    }
#endif
there we go.
This is weird: it uses different paths depending on whether kdb support is compiled in. If it is, it uses kdb_printf() while in the debugger and printk() otherwise. If it's not, it tries calling acpi_debugger_write_log() first and only does printk() things if that returns < 0.
If there's something spinning in the background continuously flushing the contents of the ACPI buffer to the screen, I have no idea how I'd surface that. But in terms of this particular call graph, I think the only potentially-interesting area is actually the KERN_CONT mechanism itself. I was fascinated to learn that the message prefix system actually works by writing { 0x01, <character> } into the buffer (https://lxr.missinglinkelectronics.com/linux/include/linux/k...), where continuation lines are marked using "c". Interesting.
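To make the prefix idea concrete, here's a minimal userspace sketch of how a two-byte { 0x01, <character> } prefix could be detected and skipped. The demo_* names are made up for illustration and this is not the kernel's actual code, just the same idea as I understand it: a 0x01 byte followed by '0'..'7' or 'c'.

```c
/* Hypothetical names for this sketch; not the kernel's own macros. */
#define DEMO_SOH_ASCII '\001'   /* the 0x01 byte that opens a prefix */

/*
 * Return the level character ('0'..'7', or 'c' for a continuation)
 * if buf starts with a two-byte prefix, or 0 if there is no prefix.
 */
static int demo_get_level(const char *buf)
{
    if (buf[0] == DEMO_SOH_ASCII && buf[1] != '\0') {
        char c = buf[1];

        if ((c >= '0' && c <= '7') || c == 'c')
            return c;
    }
    return 0;
}

/* Return a pointer to the message text, past the prefix if one exists. */
static const char *demo_skip_level(const char *buf)
{
    return demo_get_level(buf) ? buf + 2 : buf;
}
```

So a string built as "\001" "3" "some text" reports level '3', "\001" "c" "more text" reports 'c', and a string with no prefix reports 0, which is the case where the code above falls back to KERN_CONT.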
Now I'm wondering, if the last message to be printed to the console had a proper level and all, and the next line was a continuation line... what level does it get? I now see that the chances are this is not the reason why it leaks onto the screen, but that was actually my first thought.
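Going back to the log level theory from before the code dump, here's a rough sketch of the kind of threshold check I had in mind. The numbers are assumptions on my part rather than anything confirmed in this thread: that "quiet" drops the console threshold to 4, and that the ACPI message is logged at error level (3). The demo_* names are invented for the sketch.

```c
#include <stdbool.h>

/* Assumed message levels for this sketch (lower number = more severe). */
enum { DEMO_LEVEL_ERR = 3, DEMO_LEVEL_WARNING = 4, DEMO_LEVEL_INFO = 6 };

/* Assumed console thresholds: a "quiet" boot versus a chattier default. */
enum { DEMO_CONSOLE_QUIET = 4, DEMO_CONSOLE_DEFAULT = 7 };

/* A message reaches the console only if its level is below the threshold. */
static bool demo_shown_on_console(int msg_level, int console_loglevel)
{
    return msg_level < console_loglevel;
}
```

Under those assumptions, demo_shown_on_console(DEMO_LEVEL_ERR, DEMO_CONSOLE_QUIET) is true while the warning and info levels are filtered out, which would at least be consistent with an error-level ACPI line being one of the few things that still shows up on a quiet boot.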
I'm still learning/limping/stumbling through understanding all this, so this was just poking around for fun/practice :)
Practically speaking I do generally prefer to have an Absolutely Blank Screen™ while the system is booting, save for what I put on it :). To that end the nuclear option is to add "fbcon=map:1" to the kernel command line, which basically reroutes the console to /dev/fb1 (assuming you don't have an fb1, aka a 2nd screen :) ). But given that this literally gives you no console at all, yeah, not great for everyday usage (and unfortunately not great for many embedded scenarios where RS232 or network access would be trickier than just switching to a console on a ~VGA display). Hence my interest in seeing if it is in fact possible to squirrel away all the text, but still have functioning CTRL+ALT+F1 et al.
--
Also - when you're in the GRUB menu (which you can usually show by spamming ESC nonstop immediately after POST, if it doesn't automatically sit at the menu for a couple of seconds), hit 'e' to edit the selected item. In the wall of text, find the line that starts with 'linux' (the kernel line), replace the bit that says "quiet" and/or "loglevel=..." with "loglevel=9 verbose debug", then hit CTRL+X to boot the modified entry. This can be made permanent by editing /etc/default/grub, then DON'T FORGET :) to run `sudo update-grub` afterwards.
Generally you can effectively learn how to play with this in a VM, since once you're in GRUB most things are identical to real hardware. The majority of default configurations have like a 5-30 second timeout as well, so it's conveniently less necessary than it used to be to identify the exact nanosecond to start mashing the keyboard...
It's pretty amazing how you got all that from my simple hint. I also had a more detailed question about this on Stack Overflow, but it seems to have been deleted for lack of response. I also have screenshots if you happen to be interested, but by all means don't let me take up your time.
Feel free to email them over, sure. (Full disclosure, I'm sometimes terrible with reply latency.)
And FWIW, you actually tripped over a small longer-term cluster of internal grumbling - I like my screen to be completely blank at startup, and I've stared unimpressed at AE_NOT_FOUND-like messages (including that exact string) on my own system before I get to a login prompt.
So what kinda started out as "hmm, that's probably a misclassified priority or something..." ended up as a small wall of GitHub torvalds/linux links. Whoops. (And then I didn't figure it out anyway... hmph)