The summary, approximately: Firefox is seeing some crashes, and it seems Firefox (or one of the libraries it uses) thinks it can use AVX instructions even when the OS isn't aware of AVX and therefore can't preserve the contents of the AVX registers across context switches.
And yuhong has found that the reason Firefox thinks that, specifically on Windows XP booted with the Windows 10 loader, is that the Windows 10 loader sets an "I know about AVX" bit and does not clear it before entering Windows XP.
Set it then forgot to clear it. The "bits" include CR4.OSXSAVE and XCR0 (set by XSETBV). BOOTMGR is used for booting Vista and later, and it chains to NTLDR for booting XP/Server 2003 and older.
Windows has always had the worst dualboot story. It'll gladly overwrite grub etc without asking (to "helpfully" fix boot problems?), but if your partitions are ordered a little out of the ordinary it'll still throw a ridiculously not-helpful error. They might as well just stop pretending to support it.
This took me a while to figure out, but if you use UEFI, and have Windows and Linux on different drives / different EFI system partitions, you can use grub-install --force-extra-removable to keep Linux bootable after Windows helpfully clears out the NVRAM Boot* variables: https://github.com/ludios/ubuntils/blob/master/bin/reinstall...
I legit just physically disconnect the drives with other operating systems while installing Windows in dual boot situations. Can't break what isn't there!
That worked with MBR boot, but with UEFI, Windows will notice you have Boot* variables in motherboard NVRAM pointing to a Linux install that doesn't exist, then clear them for you.
Sometimes very necessary. Windows 7 had a bug that caused it to spread boot information across multiple drives if you had more than one plugged in at install time, even if you specified only to use one.
A cool way to fix this if Microsoft doesn't want to (although I can't see it being that hard) would be to make a small stub that simply kills the AVX flags then chainloads through to NTLDR.
It would probably be quite a simple project to work on, and a fun way to learn about bootloader-level software development. Chainloading NTLDR is well understood (and will never change), and being booted by BOOTMGR is fairly well understood too. If I was running Windows at the moment I'd be seriously considering playing with this myself.
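For anyone tempted to try it, here is a rough Python model of what such a stub would have to do to the two registers mentioned upthread. It is only a sketch on plain integers: the function name scrub_avx_state is made up, and a real stub would be a few instructions of assembly run before jumping to NTLDR. The assumptions are that CR4.OSXSAVE is bit 18 of CR4 and that XCR0 holds just the x87 bit (value 1) after reset, which is how I read Intel's manuals.

```python
CR4_OSXSAVE = 1 << 18   # CR4 bit that tells the CPU the OS manages XSAVE state
XCR0_X87 = 1 << 0       # must always stay set in XCR0
XCR0_SSE = 1 << 1
XCR0_AVX = 1 << 2


def scrub_avx_state(cr4, xcr0):
    """Return (new_cr4, new_xcr0) as a pre-AVX OS would expect to find them.

    Hypothetical helper: it models the register values only.
    """
    if cr4 & CR4_OSXSAVE:
        # Order matters on real hardware: XSETBV is only legal while
        # CR4.OSXSAVE is still set, so XCR0 has to be reset first.
        xcr0 = XCR0_X87
        cr4 &= ~CR4_OSXSAVE
    return cr4, xcr0


# State as the newer loader is reported to leave it: OSXSAVE set,
# x87 + SSE + AVX enabled in XCR0. 0x20 is CR4.PAE, just to show that
# unrelated bits are left alone.
cr4, xcr0 = scrub_avx_state(0x20 | CR4_OSXSAVE, XCR0_X87 | XCR0_SSE | XCR0_AVX)
print(hex(cr4), hex(xcr0))
```

If OSXSAVE is already clear, the function leaves both values untouched, since there is nothing to undo and XCR0 could not be written anyway.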
Hah. Well, I expect an appropriately patched version of NTLDR will surface pretty soon then - especially considering the signing issues are moot (see elsewhere in this thread).
Ooooh. Good point - and I don't actually know. [EDIT: See comment below, XP doesn't do Secure Boot, the following is moot for this context.]
So... BOOTMGR (Win10) is chaining through to NTLDR to load WinXP. And either Win10 comes with a copy of NTLDR, or pokes around to find the one on the XP system.
By my reasoning, Secure Boot should say "okay" and happily start the machine once it decides BOOTMGR is okay, on the basis that BOOTMGR will verify whatever it loads. The question is whether BOOTMGR actually does that. If it didn't, there wouldn't really be a boot trust chain, so it probably does verify what it loads.
I fear this is something only Microsoft would be able to fix properly for users who want/need Secure Boot. Slightly ironic. But thinking about it, Secure Boot on XP is kind of like deadbolting your front door when your walls have completely disappeared (picture a door sitting in the middle of nowhere), because XP is officially EOL now.
BTW, the boot debugger in checked builds of NTLDR is harder to use than the one in BOOTMGR. You have to manually load symbols by reading the PE header into memory using a copy with the real mode stub removed, and you have to use F10 at the boot menu in order to break into the debugger.
As someone completely naive about Windows (and particularly low-level development), I'm casually/idly curious if the DDK includes the debugging tools you used, or if I have to throw money at people to get at them. (I know the DDK exists, but not much more than that.)
Also, how did you remove the real-mode stub? Hex editing, or is there a tool that will do that if fed the right combination of [obscure] options?
IMHO, the best way to finish this off would be an in-depth tutorial-style blog post. You definitely get my recommendation to consider that :)
Yes, the DDK used to include the needed debug version of NTLDR too. And yes, it is done by hex editing: look for the "MZ" signature and trim everything before it. With BOOTMGR you just set some BCD options to enable the boot debugger.
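If you'd rather not do the trimming by hand, something like the following Python sketch should do the same job. The function names are made up, and the extra check (following the header's e_lfanew field at offset 0x3C to a "PE\0\0" signature) is my own addition to avoid stopping at a stray "MZ" byte pair inside the real-mode stub; the original description only mentions searching for "MZ".

```python
import struct


def find_pe_start(image):
    """Return the offset of the first "MZ" that looks like a real PE header."""
    pos = image.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; its last field (e_lfanew, at 0x3C)
        # is the offset of the "PE\0\0" signature relative to the "MZ".
        if pos + 0x40 <= len(image):
            (e_lfanew,) = struct.unpack_from("<I", image, pos + 0x3C)
            pe = pos + e_lfanew
            if image[pe:pe + 4] == b"PE\0\0":
                return pos
        pos = image.find(b"MZ", pos + 1)
    raise ValueError("no PE image found")


def strip_real_mode_stub(image):
    """Drop everything before the embedded PE image."""
    return image[find_pe_start(image):]


# Tiny synthetic example: fake stub bytes (including a decoy "MZ")
# followed by a minimal DOS header and PE signature.
pe_image = b"MZ" + b"\0" * 0x3A + struct.pack("<I", 0x40) + b"PE\0\0" + b"rest"
blob = b"\xEB\x10" + b"MZ" + b"\0" * 0x50 + pe_image
print(strip_real_mode_stub(blob) == pe_image)
```

On a real file you would read the bytes with open(path, "rb"), pass them through strip_real_mode_stub, and write the result to a new file to feed to the debugger.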
> Many compiler optimizations (such as stack frame elimination) are disabled in the checked build. This makes it easier to understand disassembled machine instructions, and therefore it is easier to trace the cause of problems in system software.
- Firefox is seeing a crash associated with use of the AVX vector-processing (~= high-performance math) instructions
- The AVX instructions use more registers, and registers need to be saved/restored during a context switch, so you can switch back to a program and have it be transparent. So you can only use AVX if the OS supports it, and promises to save/restore those registers in addition to regular registers when it does a context switch. The OS reports to the CPU "Yes, it's okay to let people use AVX" by setting a bit in a control register. Applications check that bit before using AVX instructions.
- The crash is an illegal-operation exception, which should be impossible because Firefox checks to see if that bit is set before using those instructions.
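To make the check in the bullets above concrete, here is a minimal Python sketch of the decision an application is expected to make. It only models the logic on raw register values: the function name avx_usable and its arguments are made up for illustration, and real code would obtain the values from the CPUID and XGETBV instructions. The bit positions (OSXSAVE is bit 27 and AVX is bit 28 of CPUID leaf 1 ECX; SSE and AVX state are bits 1 and 2 of XCR0) are as documented by Intel.

```python
CPUID1_ECX_OSXSAVE = 1 << 27  # mirrors CR4.OSXSAVE: "the OS uses XSAVE"
CPUID1_ECX_AVX = 1 << 28      # the CPU itself implements AVX
XCR0_SSE = 1 << 1             # OS saves/restores XMM registers
XCR0_AVX = 1 << 2             # OS saves/restores the upper YMM halves


def avx_usable(cpuid1_ecx, xcr0):
    """Decide whether AVX may be used, given CPUID.1:ECX and XCR0.

    xcr0 is only consulted when OSXSAVE is set, because XGETBV is not
    available otherwise.
    """
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False  # hardware has no AVX at all
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # OS never enabled XSAVE, so it cannot manage YMM state
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed


# The situation described in this thread: AVX-capable CPU, bits left set
# by the newer bootloader, and an OS that never looks at them.
print(avx_usable(CPUID1_ECX_AVX | CPUID1_ECX_OSXSAVE, 0x7))
```

The point is that the application can only see the bits, not who set them, so a leftover value from the bootloader looks exactly like an OS that supports AVX.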
The answer to the mystery: Some people are using an old version of Windows, that does not support AVX, but with a bootloader from a new version of Windows. For whatever reason, the bootloader sets the "Yeah, AVX is fine" bit, and expects the new version of Windows to detect AVX and set support as appropriate. Old versions of Windows don't know about that bit, though, and never clear it. So Firefox proceeds to use AVX on a CPU that has no AVX support.
This was discovered by someone mentioning that they were dual-booting Windows versions, and that the crash went away when restoring the older bootloader.
"So Firefox proceeds to use AVX on a CPU that has no AVX support". To make it clear: it uses AVX on a CPU that does support it (otherwise you'd run into an illegal instruction error), but under an OS that doesn't. Firefox doesn't know this, however, because it thinks the OS set the bit that says it does.
It seems a bit odd that they'd be enabling AVX in the bootloader. Any idea why they do that? Seems like something for the kernel to do during initialization (like most other features), not the bootloader.
My understanding is that BOOTMGR goes to WINLOAD when booting Vista or later. The routine that ensures the correct CR4/XCR0 values in this case is called bootmgr!ArchRestoreProcessorFeatures and it is called just before bootmgr!BlBdStop in bootmgr!ArchExecuteTransition.
I'm sure it would be helpful for those who might attempt to reiterate the submission if you described as much as you think you understand to provide some context.
Lots of bootloaders (I would say most I have used) allow chainloading another bootloader. That is what is going on here. Pointing bootmgr at another partition and saying "boot that".
Based on what the problem description here is (failing to clear some bits in a register), patching it yourself doesn't seem too difficult. Probably less than two dozen bytes to change.