Also, if I'm reading the proposed fix in the mono pull request correctly, it doesn't deal with the problem entirely: there's still a race where the code starts executing on a core with the larger cache line size and then gets context-switched to a core with the smaller cache line size midway through its cache-maintenance loop. The chances of things going wrong are much smaller, but they're still there...
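To make the failure mode concrete, here's a rough sketch in C. It is not the mono code, just a model of a flush loop: `missed_lines` is a name I made up, and the 64/128-byte line sizes are assumptions for illustration. It counts how many real cache lines a loop never touches when it steps by a stride read on one core but actually runs on a core with a different line size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a cache-maintenance loop (not taken from mono).
 * The loop visits addresses base, base + stride, base + 2*stride, ...
 * below start + len, where base is `start` rounded down to `stride`.
 * Returns how many hardware lines of size `hw_line` overlapping
 * [start, start + len) contain none of the visited addresses.
 * Both sizes must be powers of two.
 */
size_t missed_lines(uintptr_t start, size_t len, size_t stride, size_t hw_line)
{
    assert(stride > 0 && (stride & (stride - 1)) == 0);
    assert(hw_line > 0 && (hw_line & (hw_line - 1)) == 0);

    uintptr_t end = start + len;
    uintptr_t base = start & ~(uintptr_t)(stride - 1);
    size_t missed = 0;

    /* Walk every hardware line that overlaps the range. */
    for (uintptr_t line = start & ~(uintptr_t)(hw_line - 1);
         line < end; line += hw_line) {
        /* Find the first address the loop visits that is >= line. */
        uintptr_t a = base;
        if (a < line)
            a += ((line - a + stride - 1) / stride) * stride;

        /* If that address is past this line or past the range,
         * the loop never issued an operation for this line. */
        if (a >= line + hw_line || a >= end)
            missed++;
    }
    return missed;
}
```

With a 256-byte range, a stride of 128 on a core with 64-byte lines skips every other line, so `missed_lines(0, 256, 128, 64)` comes out as 2. The opposite mismatch (stride 64 on 128-byte lines) misses nothing and only does redundant work, which is why using the smallest line size is the safe choice.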
(Edit: rereading the blog post, they say they need to figure out the global minimum, but I can't see how their code actually does that, since there's nothing that guarantees that the icache flush code gets run on every cpu before it's needed in anger.)
It should converge to the right value eventually, but there definitely seems to be a chance for it to be wrong one or more times before the code first runs on the little cores.
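A toy version of that "converge to the minimum" behaviour, as I understand the idea (again, not the actual mono code; `line_tracker` and `tracker_observe` are made-up names, and this ignores thread safety entirely):

```c
#include <stddef.h>

/* Hypothetical tracker for the smallest cache line size seen so far.
 * A min_line of 0 means "nothing observed yet". */
typedef struct {
    size_t min_line;
} line_tracker;

/*
 * Record the line size reported by whichever core we happen to be
 * running on, and return the smallest size observed so far. The result
 * is only the true global minimum once this has run at least once on a
 * core with the smallest line size.
 */
size_t tracker_observe(line_tracker *t, size_t line_on_this_core)
{
    if (t->min_line == 0 || line_on_this_core < t->min_line)
        t->min_line = line_on_this_core;
    return t->min_line;
}
```

If the first call happens on a big core reporting 128, the tracker returns 128, and any flush done with that value is too coarse for a 64-byte-line core. Only after a call has actually run on a little core does it drop to 64 and stay there.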
It's fine if core migration happens during the invalidation loop - the core migration itself surely must wipe the non-shared cache levels thoroughly, otherwise nothing would work.
EDIT: Actually if the big and little cores are used together, and not exclusively, then this might still be an issue, yeah.
No, in general Linux migrating a process between cores won't nuke the caches. The hardware's cache coherency protocols between the CPUs in the cluster keep them sufficiently in sync that it isn't needed.
I understood that the configurations currently in use usually power up either the big or the little cores at any one time, not both, and that kind of migration has to wipe the caches, right? But that might be inaccurate, and you are of course right in the general case.
The state of the art in Linux scheduler handling of big.LITTLE hardware has moved through several different models, getting steadily better at extracting the best performance from the hardware (Wikipedia has a good brief rundown: https://en.wikipedia.org/wiki/ARM_big.LITTLE). You're thinking of the in-kernel switcher approach, but global task scheduling (where you just tell the scheduler about all the cores and let it move processes around to suit) has been the recommended approach for a few years now, I think.
Core migration doesn't need to reach a global synchronization point, just enough of one that the two cores in question agree with each other. This can be done without requiring global visibility of all operations of the source core.