(I don't know where you get off concluding "never" but whatever.)
But you're right, there are extremely serious manufacturing challenges. The fact that people were so good at shrinking die size is what has led to the lack of a third dimension. I guarantee that if they couldn't keep shrinking, they would have to figure out 3D, regardless of manufacturing challenges. It's the only direction left to go (other than adding more cores, which they have also done).
But yes, I highly suspect that technology will never reach a point where multi-layer cores make sense. Part of the reasoning is that even if you reduce distances, you're still fighting with transistor switching speed, and individual cores are already very small. I do believe that designs with cores on one slice and L3 cache on another slice may happen, by the way.
Another part of the reasoning is that (even if manufacturing can largely avoid the multiple-slice-then-glue method), the density of inter-layer connections will likely be much lower than the wiring density on the lower metal interconnect layers [0], which makes it unattractive to have closely tied logic spread across different layers.
There are theoretical academic papers talking about the physical design of e.g. adders spread across a small number of layers, but they're frankly not very impressive, and their assumptions about the electrical properties of the inter-layer connections appear a bit on the optimistic side (understandable because hey, you've got to publish something!).
Overall, it's just a big headache, compared to the alternative of going 3D by stacking relatively large logical components such as cores and cache blocks.
[0] To be fair, I haven't seen numbers on this. I don't know how familiar you are with the typical stacks of interconnect metal, which consist of a very small number of layers for the thinnest wires, and then additional layers of increasingly (order of magnitude) larger wires for longer distance connections. In a hypothetical multi-transistor-layer design, how do you arrange those interconnect layers? To have a high density of connections between the layers, you want to only use layers with thin wires between the transistor layers, but then it's unclear how the longer distance wiring should work.
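To put some rough geometry behind that footnote, here is a minimal sketch of how quickly connection density drops as pitch grows. Since I have no real numbers, the two pitches below are made-up placeholders chosen only to be an order of magnitude apart, and the function names are hypothetical. The only thing the sketch relies on is that connections laid out on a square grid scale with 1/pitch².

```python
def connections_per_mm2(pitch_nm: float) -> float:
    """Upper bound on vertical connections per mm^2 for a square grid.

    Assumes one connection per grid point and nothing else competing
    for the area, which is an idealization.
    """
    grid_points_per_mm = 1e6 / pitch_nm  # 1 mm = 1e6 nm
    return grid_points_per_mm ** 2


def density_ratio(fine_pitch_nm: float, coarse_pitch_nm: float) -> float:
    """How many times denser the fine-pitch grid is than the coarse one."""
    return connections_per_mm2(fine_pitch_nm) / connections_per_mm2(coarse_pitch_nm)


# Placeholder values, NOT real process data: a thin-wire pitch and an
# inter-layer connection pitch that is ten times larger.
THIN_WIRE_PITCH_NM = 50.0
INTER_LAYER_PITCH_NM = 500.0

if __name__ == "__main__":
    ratio = density_ratio(THIN_WIRE_PITCH_NM, INTER_LAYER_PITCH_NM)
    print(f"fine grid is {ratio:.0f}x denser per unit area")
```

Under these assumptions, a 10x larger pitch costs a factor of 100 in connections per unit area, which is why it matters so much which interconnect layers end up between the transistor layers.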
Edit: Let me formulate my position as a more concrete prediction. There will never be a major commercial microprocessor in whose design an automated tool is used to decide the assignment of a significant fraction[1] of transistors to their layer on an individual basis[2].
[1] Meaning a significant fraction of the random logic transistors; with growing caches, the fraction of transistors that are placed by an automatic tool is decreasing anyway.
[2] Meaning that the tool makes individual decisions about fundamental building blocks such as NAND gates. It is slightly more conceivable that an automatic tool is used to assign larger blocks (such as an FPU) to different layers. I would still predict that this won't happen, but with lower confidence.