I'm not aware of any CPU invented since the late eighties that doesn't have paged virtual memory. Am I misunderstanding what you mean? Can you expand on where you are getting the 4x number from?
That's an interesting question about the number of levels of address translation. Does anyone have numbers for that, and how much latency and energy an extra layer costs?