Yes, shared memory is a pretty big leg up: it lets the GPU process the whole model, which still has some benefits even if the memory bandwidth is lower.

Examples are Apple's M chips, AMD's Strix Point/Halo chips, Intel's Arc iGPUs, and Nvidia's Jetsons. The main issue with all of these, though, is that they lack the raw compute to complement their ability to load insanely large models.
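
To put rough numbers on the capacity vs. bandwidth trade-off, here is a minimal back-of-the-envelope sketch. It assumes a dense model where generating each token reads every weight once, so decode speed is capped at about bandwidth divided by model size. It ignores KV cache, activations, and compute limits entirely. The function names and all figures in the usage lines are hypothetical placeholders, not specs of any real chip.

```python
def model_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (weights only, no KV cache or activations)."""
    # 1 billion params at 1 byte each is roughly 1 GB.
    return params_billion * bytes_per_param


def fits_in_memory(size_gb: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the memory the GPU can address."""
    return size_gb <= memory_gb


def decode_tokens_per_sec_ceiling(size_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed, assuming every token reads all weights once."""
    return bandwidth_gb_s / size_gb


# Hypothetical example: a 70B-parameter model quantized to 4 bits (0.5 bytes/param).
size = model_size_gb(70, 0.5)

# Hypothetical discrete card with 24 GB of VRAM vs. a shared-memory system with
# 64 GB of GPU-addressable memory at 200 GB/s (made-up numbers for illustration).
print("fits on 24 GB card:", fits_in_memory(size, 24))
print("fits in 64 GB shared memory:", fits_in_memory(size, 64))
print("decode ceiling at 200 GB/s:", decode_tokens_per_sec_ceiling(size, 200), "tokens/s")
```

Note that this only gives a bandwidth ceiling for generation. Prompt processing depends much more on raw compute, which is where these chips tend to fall short.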