
This is on a model designed to run fast on CPUs. It's like dropping a bowling ball on your foot and excitedly announcing, a few days later, that it's bruised.

There may well be something interesting here, but the overhyped title undercuts whatever credit I'd give the authors for the research. If you find something interesting, say what it is, and stop making vapid generalizations for the sake of clicks.

Remember, doing this only feeds the AI hype bubble. The results might be good, but we need to be realistic about them, or nobody will listen to genuine innovation in the future, because they'll have tuned it out along with all the crap marketing that came before it.

Thanks for coming to my TED Talk!



I don't think MobileNetV2 is designed to train on CPUs. According to this https://azure.microsoft.com/en-us/blog/gpus-vs-cpus-for-depl... MobileNetV2 gets bigger gains from GPUs (versus several CPUs) than ResNet does. You could argue the batch size doesn't fully utilize the V100, but these comparisons are tricky, and this looks like fairly normal training to me.
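
For anyone who wants to sanity-check this on their own hardware, here's a minimal per-step timing sketch, assuming PyTorch and torchvision are installed. The batch size, step count, and 1000-class dummy labels are arbitrary placeholders, not the paper's setup.

  import time
  import torch
  import torchvision

  def time_training_step(device, batch_size=64, steps=10):
      # Fresh MobileNetV2 plus a dummy classification batch.
      model = torchvision.models.mobilenet_v2().to(device).train()
      opt = torch.optim.SGD(model.parameters(), lr=0.01)
      loss_fn = torch.nn.CrossEntropyLoss()
      x = torch.randn(batch_size, 3, 224, 224, device=device)
      y = torch.randint(0, 1000, (batch_size,), device=device)

      # Warm-up step so lazy init / kernel selection isn't timed.
      loss_fn(model(x), y).backward()
      opt.step()
      opt.zero_grad()
      if device == "cuda":
          torch.cuda.synchronize()

      start = time.perf_counter()
      for _ in range(steps):
          opt.zero_grad()
          loss_fn(model(x), y).backward()
          opt.step()
      if device == "cuda":
          torch.cuda.synchronize()
      return (time.perf_counter() - start) / steps

  print("cpu :", time_training_step("cpu"))
  if torch.cuda.is_available():
      print("cuda:", time_training_step("cuda"))

On an M1 you'd pass "mps" instead of "cuda" (and synchronize with torch.mps.synchronize()); the shape of the comparison is the same.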

It's pretty surprising to me that an M1 performs anywhere near a V100 on model training, and I guess the most striking thing is the M1's energy efficiency.


MV2 is memory-bandwidth-limited: the depthwise, grouped, and 1x1 convs each incur kernel-launch overhead on a GPU. A model shattered into many small kernels is fine on a CPU, but not on a GPU.
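
To make the "shattered kernels" point concrete, here's a rough sketch of the expand -> depthwise -> project pattern MobileNetV2 uses (its inverted residual block), assuming PyTorch; the channel counts are illustrative, not the paper's exact configuration. Each thin layer below is a separate kernel launch on a GPU, and none does enough arithmetic per launch to amortize that overhead.

  import torch.nn as nn

  def inverted_residual(in_ch=32, expand=6):
      mid = in_ch * expand
      # Several thin ops instead of one dense 3x3 conv;
      # each op is its own (cheap) kernel launch on a GPU.
      return nn.Sequential(
          nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),    # 1x1 expand
          nn.BatchNorm2d(mid),
          nn.ReLU6(inplace=True),
          nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                    groups=mid, bias=False),                   # 3x3 depthwise
          nn.BatchNorm2d(mid),
          nn.ReLU6(inplace=True),
          nn.Conv2d(mid, in_ch, kernel_size=1, bias=False),    # 1x1 project
          nn.BatchNorm2d(in_ch),
      )

On a CPU these small ops pipeline cheaply; on a GPU the device can sit idle between launches, which would help explain why a V100 looks less dominant here than on ResNet-style dense convolutions.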

Though per your note on the benchmark numbers, those are really interesting empirical results. I'll have to look into that; thanks for passing it along.



