Yeah, I've wondered about this. But seeing how an LLM hammers my M2 MBA's CPU for many seconds per request, I'm guessing this would have a significant impact on a smartphone battery.
ANE power usage for always-on Hey Siri wake-word detection is impressively low. The language models we have today are orders of magnitude too big, but they won't be for long. I think we'll be surprised.
For comparison, a quantized MobileNetV2 takes 3.7 MB of disk space to solve the same image-understanding task as the old VGG Caffe models, which take 550 MB. We've come a long way in six years.
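If you want to sanity-check those numbers, model size is mostly parameter count times bytes per weight. Rough sketch below; the parameter counts are the commonly cited figures for each architecture, and int8 quantization is my assumption:

    # Back-of-envelope model size: parameters x bytes per weight.
    # Real on-disk size also includes graph metadata, so treat as approximate.

    def model_size_mb(params: int, bytes_per_weight: int) -> float:
        return params * bytes_per_weight / 1e6

    vgg16_fp32 = model_size_mb(138_000_000, 4)       # ~552 MB, matches the 550 MB Caffe model
    mobilenetv2_int8 = model_size_mb(3_400_000, 1)   # ~3.4 MB, close to the 3.7 MB figure

    print(f"VGG16 fp32:       {vgg16_fp32:.0f} MB")
    print(f"MobileNetV2 int8: {mobilenetv2_int8:.1f} MB")
    print(f"Reduction:        {vgg16_fp32 / mobilenetv2_int8:.0f}x")

So the ~150x shrink comes from two compounding wins: far fewer parameters, and a quarter of the bytes per parameter.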
I wonder what the numbers actually are for local compute on custom hardware compared to firing up the wifi antenna to make the remote request.
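Back-of-the-envelope, it's just power × time on each side. Every figure in this sketch is a made-up placeholder to show the shape of the comparison, not a measurement:

    # Hypothetical energy-per-request comparison: local NPU vs. wifi round trip.
    # All power draws and durations below are illustrative assumptions.

    def energy_joules(watts: float, seconds: float) -> float:
        return watts * seconds

    # Assumed: NPU runs hot but briefly for on-device inference.
    local = energy_joules(watts=2.0, seconds=5.0)   # 10 J

    # Assumed: wifi radio active for the round trip, plus an idle tail.
    remote = energy_joules(watts=0.8, seconds=2.0)  # 1.6 J

    print(f"Local inference: ~{local:.1f} J")
    print(f"Remote request:  ~{remote:.1f} J")
    print(f"Ratio:           ~{local / remote:.1f}x")

The interesting part is that the answer flips depending on how long the model runs: a wake-word net that finishes in milliseconds wins easily, while multi-second LLM inference may well cost more than the radio.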