With a limited set of commands and fairly strict user training, you reduce this problem to "parsing a limited grammar", which is significantly easier. It's more or less what the current top-tier chatbots are doing. You don't need an outrageous amount of processing power for this.
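To make the "limited grammar" point concrete, here is a rough sketch of what such a parser could look like. The grammar, the word lists, and the `parse_command` function are all made up for illustration and are not taken from any real product; the only point is that a fixed command set can be matched with a few token comparisons and no heavy compute.

```python
import re

# Hypothetical grammar, invented for this example:
#   command := "turn" STATE DEVICE
#            | "set" DEVICE "to" NUMBER
# Articles ("the", "a") are ignored.
DEVICES = {"light", "thermostat", "fan"}
STATES = {"on", "off"}
ARTICLES = {"the", "a"}


def parse_command(text):
    """Return a dict describing the command, or None if it falls outside the grammar."""
    # Lowercase, split into alphanumeric tokens, and drop articles.
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in ARTICLES]

    # Rule 1: "turn on|off <device>"
    if (len(tokens) == 3 and tokens[0] == "turn"
            and tokens[1] in STATES and tokens[2] in DEVICES):
        return {"action": "switch", "device": tokens[2], "state": tokens[1]}

    # Rule 2: "set <device> to <number>"
    if (len(tokens) == 4 and tokens[0] == "set" and tokens[1] in DEVICES
            and tokens[2] == "to" and tokens[3].isdigit()):
        return {"action": "set", "device": tokens[1], "value": int(tokens[3])}

    # Anything else is rejected; this is where the "user training" comes in.
    return None


print(parse_command("Turn on the light"))
print(parse_command("Set the thermostat to 21"))
print(parse_command("What's the weather like?"))
```

Anything that doesn't match one of the two rules comes back as `None`, so the user has to stick to the trained phrasing. That's the trade-off: you give up free-form input and in return the whole thing runs comfortably on a local device.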
The reason everything currently runs "in the cloud" is very simple: it binds you to the vendor and prevents anyone from reverse-engineering the software in any sort of usable form. It's essentially DRM gone wild.