You don't even need general-purpose speech recognition to call someone on your mobile, just enough to recognize a trigger sound and the name of the person you want to call. That is how voice dialing on my Nokia N73 let me call anyone in my address book.
As I understand it, the hard part isn't converting speech to text; it's understanding which text should lead to which actions.
Strangely enough, this works perfectly in command-line interfaces, so maybe we just need a more speech-friendly way to invoke commands?
And why you'd need the cloud is beyond me. A speech assistant that fails once you don't have an internet connection is not only annoying; in some cases it could be outright dangerous.
With a limited set of commands and fairly strict user training, you reduce this problem to "parsing a limited grammar", which is significantly easier. It's more or less what the current top-tier chatbots are doing. You don't need an outrageous amount of processing power for this.
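A minimal sketch of what "parsing a limited grammar" can look like, assuming the recognizer has already turned the audio into a text transcript. The contact names, phone numbers, the two rules, and the `parse_command` function are all hypothetical, made up for illustration. Anything outside the grammar returns None instead of a guess, and nothing here needs a network connection (Python 3.9+).

```python
import re
from typing import Callable, Optional

# Hypothetical address book; names and numbers are illustrative only.
CONTACTS = {"home": "+15550100", "alice": "+15550101", "bob": "+15550102"}

# A deliberately small grammar: each rule is a regex over the transcript
# plus a handler that turns the captured groups into an action string.
RULES: list[tuple[re.Pattern[str], Callable[[re.Match[str]], Optional[str]]]] = [
    (re.compile(r"^call (?P<name>\w+)$"),
     # Only dial names that are actually in the address book.
     lambda m: f"dial {CONTACTS[m['name']]}" if m["name"] in CONTACTS else None),
    (re.compile(r"^set timer (?P<minutes>\d+) minutes?$"),
     # Convert minutes to seconds for the (hypothetical) timer action.
     lambda m: f"timer {int(m['minutes']) * 60}s"),
]


def parse_command(transcript: str) -> Optional[str]:
    """Return an action for a transcript, or None if it is outside the grammar."""
    # Normalize case and strip trailing punctuation the recognizer may add.
    text = transcript.lower().strip().rstrip(".!?")
    for pattern, handler in RULES:
        match = pattern.match(text)
        if match:
            return handler(match)
    return None
```

For example, `parse_command("Call home")` yields `"dial +15550100"`, while `parse_command("call carol")` yields None because "carol" is not a known contact. A customized set of a dozen or two such rules stays small enough to run entirely on the phone.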
The reason everything currently runs "in the cloud" is very simple: it binds you to the vendor and prevents anyone from reverse-engineering the software in any usable form. It's essentially DRM gone wild.
I agree. I suspect most people would be a lot happier with a dozen or two commands customized to their personal use cases.
Instead we have companies shuffling data back to their servers, attempting (and, in my experience, usually failing) to handle arbitrary commands, mostly for the company's benefit.
Google Assistant DID figure out which text should lead to which action. It knew that it was supposed to call home.
Instead of performing the action, it refused and demanded that I use my hands to unlock the phone, in precisely the situation where doing so was both illegal and dangerous.
You're right about the spurious cloud connection requirement. There is no reason we need a cloud connection for this or many other functions, yet it seems to be the default architecture for almost everyone these days. Just because you can does not mean you should.
It then refuses to perform the action because they decided that I need to unlock my phone, PRECISELY in the situation where I CANNOT use my hands to do so.
Also, last summer I had occasion to use an obsolete DragonDictate from circa 2009. That old software was FAR better at recognition, with a far better thought-out command flow, than any of Microsoft's or Google's current offerings. Yes, it is hard, but not being a decade behind should not be.
So both your general premise and your specific characterization are wrong.
E.g., "OK Google, call home."
Nothing happens.
Why? Because it is telling you, usually with a silent on-screen notification, that you need to unlock your phone.
I have found that trying several times, swearing included, seems to unlock it.
But this is still creating a distraction while driving.
Very poorly thought out.