The tech that powers this is astounding. But am I the only one who really doubts this voice-powered future everyone seems to be aiming for? From Siri to the Xbox One and Google Glass, there seems to be an overall assumption that voice is the interface everyone will be using, but didn't we assume the same thing when dictation software first came out years ago? That we'd all be dictating our documents rather than typing them?
The only place voice control feels natural to me is when I'm in a room by myself. So that rules out the office, public transportation, and my home, except at rare moments. I'm actually quite happy about that: the last thing I want is to be at work surrounded by people talking to their computers.
Voice search is a natural extension (and time will tell if it's an ideal one), but I think the most interesting part of all this is the progress these companies are making in natural language processing.
I think the next step after subvocalization detection will be brain-computer interfaces. Actually, they might skip straight to that, since it might be easier in some ways if the sensors are actually in your brain. Or maybe subvocalization detection using a BCI.
As I understand it, "subvocalization" usually refers to the voice in your head that you experience when you read, so the parent post may have been actually making this claim, albeit inadvertently.
In a lot of ways I agree with you. When I got my first Android phone I was so excited to use voice search, and I was planning on never typing on my phone again. A few years later, I rarely ever use voice search. Even in the privacy of my own home, I never use it. Voice search works well and it's fast, but I feel much more comfortable typing, I guess.
I used to feel the exact same way about using any sort of voice comms. If anyone else was in the room with me it felt incredibly awkward. The pinnacle was when I decided to play a video game over Ventrilo with my then-girlfriend in the room. She tried to respond to everything I said, and I went as silent as possible.
Fast-forward a while, and using voice comms software on my computer feels as natural and second nature as using a phone. Skype, Vent/Mumble, web demos, etc. all feel normal.
I imagine that those who stick with it will find it pushes their comfort zone less and less. I'd also predict that as voice becomes more readily available, the rising generation will just get used to using it and won't give it a second thought.
Mind reading would be even better. Until then, typing words on a keyboard is in many cases still the best way to provide input without feeling awkward or bothering others.
And speech isn't weird in the same way that touch screens felt weird to some of us at first, because touching the screen of your gadget doesn't annoy other people.
Also, did anybody notice how awful speech recognition is for non-Americans? Just as companies got into the habit of internationalizing things properly, we're going back to square one, except that learning to pronounce things with an American accent is much harder than writing words in English, so these devices will annoy the heck out of us non-Americans even if we know English.
I think this implementation is just a necessary first step. Imagine combining this with technology that can detect subvocalizations -- that is the future.
No one is asking this, but why is this Chrome-only? (Disclaimer: I am a Firefox user.) Is there a specific part that is tied to native browser code? (Microphones and speakers can be accessed through web languages.) [Note] The tone of the question is curious, not accusatory.
There was a session at I/O called "More Awesome Web" which seemed to imply that the Web Audio API (which enables live microphone input) and the Web Speech API (which enables voice-recognition-driven apps) are both currently only supported in Chrome. Here is a slide from that presentation: http://www.moreawesomeweb.com/#32 (just use the right arrow to progress through the slides).
If that's what's holding up speech recognition for other browsers then hopefully these APIs can become more widely supported, but at the same time I don't think the demand for browser speech recognition is there yet. It makes sense on your phone, but bending over your desktop or laptop in order to talk to your computer somehow doesn't seem right yet.
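For anyone curious, here is a minimal sketch (in TypeScript) of what using Chrome's prefixed Web Speech API looks like -- this assumes the webkitSpeechRecognition constructor that Chrome exposes; other browsers won't have it:

    // Minimal sketch: Chrome's prefixed Web Speech API for a voice-driven page.
    // Assumes (window as any).webkitSpeechRecognition exists -- Chrome only.
    const SpeechRecognitionCtor = (window as any).webkitSpeechRecognition;

    if (SpeechRecognitionCtor) {
      const recognition = new SpeechRecognitionCtor();
      recognition.lang = "en-US";         // language to recognize
      recognition.interimResults = false; // only report final transcripts
      recognition.maxAlternatives = 1;

      recognition.onresult = (event: any) => {
        const transcript = event.results[0][0].transcript;
        console.log("Heard:", transcript);
        // e.g. drop the transcript into a search box here
      };

      recognition.onerror = (event: any) =>
        console.error("Recognition error:", event.error);

      recognition.start(); // Chrome prompts for mic permission on first use
    } else {
      console.warn("Web Speech API not available in this browser.");
    }

The permission prompt and the recognition itself are handled by the browser, which is what makes this path so much simpler than wiring everything up by hand.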
Well, yes and no. It would be possible through Firefox, but the feature (through WebRTC) is very recent and low-level -- you'd have to go through a permission prompt, and then they'd probably have to connect over sockets, make a request, and come back with the final results.
Conversely, Chrome has had x-webkit-speech -- a high-level speech-to-text feature -- since 2011. The code is already all there; everything just needed to be linked up and presented a little better.
They'll probably come out with Firefox support in a while -- it just takes more work.
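To illustrate the lower-level route described above, here is a rough TypeScript sketch (using the modern getUserMedia/MediaRecorder spelling, purely as an assumption about how you might wire it up yourself; the /recognize endpoint is hypothetical):

    // Sketch of the low-level path: prompt for the mic, capture some audio,
    // then ship it to a speech-recognition backend you provide yourself.
    async function lowLevelVoiceCapture(): Promise<void> {
      // Triggers the browser's microphone permission prompt.
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

      const recorder = new MediaRecorder(stream);
      const chunks: Blob[] = [];

      recorder.ondataavailable = (e) => chunks.push(e.data);
      recorder.onstop = async () => {
        const audioBlob = new Blob(chunks);
        // "/recognize" is a hypothetical endpoint -- nothing high-level is done
        // for you here; you'd need your own recognition service behind it.
        await fetch("/recognize", { method: "POST", body: audioBlob });
      };

      recorder.start();
      setTimeout(() => recorder.stop(), 5000); // record ~5 seconds, then send
    }

Compared with dropping a single x-webkit-speech attribute on an input, it's easy to see why the Chrome version shipped first.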
Are you logged into your Google account on Chrome? I know that there's voice personalization on the phone versions. I'd expect them to bring that to the browser, too.
Remarkable. Chrome asked my permission to use my laptop's microphone and was then off to the races. I followed the examples in the post but added, "What is his dog's name?" The first result was the Wikipedia entry for Bo.
CastleOS has demos showing that the Kinect makes a fairly cost-effective voice receiver that can cover an entire room.
I'm not exactly an expert on search engine technology, so I'd be grateful if someone would be so kind as to help me out here:
I did a "conversational search" asking "what's an amazing voice" and Google returned a definition with a Youtube video of a woman singing. One of the comments simply says: "Amazing voice!"
Why is this listed as a definition? What's happening here?
The conversational aspect was one of the differences between Google Voice Search and Siri (though conversational ability doesn't matter much if the recognition fundamentally doesn't work, but that's another debate). Interesting to see the progress in this space.
To me, this would feel more natural somewhere other than Google's search bar. When we make a new search, the mental model is that it's a separate, fresh query.
Although we all know that searches are by no means stateless, they were separate, unrelated questions in my mind before this. Conversational search makes searching more like a conversation (hence the name), but to me the UI doesn't fit that style: there is no record of what you searched before.
Imagine if your chat client were presented in the same way. Google is only including it like this so that it benefits from the huge usage of its search engine, but to me a more chat-like UI would make the conversational style more obvious.
I can do voice searches, but the conversational aspect (asking "how old is he?") isn't working, and I can't seem to get hotwording to work either. Is this still rolling out?
> Why is Google so afraid of making a standalone application to run on desktops and not in a browser?
There is a big difference between "doesn't see the value in the added cost" and "is afraid of".
> Then you wouldn't need your browser open to use "OK Google"
Chrome supports running in the background on startup (and prefers to do so), so conceptually there is no reason anything that can be done in Chrome and doesn't require on-screen UI couldn't be done without any application being "open" (at least, without the user taking the step of opening an application).
Seems to me that, both from a strategic perspective and a UX perspective, that's a more natural progression for conversational search than a separate standalone desktop app.