Hacker News
Amazon Echo Dot (amazon.com)
351 points by endtwist on March 3, 2016 | 400 comments



The transition from primarily visual UX towards an auditorial UX is really powerful.

Looking at screens to get key information distracts me from my surroundings and seems archaic.

My wife is a sound designer who has opened my eyes to the importance of sounds both in film and in the world. It's not that I was unaware of sounds, but I didn't realize how important they are to centering me in this world and the made up worlds of films and games. Try watching a scary movie with the sound turned off, it turns into a comedy.

I think it's unexplored territory that has huge potential to impact the way we interact with the real world, even more so than Glass or HoloLens.

When I listen to music as I walk down the street, I change: my mood, my posture, and the way I look at the world. The music augments the reality around me in a way that visual UX never can, because visual UX is a lens between my eyes and the world.


The problem is that voice interfaces break down pretty quickly once you try to do anything complicated. The Echo has pretty solid voice recognition--far better than anything else I've ever used--but it's still hard to get it to do anything useful once you get beyond a pretty narrow script. (e.g. what's the weather forecast, play this artist, etc.)


I've found that the voice recognition on Android phones works well enough to be useful in a wide variety of circumstances. Navigating, getting directions, setting alarms, taking notes, sending text messages, sending emails, searching for things, and many more. When I was still using my Moto X I did the majority of every-day tasks with voice recognition.

The iPhone is catching up fast too...my wife's taken to sending emails via Siri (to avoid strain on her hands), and most of the time it gets things perfectly.

The biggest problem is privacy. One of the nice things about touchscreens is that you have a personal dialog with the device that can't be overheard by anyone nearby. That doesn't apply to voice recognition systems, and it can be pretty awkward to dictate an e-mail to a phone in a crowded place.


Being overheard isn't the only privacy concern. Most of these solutions offload the speech recognition and language parsing functions to corporate servers. I like texting with Siri but I'm not exactly keen on having Apple record everything. It also seems limiting in that I can't use voice commands without a network.

It would be nice for voice recognition to start being built into the devices themselves. I know training data is needed, but there's some convenience to be gained.


I think the processing requirements for handling on-device Siri would destroy battery life.


This actually doesn't seem to be the case. Take a look at Google Translate's offline voice recognition AND translation - it's really amazing, considering it's all happening on your device.


I forget where it was, but they published something about training a very small very fast neural network that could fit comfortably in the phone's memory. Tricky tricky. :D


Plus the only way to train these things at scale is to upload the recordings once you have some usage.


Worse for battery life than firing up the radio?


And devices that listen to you 100% of the time are yet another privacy concern... even if they don't send everything to a remote server.


If you have a human assistant who does that job, he also listens 100% of the time.


But he or she is less vulnerable to being automatically hacked by a three letter agency, foreign government, and/or hacker gathering data for identity theft.

The privacy concern _isn't_ necessarily about having something to hide. It's about the consistent hacking of major systems, and exposure of personal data.


And you don't think there are privacy concerns with that? It is a /very/ intimate relationship, and generally requires some ritualized/formalized interaction, and a very high degree of trust.


Just on the note of hand strain, without knowing anything about your wife's condition, a way that could help alleviate it is to critically analyse hand position/technique. As a pianist, I have been trained to have a very supple hand position when operating any device but I notice this isn't at all the case for many people I observe in their day to day activities.

Historically this probably wasn't much of an issue, but given that most people will spend hours at a desk on a keyboard, it's likely to become more of a problem. Think of it as akin to paying attention to your posture.


The use of Google Now from my bluetooth'd helmet has really improved my motorcycling experience.

Real easy to say: "Okay Google... navigate to California Academy of Sciences."

What's missing for me is spotify/app specific integration.


> What's missing for me is spotify/app specific integration.

For that to really happen in a robust way, I think Google needs to open up Custom Voice Actions.

[0] https://developers.google.com/voice-actions/custom-actions


"Ok Google.... Play <artist> on Spotify" works for me.

I agree discovery of these magic phrases needs work.


Yeah, some of that can be done through system actions (which I think that is), and it sounds like custom actions have been implemented by selected partners. I just mean they need to open up custom actions to enable more general app-specific integration.


I thought this already worked.

Okay Google... Play music will start Music app

Okay Google... Start Radio will start NPR app


I can say "Open Spotify" and it will open the app. Then I have a button on the helmet that sends the Play command. But I can't do anything robust like playing a specific artist.

Perhaps if I used Google Music the integration would be built out.


On my phone "Play <artist>" uses Google Music. "Play <artist> on Spotify" makes it use Spotify.


On my Nexus 6P, saying "OK Google play 'artist'" will open Spotify and start playing the top songs of that artist. This does not work for playing specific playlists, though.


Define "work well"? It doesn't work well if you're not connected to the Internet, if you speak quickly, or if you interrupt it, and it can only do limited follow-up.


>The problem is that voice interfaces break down pretty quickly once you try to do anything complicated

I've done a fair bit of interface engineering for the web. Between that and using so much software over the course of my life, I'd say that this applies to GUIs just as much as voice interfaces.


Yes, but GUIs have two or three dimensions available (up/down, left/right, time) whereas voice just has the one (time). We humans can also full-duplex GUIs much more easily than voice-based interfaces. And GUIs at least can be hooked up to full-powered grammar-based interfaces whereas voice, somewhat ironically considering the nature of human communication, has more trouble with it.

(I'd suggest this is actually a combination of the still-non-trivial nature of NLP, combined with a lack of feedback, combined with the fact that giving instructions is quite hard. Humans overestimate human language's ability to communicate clear directions, as anyone who has done tech support over a phone understands.)


Just as mouse input has evolved to include multitouch and 3D Touch gestures, voice input can also evolve. The full range of tone, inflection, pitch, etc. is available from the human voice.

I wonder if NLP research should have started as our ancestors did, with grunts and hoots and cries. Instead it's focused on recognizing full words and sentences while almost completely ignoring inflection.

Another dimension to add with vocal input is directional. If you have mics in all corners of a room, which direction you speak in can affect whether "turn off" operates your TV, your lights or your oven.


Very good points. I can't wait until devices can read my emotions or the inflections in my voice. I can voice-to-text most of my short messages, but anything that requires punctuation or, god forbid, emojis still requires manual input. And I don't want to have to say "period" or "exclamation mark" to indicate my desired punctuation. If I say it unusually loudly, insert an exclamation mark. If I pause at the end of a sentence (Word has known a grammatically correct sentence for decades) and don't say "um" or "uh", put a period. If my inflection goes up or there is a question word in the sentence, add a question mark.

There is a lot of room for improvement in voice processing along several dimensions of voice.
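
For what it's worth, the rules described above are simple enough to sketch. This is only an illustration in Python: the function name `punctuate`, the thresholds (75 dB, 0.7 s), the precedence of the rules, and the idea that a recognizer hands you loudness, a pitch-rise flag, and the trailing pause are all assumptions, not how any shipping assistant works.

```python
# Hypothetical prosody-to-punctuation heuristic, following the rules above.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}
FILLERS = {"um", "uh"}


def punctuate(words, loudness_db, pitch_rises, pause_after_s,
              loud_threshold_db=75.0, pause_threshold_s=0.7):
    """Return the words as a sentence with a guessed end mark."""
    if not words:
        return ""
    sentence = " ".join(words)
    sentence = sentence[0].upper() + sentence[1:]
    lowered = [w.lower() for w in words]

    # Assumed precedence: question cues, then loudness, then a clean pause.
    if pitch_rises or any(w in QUESTION_WORDS for w in lowered):
        mark = "?"
    elif loudness_db >= loud_threshold_db:
        mark = "!"
    elif pause_after_s >= pause_threshold_s and lowered[-1] not in FILLERS:
        mark = "."
    else:
        mark = ""  # speaker trailed off; leave the sentence open
    return sentence + mark


print(punctuate(["where", "are", "you"], 60.0, False, 1.0))
print(punctuate(["watch", "out"], 80.0, False, 1.0))
print(punctuate(["see", "you", "soon"], 60.0, False, 1.0))
```

The "question word anywhere in the sentence" rule is obviously crude ("I know what you mean" would get a question mark), which is probably part of why this is harder than it sounds.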


And copy and paste. People seem to always forget the power of it. It's the GUI equivalent of "Search for that on Google" or "Now, SSH to this IP I found digging through AWS." Copy and pasting of text from application to application is the clunky Unix Pipe. It's universal and deeply important.

Taking sections of the last response, or hell, even having every response essentially be wrapped up in some sort of object you can reference in your next query to the interface is what all of these lack.

Even Android's "Search this artist" doesn't quite get there. The lack of context between queries is what murders Siri for me. That and her seemingly random selection of what goes to Google and what goes to Wolfram Alpha. Sometimes even the "wolfram" verb prepended to a query just doesn't go to Wolfram no matter what.


I've often postulated that copy and paste is perhaps the biggest productivity enhancement in the history of computing.


I know some software maintainers who might disagree. But I like PopClip (https://pilotmoon.com/popclip/) as an enhancement on top of that one.


I second PopClip as a fantastic product, incredibly useful. Their DropShelf[0] tool is also useful, but not nearly as much as PopClip. But definitely worth the money.

0: https://pilotmoon.com/dropshelf/


I use KDE Connect to enable seamless copy and paste between my PC and my phones. It's the single best thing I ever installed in the last 1 or 2 years.


Sure, but the difference is that it's (almost) always obvious what actions are possible in a GUI. With voice interfaces you're back to trial-and-error.


There is still a fundamental problem with voice: it has to understand your words.

A text field, in contrast, doesn't need any intelligence, nor do buttons. This is particularly important for people living in non-English-speaking countries who use English in specific contexts (work, gaming, minor hobbies, etc.). Switching languages in audio applications is generally a PITA. And even when you do switch between languages every time, the engines still have huge performance gaps between the languages.

Software has become extremely tolerant of multiple languages, IMO. Voice recognition interfaces are not so mature yet, in my experience.


I'm not so sure about that. Check this out. One of the toughest fights in one of the toughest games performed with only voice commands. https://www.youtube.com/watch?v=5m2a2dLdZ0M

Now, granted, this is a specific use case, but, you know... "explore the space" and all that. (more cowbell!)


> One of the toughest fights in one of the toughest games performed with only voice commands. https://www.youtube.com/watch?v=5m2a2dLdZ0M

After 111 failed attempts :)

Still, it's a hell of an achievement.

EDIT: to be fair, Ornstein & Smough is a very tough fight even with normal controls.

Also notice the voice recognition fails to recognise some words like "item" even though they are spoken clearly. Almost gets the guy killed at one point.


The "play some good 60s rock" example isn't a VUI breakdown, it's a functionality gap in the backend. One that will probably be fixed pretty quickly, given the way things are headed.

A VUI breakdown would be inability to understand accents, or non-responsiveness to commands. As a user input, Alexa is pretty well buttoned up.


Sounds like the Enterprise computer:

Geordi: Computer, subdued lighting.

(computer turns the lights off)

Geordi: No, that's... that's too much. I don't want it dark. I want it cozy.

Computer: Please state your request in precise candlepower.

(The scene: https://www.youtube.com/watch?v=OPZnR3Ue1n4)


There will certainly be some aspects of the computer training the human, too. Just using this as an example, I don't know how much candlepower I want, but computers don't get bored or annoyed by my requests. I could start with 1 candlepower and move up to 10 if it's not bright enough. 100 might be too bright, so now I know what range I'm looking at. Next time I could just say "computer, 12 candlepower lighting, please".

Computers train users on how to use the computer all the time. It's less ideal than having the computer know everything, but once you know what you can expect from a computer, it's easier to get a good result.


I think that cuts both ways. If the computer can be trained to understand the user's intent, that seems like a better solution than forcing the user to think a different way.

Which would you rather do? Be forced to state your lighting preferences in candlepower, or have the computer learn that when you say "subdued lighting", you mean "12"?


Very true, but this is one simple example. Look at what Wolfram Alpha tries to do for even more complicated examples. If I put in "if I am traveling at 60 miles per hour how many hours does it take to go one hundred miles" it gives me an answer of 6000 seconds (1.66 hours). Very intuitive, and it actually ruined my example because I did not expect the site to understand what I was saying.

But if I type in "how fast do I need to go to travel 100 miles in 6000 seconds", now it has no idea what I'm talking about and instead gives me a comparison of time from 6000 seconds to the half life of uranium-241.

Now, when I get that result, I don't usually just give up on trying to figure out the answer. Instead I try to figure out what the computer expects me to say. Through some trial and error, I can shorten the query to "100 miles in 6000 seconds" and boom, I get the answer of 60 miles per hour. Instead of natural language, I'm using the search engine like a calculator.

The computer has just taught me how to use it. Ideal? No, but we work within the reality we're given. 12 candlepower is dim for you but for someone with decreased vision, that might be completely dark. The computer doesn't know unless it's taught, and we know from looking at history that users would rather the computer train the user than the user having to train the computer.


You asked: "how fast do I need to go to travel 100 miles in 6000 seconds", which is equivalent to saying "at what rate do I need to go to travel {rate}". It's a nonsense question; you already know the answer. You need to go 100 miles per 6000 seconds.

What you should have asked is: "100 miles per 6000 seconds to miles per hour", which will happily convert the rate you gave into the one you really wanted.

I guess what you're saying is it should be able to figure that out, but at some point, the old phrase "garbage in, garbage out" surfaces. You never told it to convert the unit.
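
For reference, the arithmetic both queries are after is a single unit conversion. A quick Python sketch (the function names are made up for illustration):

```python
SECONDS_PER_HOUR = 3600


def travel_time_seconds(distance_miles, speed_mph):
    """Time needed to cover a distance at a constant speed."""
    return distance_miles * SECONDS_PER_HOUR / speed_mph


def required_speed_mph(distance_miles, time_seconds):
    """Speed needed to cover a distance in a given time."""
    return distance_miles * SECONDS_PER_HOUR / time_seconds


print(travel_time_seconds(100, 60))    # 6000.0 seconds
print(required_speed_mph(100, 6000))   # 60.0 mph
```

6000 seconds is 6000 / 3600, or about 1.67 hours, so the two phrasings are the same calculation run in opposite directions.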


Wolfram is, and has always been, much more inclined to understand you if you work out what exactly you are trying to calculate beforehand.

Some phrases exist as a "wow, 1 million people phrase this problem this way, let's throw that in." The fact it can take an easily dictated, albeit strictly phrased problem, and get you your answer is really what I love about it. Now if Siri would just stop sending stuff to Google. -_-


What if you could define the equivalent of Bash aliases via voice control? This would allow users to tailor their experience from the default (possibly complex/unintuitive) commands to their own personalized ones.

Example format: "Computer, define X as Y"

"Computer, define subdued lighting as set lighting to candle power twelve"

Then the VUI just adds a new entry to the voice commands where saying X results in Y.
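
A rough sketch of what that could look like, in Python. Everything here is hypothetical: the `AliasTable` class, the assumption that the wake word ("Computer,") has already been stripped, and the assumption that the recognizer hands over plain text.

```python
class AliasTable:
    """Toy voice-alias store: 'define X as Y' makes X expand to Y."""

    def __init__(self):
        self._aliases = {}

    @staticmethod
    def _normalize(phrase):
        # Lowercase and collapse whitespace so minor transcription
        # differences still match.
        return " ".join(phrase.lower().split())

    def handle(self, utterance):
        """Register an alias, or return the command to actually run."""
        text = self._normalize(utterance)
        if text.startswith("define ") and " as " in text:
            # Split on the first " as "; alias names containing " as "
            # are not supported in this sketch.
            name, _, expansion = text[len("define "):].partition(" as ")
            if name and expansion:
                self._aliases[name] = expansion
                return f"defined: {name}"
        # Unknown phrases pass through unchanged.
        return self._aliases.get(text, text)


table = AliasTable()
table.handle("define subdued lighting as set lighting to candle power twelve")
print(table.handle("Subdued lighting"))
```

The hard part is not the lookup table, it's getting the recognizer to transcribe the alias the same way every time.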


So unrealistic. They'd use candelas.


You're thinking too much like an engineer :-) It's not a speech recognition breakdown but it's certainly a voice interface breakdown in the sense of I can't get the device to do what I want it to do. As a user, I don't care where in the pipeline my attempts to communicate a desired action break down. I just know that they do.


Exactly. We're used to dealing with either humans, who are intuitive and highly adaptive, or technology, which we manipulate and have total control over (so long as the system displays its status, we can find our way). We're not used to systems that expect us to interact with them in natural language, but have very specific criteria around what we ask for.

It still feels a lot like the old text-based RPGs, in that you spend most of your time trying to figure out how to phrase something to accomplish a basic need, while angrily thinking "it would have just been easier/faster to pick up my phone."

It's 2016. How are we still OK with the unreasonable constraints of technology that make us jump through a hoop like a trained poodle to get the treat?


The same can be said for GUIs as well. Remove the search engine concept and you are left with only playlists and song/artist names on such sites.

We don't have an audio equivalent of the search engine yet, but that day is not far off.


That's the thing. It is a use case with voice commands that map to specific actions. In the case of music, I can give Echo the name of a specific artist or maybe a playlist. But it breaks down pretty quickly if I tell it to play "some good 60s rock."


Ok, that is pretty damn cool. I've played Dark Souls so I can appreciate how difficult that must have been. Very impressive.

Devil's advocate though: this seems more like a case of the guy being good enough at the game to win in spite of the voice controls rather than because of them. Compared to a regular controller/keyboard+mouse/whatever there's just no contest in terms of input speed and precision. Not all genres are a good fit for this either. I'd be really interested to see if anyone could make it work with, say, a competitive FPS game.


Never mind that in order to use a voice service, it requires you to speak at a rate slower than many can type, all while demanding that the people in the room hush up so it won't get confused. Repeat if there was a mistake.


Try Hound. It's faster than anything I've tried and its context management is just impressive as hell. The Echo's lack of negative clauses is really, really frustrating.


I just can't stand talking to a computer. Never liked the idea of it. I loathe voice-controlled telephone menus. I can type faster than I can talk (if you include the inevitable revisions -- even without it's pretty close). I don't even like to leave messages on voicemail. I don't think voice interfaces are anything I will ever use if there's another option.


That holds true for pretty much all first-generation products of their type. The first "smart phones" couldn't do a whole lot of things. Over time, the Echo will improve and you'll be able to hold conversations with it.


My children are quite young. The world is going to be an amazingly interesting place when they are my age.

I can recall the first time I ever saw a computer and how primitive they now look.

Now we have little bots that listen to you and reply with info.

When my two-year-old is forty - we will have ghost in the shell.

It's crazy beautiful and scary to me that we all grew up reading cyberpunk fiction and watching anime and not all of us did, but pretty much all of us are actually building that future.

There is a balance between dystopia and utopia though.

We are all working at the Great Game - and the future is going to be interesting, but we can never turn back. So hopefully we keep the balance and get it right.

My worry is that, at this literally nascent stage of technology, we fuck it up because we don't fight hard enough for privacy policy.

We need privacy policy that is thinking at least 50 years in advance.

The control of government apparatus is thinking in advance - I personally feel that the tech sector's vision is myopically focused on today's profits and not on the future, where it should be looking, with the exception of this most recent case between Apple and the FBI. At least Cook's comments were salient and forward thinking and truly for the greater good... Let's hope that invigorates the tech industry as a whole to think about where we are headed.


Speech recognition has improved dramatically over the past few years through using cloud back-ends. It's actually usable for many tasks.

However, we still seem to be pretty far from natural language interfaces that make sensible inferences about the actions you're requesting and perhaps join multiple data sources to answer your query. There have been a lot of advances--don't get me wrong. But it's a very hard problem that has been worked on for a very long time.


Just like you hold conversations with Siri, Cortana and Google Now?


Are they not first-gen?


Well, I mean, they aren't fixed artifacts like a piece of hardware. I'm pretty sure they have been updated a few times.


Is it better than Google voice? Siri is completely useless for me but Google voice recognizes everything I say (love my new iPhone 6s but I wish I could say "hey Siri" and it would actually work).


The other issue is it becomes less useful when more than one person is active in the room. Small party? Interface no longer functioning as talking in the background interferes.


And if you do get beyond a narrow range does the user spend a lot of time thinking about how to craft a question so that the machine can understand it?


How complicated is controlling a TV or a radio? And voice is much easier for a variety of tasks than remote controls.


I think the main problem with voice interfaces is that they're not discoverable. You need a good understanding of what the system can and cannot do, its current state, etc. before even speaking.

A CLI has the same issue, but at least you can run man xxx, which I imagine works a lot better in text than it does in audio.


I think the goal is that the system gets to be good enough that nobody worries about discoverability any more.

I think Google is quickly getting there with their search interface. I'm always amazed at what a good job Google does when I ask it a question like "what's the name of the instrument powered by steam" and milliseconds later it's showing me info about calliopes.


I really liked how this was done in the movie 'Her.' There's something especially nice about only having your attention distracted audibly and not visually, especially in public.

I wonder if the smartphone age will go away as quickly as it came. I picture a world where we just have smart wearables like a watch which has a tiny visual interface, but a powerful audio one (speaker, earpiece, put watch up to ear, etc). It seems a lot less intrusive. I imagine as we get better with AI and voice recognition, it'll be as practical as a phone. What I'm able to do with Google Now on my watch is fairly impressive today. We already have the technology to understand things in context: "Navigate to Katz's deli" brings up Google Maps directions to the deli, as opposed to a Google search results page about navigating to a cat-themed deli, which was the status quo not too long ago with voice search.

I imagine carrying this big selfie/Facebook machine around, constantly charging it, whipping it out all the time, etc. will be pretty gauche if wearable-only solutions become competitive.


For many functional tasks, I can see an auditory UI being superior. But currently most people use their smartphone to skim content. I don't want the equivalent of listening to voicemail for everything.

Not to say that content can't shift for the medium, just as it always does. What would an audio Facebook sound like?


Well, I do that now sorta on my watch with its small screen. I scroll through notifications, but no, I don't get the full FB web or mobile experience. I'm not sure how many people actually want that; I often hear complaints about how phones and apps aren't simple anymore. I also believe that we really haven't figured out the best way to use these small screens. I'm surprised at how usable my watch is sometimes with its 320x320 screen at 1.8". For reference, the original iPhone was 3.5" at 320x480 resolution.

For teens and such I can see the big phone never going away, but for most adults, having an inconspicuous wearable just seems like a more refined experience. I imagine there's a logical progression here from desktop > traditional laptop > ultrabook laptop/convertible > tablet > mobile > wearable. You lose functionality with every step, but depending on the use case, it doesn't really matter. For people in my peer group, a wearable that could work without a phone would sell like hotcakes.
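
To put the screen comparison above in numbers, here is a small Python sketch of pixel density. It assumes both quoted sizes (1.8" and 3.5") are diagonal measurements, which the comment doesn't state; `pixels_per_inch` is just an illustrative helper.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density from resolution and diagonal size (assumed diagonal)."""
    return math.hypot(width_px, height_px) / diagonal_in


watch_ppi = pixels_per_inch(320, 320, 1.8)
iphone_ppi = pixels_per_inch(320, 480, 3.5)
print(round(watch_ppi), round(iphone_ppi))
```

Under that assumption the watch comes out around 250 pixels per inch against roughly 165 for the original iPhone, so the watch packs about two thirds of the pixels into a much smaller area.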


> The transition from primarily visual UX towards an auditorial UX is really powerful.

It's also less accessible. I'm sure auditory UI is useful in many cases, but it also seems to be more cumbersome in others. In any case, I hope that pervasive auditory UI doesn't become any sort of standard without an accompanying visual/physical interface.

> Try watching a scary movie with the sound turned off, it turns into a comedy

Allow me to be pedantic and say it is that being fully immersed in the context of the movie that really matters. You could probably achieve a similar suspenseful effect with silence+subtitles, although I'm sure the experience isn't identical. Otherwise, the deaf could never enjoy scary movies, including me.


>It's also less accessible.

For whom? To the blind this would be a godsend. From a practical medical perspective, audio is superior because we have decades of experience with effective ear implants to help the hard of hearing and the deaf, but the visual equivalent still eludes us.


> To the blind this would be a godsend.

Actually, I'd imagine that a good old-fashioned tty is pretty good for a blind person: it's TUIs and GUIs that get progressively more painful.

Source: am blind without my glasses; can imagine preferring ed to emacs, vim, Atom, SublimeText if I had to use an audio interface.


> To the blind this would be a godsend

For sure. Different interfaces disadvantage different classes of people. There is no silver bullet; I'm trying to point out that an exclusively audio/voice-driven UI would not be desirable.

> we have decades of experience with effective ear implants

The problem is multi-faceted. Hearing loss, especially from a young age, often leads to difficulty speaking -- it is no use if a voice-driven system can't understand you in the first place.

And while cochlear implant technology has helped a lot of people, it is by no means a cure, and there are many, many others that don't benefit enough from assistive technology to achieve functional equivalence (which is the key phrase when talking about accessibility). I have a cochlear implant and haven't worn it in years, because it really doesn't help.


> It's also less accessible

Well, I think blind people would disagree with you.

> I hope that pervasive auditory UI doesn't become any sort of standard without an accompanying visual/physical interface.

Any speech interface could be trivially translated to a text interface, right?


> Well, I think blind people would disagree with you.

Answered downthread.

> Any speech interface could be trivially translated to a text interface, right?

Pretty much, which is why UIs should not be exclusively auditory, that is, delivered without an accompanying visual interface (text or otherwise). Ordering the Echo Dot verbally is a cute gimmick given its premise, but it would really suck if otherwise useful products and services were only usable through audio.

Hopefully the audio UI trend does not follow the obsession over touch screens: a rapidly adopted, de facto standard driven by tastemakers that leave little consideration for others that might prefer an actual keyboard or other physical affordances.


> Allow me to be pedantic and say it is that being fully immersed in the context of the movie that really matters.

I hope to not be a super pedantic ass for pointing out that the 'immersive' media in films is the audio, not the visual components.


> the 'immersive' media in films is the audio, not the visual components

That's a non-falsifiable opinion, really (even if it does apply to the majority of the population). I'm living proof you can enjoy movies without the audio.

It's the sum of our experience that colors our perception -- almost irrevocably in this case, since I imagine it would be difficult for the typical person to really be able to enjoy something in complete and utter silence.


> I'm living proof you can enjoy movies without the audio.

I am not looking to equate immersion with enjoyment, and by no means do I intend to disrespect the manner by which you enjoy a type of media. My apologies for coming off that way!

When I refer to 'immersive media' I am referring to the 360-degree omnidirectional dispersion pattern of sounds and our similarly omnidirectional hearing of those sounds. This is 'immersive experience' as opposed to a 2-dimensional or stereoscopic experience, which is what we get with visual media. Television/film screens fire light directly at the eyes; even in IMAX situations the film is never experienced behind us. That isn't immersive, whereas say a VR headset can potentially offer this type of immersion. But since this technology is still in its infancy, I think it is too early to call it fully immersive like audio is.


> 'immersive media' I am referring to the 360-degree omnidirectional dispersion pattern

Then that is splitting hairs over a definition of immersion, and quite unrelated to how the word was used in my original comment. Had I instead said "fully engrossed," my point would still hold, and you would not have one.

I understand you were being "super pedantic," but if you're going to do that, then you should be super precise in the pedantry, otherwise you're arguing a strawman.


>> auditorial

Don't you mean oral or aural?


You would probably be interested in what we've been building over at https://www.narro.co.


It would be nice if it could extract forum discussions, like YC and Reddit. Sometimes I like to hear the text I am reading, it helps with concentration.


Yes, I'd like to see the possibility to select text, right click and select "read out loud".


I think all the browsers on OS X support that using the system text-to-speech (edit: Safari and Chrome, not Firefox)


I'm using Linux. It seems that Linux is falling behind in the area of speech input/output. I hope they will catch up.


Voice will become an important, if not the primary, interface to home/car audio/video.


"computer lights on" "dimmer"

no thank you.. i will use my hand


A device to change the channel on my TV? No thanks; I'll just use the dial on the TV.


let me pick up my phone, open the app for light control, dial in some setting, and hope the app doesn't crash.

TV remotes are awesome because they have physical buttons and they're fairly dumb... almost no chance of issues.


And if you're on the couch watching a movie and the light switch is on the other side of the room? Or you want to switch on the porch light for guests. Or switch off outside lights?


I get my non-lazy ass up.

For the few times where I may need to walk additionally around the house, it's a non-issue.


and what if you weren't so mobile?


They should announce Amazon Echo for the deaf, which would just be a screen.


... with a couple of Kinect-type devices to monitor one's signs.


"If you have more than one Echo or Echo Dot, you can set a different wake word for each".

This is something I've been thinking is becoming more problematic as well as an opportunity for real ubiquity. I have 3 separate devices nearby that are Google Now voice activated (the newer devices support this even if the screen is off), and they will sometimes trigger at the same time accidentally.

Since the processing is cloud based, and they know my identity, why don't the devices recognize this fact and cooperate? Instead of just 7 beam-forming mics in the Echo, if you have two within hearing distance you could have the benefit of 14 and a unified response. Don't tie the request & response to a particular device; instead think of it as a ubiquitous network that moves with you as you walk around the household. You should be able to continue your conversation from one room to the next seamlessly.


> Since the processing is cloud based, and they know my identity, why don't the devices recognize this fact and cooperate. Instead of just 7 beam forming mics in the Echo, if you have two within hearing distance you could have the benefit of 14 and a unified response.

The echo and noise reduction software that I'm aware of can't really do that in a reasonable fashion.

With current solutions, you've got one DSP that's receiving all the audio streams simultaneously, and they need to be exactly synchronized in time. Then, using basically pattern-matching, it figures out what direction the user's voice is coming from, and combines some/all of the audio streams together to eliminate environmental noise and make the speech as clear as possible.

To do this with separate devices, you'd want extremely precise time synchronization. Which is possible, but I wouldn't want to implement it.

The extra processing and synchronization would take longer, and delay input to the speech recognition engine. I don't think it would enhance the user experience.
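
To make the synchronization point concrete, here's a toy delay-and-sum sketch in Python. It is not what the Echo's DSP actually runs; the function name, the integer sample delays, and the two-mic "click" data are all made up for illustration. It only shows why the per-mic delays have to be known precisely: with the right delays the signal adds up, with the wrong ones it smears.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: equal-length lists of samples, one list per microphone
    delays:   per-channel arrival delay in samples (assumed known here;
              estimating them is the hard part in practice)
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    length = len(channels[0])
    out = []
    for n in range(length):
        total = 0.0
        for samples, d in zip(channels, delays):
            idx = n + d  # read later samples from mics that heard the sound later
            if 0 <= idx < length:
                total += samples[idx]
        out.append(total / len(channels))
    return out


# A single click that reaches mic B two samples after mic A.
mic_a = [0, 0, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 0, 0, 0]

aligned = delay_and_sum([mic_a, mic_b], [0, 2])     # peak stays at 1.0
misaligned = delay_and_sum([mic_a, mic_b], [0, 0])  # peak drops to 0.5, twice
```

With mics in separate devices, the delays also include whatever clock offset exists between the boxes, which is where the "extremely precise time synchronization" comes in.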

Edit: spelling.


Just have the Echo that hears the person best be the one that responds. So simple, and easy to implement. I honestly don't understand why Amazon hasn't fixed this yet. It's so fucking obvious.


> So simple, and easy to implement.

Ah yes, the rally cry of the person not doing the actual development work... In my experience, rarely is _anything_ "So simple, and easy to implement".


Agreed. Doing something sensible at a higher level than the actual audio recording would be easily possible.


> I don't think it would enhance the user experience.

Baidu trains the voice recognizer by adding all kinds of noise to the training data. I think it might be easier to do that than use multiple microphones. The neural net learns to do the difficult process of separation of useful data from noise.
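
A rough sketch of that kind of augmentation, assuming plain Python lists of samples and a target signal-to-noise ratio in dB. The function names and the synthetic "speech" are hypothetical, and this is not Baidu's actual pipeline, just the basic mixing arithmetic.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so speech power / noise power hits snr_db, then add it."""
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a sine wave as "speech", Gaussian noise as "kitchen clatter".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

noisy = mix_at_snr(speech, noise, snr_db=10)
```

You'd generate many noisy copies of each clean utterance at different SNRs and noise types, and train on all of them.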


I learned to not have the wake word be "Amazon" when I was watching online training for AWS. The Echo went nuts until I finally paused everything and changed the wake word back to "Alexa".


They really need to make it so that all of the Amazon Echos on the same network use a proximity algorithm to determine which one responds. Simply: The Echo that hears you best should be the one to respond.

I want to have an Echo in every room, and I don't want to have to remember all their different names!
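
Something like this is what I have in mind, as a sketch only. It assumes each device reports its name, the time it heard the wake word, and a short clip of samples to one service; `pick_responder`, the 0.5 second window, and the RMS loudness measure are my own guesses, not anything Amazon has documented.

```python
import math


def rms(samples):
    """Root-mean-square loudness of a clip (0.0 for an empty clip)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_responder(reports, window_s=0.5):
    """Pick the loudest device among those that woke within window_s of the first.

    reports: list of (device_name, wake_time_s, samples)
    """
    if not reports:
        return None
    first = min(t for _, t, _ in reports)
    candidates = [
        (rms(samples), name)
        for name, t, samples in reports
        if t - first <= window_s
    ]
    return max(candidates)[1]


reports = [
    ("kitchen", 10.00, [0.1, -0.1, 0.1, -0.1]),
    ("office", 10.02, [0.6, -0.5, 0.6, -0.5]),
    ("bedroom", 12.00, [0.9, -0.9, 0.9, -0.9]),  # a separate, later utterance
]
winner = pick_responder(reports)  # "office"
```

Loudness is a crude proxy for "hears you best" (a device next to the TV would win a lot), so a real version would probably want an SNR or confidence score instead.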


> I have 3 separate devices nearby that are Google Now voice activated (the newer devices support this even if the screen is off), and they will sometimes trigger at the same time accidentally.

> Since the processing is cloud based, and they know my identity,

Interesting, so everything said in that room gets processed and potentially sent to Google for indefinite storage? What a 1984-style luxury.


AFAIK, every one of these devices does nothing until a "wake word" is heard, and only then do they record+send.

Having all of the devices listening all the time would be a bandwidth and power nightmare, if not for the sender, for the receiver.
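
As a sketch of that gating idea (not the actual firmware of any of these devices): frames are checked locally, and only a bounded number of frames after a wake-word hit are passed along. The detector callback, the frame budget, and the use of words as stand-ins for audio frames are all assumptions for illustration.

```python
def gate_frames(frames, is_wake_word, max_frames_after_wake=3):
    """Return only the frames that follow a locally detected wake word.

    frames:       iterable of audio frames (plain strings here, for readability)
    is_wake_word: local detector callback; nothing is uploaded until it fires
    """
    uploaded = []
    remaining = 0
    for frame in frames:
        if remaining > 0:
            uploaded.append(frame)  # inside the post-wake window: send it
            remaining -= 1
        elif is_wake_word(frame):
            remaining = max_frames_after_wake  # open the window, drop the wake frame
        # otherwise the frame is discarded locally
    return uploaded


frames = ["chatter", "chatter", "alexa", "set", "a", "timer", "chatter", "chatter"]
sent = gate_frames(frames, lambda f: f == "alexa")  # ["set", "a", "timer"]
```

A real device presumably ends the window on silence rather than a fixed count, and an accidental detection opens the window just the same.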


Correct, of course it can activate accidentally.

https://history.google.com/history/audio has a list of all audio recorded


What about accidental triggers of the wake word? What about planted "wake words" to record people discussing "inappropriate" things?


For the Echo, at least, it has to use your home network, so you could pretty easily run a packet capture to see if it's ever sending audio out when you don't want it to.

Harder for things with cellular data, though.
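
If you export the capture to a list of (timestamp, outbound bytes) pairs for the Echo's address, a check could look roughly like this. The thresholds, the 10 second buckets, and the sample numbers are invented; it only looks at traffic volume and timing, not packet contents.

```python
def unexpected_bursts(packets, wake_times, burst_bytes=20000, window_s=10.0):
    """Flag time windows with heavy upload that no known wake-up explains.

    packets:    list of (timestamp_s, outbound_bytes) from the capture
    wake_times: times (in seconds) when you know the device woke up
    """
    buckets = {}
    for ts, size in packets:
        key = int(ts // window_s)
        buckets[key] = buckets.get(key, 0) + size

    flagged = []
    for key in sorted(buckets):
        start = key * window_s
        end = start + window_s
        # A wake-up in this window or the one before it explains the traffic.
        near_wake = any(start - window_s <= w < end for w in wake_times)
        if buckets[key] >= burst_bytes and not near_wake:
            flagged.append((start, buckets[key]))
    return flagged


packets = [(1.0, 300), (12.0, 400), (31.0, 15000), (33.0, 15000), (95.0, 25000)]
suspicious = unexpected_bursts(packets, wake_times=[30.5])  # [(90.0, 25000)]
```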


It's a cat-and-mouse game: What if it only sent the clandestine information when it picks up the "normal" word? The point is you don't control the device or its software.


>What a 1984-style luxury

exactly why I think none of this is worth it (echo, google now, siri, smart tvs etc) - especially given the current applications of the 3rd party doctrine - you are giving up the right to privacy for everything that is said in your home.


I agree with this entirely. I've been waiting patiently for a way to add microphone distance to my Echo and this is perfect for that... except it doesn't work that way.

I am very much hoping they fix it in the future and add a software layer to combine/route commands with one single wake word.


It's also a bit annoying that the Android Wear version of Now doesn't work the same as the regular Android version. For example, the full-sized one seems much more flexible with wording, and supports listening in several languages at once, while Wear is limited to one language.


But it's limited to 3 words, which is weird. I'd rather say "Office, order socks" than try to remember that the one in my office goes by "Amazon".


When did I turn from the enthusiastic kid who dreamed of audio-controlled personal assistants like this to a cranky old man who doesn't want anything remotely spy-possible in his house?


I think when we were kids we didn't think that the personal assistant would have to communicate with the outside world via the internet in order to perform its function.

If all of "Alexa" was included in a disconnected local database I bet it would still be as appealing.

Rosie on the Jetsons didn't have to "phone home".


Almost. More specifically, we didn't think that the personal assistant would have to communicate with a corporation that wants our info to make (more) money. The government option doesn't sound any better, either.

I think they are rather creepy, because it's so obvious there is (or, could be) a hidden agenda.


While indeed creepy, I ordered the original Echo as soon as it was made available, but I'm probably a special case. I live by myself and barely even speak out loud at home.

If Amazon can somehow monetize my primary use of Echo as a glorified kitchen timer I will be impressed.


> I live by myself and barely even speak out loud at home.

It occurs to me that the background noise in your home actually reveals a whole lot about the self:

- What you're listening to and when

- What you're watching and when

- What type of gentleman's material you enjoy and when

- When you leave home and get home

- When you wake up, when you go to bed

Some of these can be limited by the size of your house, but the trend in urban dwellings has been towards smaller so one unit could presumably capture every sound in your home.


And at the end someone pays money for some company to install a device to collect all this.

The tech insanity has really gone far ..


Finally the Telescreen is here.


Your first three points are moot in my case because, as a testament to your mentioned small apartment size, I consume all my entertainment with headphones after some real passive aggressive comments from neighbors a few years back.

When I wake, sleep, leave, and come home could be monitored by Echo, but it's also already being monitored by other devices I own, and it's data I'm not particularly concerned about at the moment.


> I live by myself and barely even speak out loud at home.

Not sure how other people feel about talking out loud at home, but as someone who also lives alone (in a 250 sqft apartment) and always wears headphones, I can't really imagine talking out loud. Just seems weird for some reason. I never use Siri either.

Wonder if that's a living alone thing, or a small apartment thing, or ...?


$180 for a kitchen timer seems a bit steep.


I would pay $180 for a voice-controlled kitchen timer which did not need an Internet connection to function and had verifiably secure command log deletion.

I'm less than enthusiastic about a $180 kitchen timer that uploads everything I say to the cloud for analysis, even if I understand that the analysis is to some degree necessary to improve the voice recognition.


While I hear what you are saying (no pun intended), it's important to be clear that it is not uploading everything you say to the cloud. It's uploading what you say once it wakes up by detecting the wake word, which is done completely locally.


It was $99 (there was a special offer when it was first announced at the end of 2014).


$99 for a kitchen timer seems a bit steep.


Considering most smartphones already have this app on them - I'm going to agree.

FREE vs. $99? No contest there my friend


We all spend our money how we want, and cell phones most certainly aren't free, either.

Aside from that, I didn't purchase the Echo with the intent of it being primarily kitchen timer. It just so happens that after owning it for over a year my usage of it is mostly limited to that.

My usage is probably around 85% timers and alarms, 10% streaming music, 4% shopping lists, and 1% everything else.


I'd be interested to find out how much you still use it a year from now.

Do you think you've used it like you thought you would, or did you have ideas about how you might use it that didn't pan out, or that the device didn't handle very well?


I ordered it originally purely on the "Oh, cool gadget!" factor, and I was willing to part with $99 for it.

I really didn't have a particular use case in mind at the start, but I was (and still am) impressed by the sound quality from such a small speaker. It's nice to be looking in the fridge and say "Alexa add X to my shopping list" or when my hands are covered with flour say "Alexa set a timer for 30 minutes" or whatever. And for those things it's worth the cost to me.

Most of the features that have rolled out just seem gimmicky, though. Take the news briefing: It either provides too little info to be useful, or it drones on and I get annoyed by the voice which, while it sounds natural compared to Microsoft Sam, still feels cold and artificial. In general I like having more control over my internet actions. I'll never use it to order a pizza or anything from Amazon because I don't know what happens if it misinterprets me or I make a mistake. And the third party apps are clunky ("Alexa, ask X to do Y").

To sum it up, aside from the very basic features I've used since day one it just feels like a toy.


Basically everything B2C today is a data play. Customers want everything to be cheap or free, so the only way to make money in B2C is to turn the customer into the product.

It's a deflationary race to the bottom. The bottom is a hell where everything watches you and sells absolutely everything about you to whomever can afford to buy the data.


Whenever I read these, I can't tell if the group is paranoid or prescient. But anyway I ordered one via my alexa. Amazon probably already knew I would.


> I think when we were kids we didn't think that the personal assistant would have to communicate with the outside world via the internet in order to perform its function.

Human personal assistants were connected to the outside world -- how else would they make appointments and reservations, book flights, find out what the weather would be, etc.? The whole point is to be connected to the outside world, automatic or no.


> Human personal assistants were connected to the outside world...

There's a difference between the "always on" communication these devices have and communication the user specifically requests.

When I want to make an airline reservation, I'm requesting the device to send the booking information to the airline. I'm not asking it to send a recording to the mothership of everything that happened in my home for the last 5 hours, which a human assistant would never do.


Hah, it'd be like hiring a personal assistant from a staffing agency who is constantly on the phone with the staffing agency parroting what you say.


That's also not what's happening with Echo. You'd literally have a few seconds of audio being sent to Amazon and then some text (the result of the ASR) being sent to the third party ticket search / reservation system.


Sure, but I also wouldn't let a human assistant live in my bedroom 24/7 listening to everything I say. I would also choose my words and topic differently when a human assistant is around.

You have to be able to trust that Echo isn't recording everything you say, unless you prefix it with "Alexa", and that this behavior will never change (say this is the behavior for the average user, but with a police warrant, they're able to tap your Echo).

I'm part of the group that thinks the tradeoff is worth it for the convenience, but I understand why many people would disagree.


This is exactly it for me. I'd buy an echo and a dot for every room if it didn't phone home.


I wonder what sort of memory tech it would take to pack nearly all of the internet into a small space, and have it incrementally update (the internet!) while still fitting in available memory.

Besides, any contact with the outside world would need communication. So you can't have an entirely standalone gadget.


Minus videos and images over a certain size... not all that much. And it would compress pretty well.

I wonder if the internet archive has a record of the size required minus images.


Couldn't the FBI legally get a court order to be able to listen in on conversations in a room that has one of these? They already do that with car assistance services. [1]

[1] http://www.cnet.com/news/court-to-fbi-no-spying-on-in-car-co...


Echo (supposedly) doesn't start sending audio to Amazon until you trigger it with a "wake word", i.e. "Alexa".

Of course:

a) it's not open source so we can't be sure (aside from monitoring network traffic, which is probably encrypted)

b) if the FBI is successful in compelling Apple to develop a backdoor for the iPhone there's nothing stopping them from compelling Amazon to do the same with Echo.

c) better hope you don't say "Alexa" or something Echo mistakes for it.


The traffic is encrypted. But you could certainly watch the network traffic and see that there's no traffic if the Echo doesn't wake and the lights don't turn on. (Of course, you'd have to trust that it isn't time delayed for hours in some sort of intentionally-sneaky way.)

It would also be possible to take a look at the hardware design and determine the linkage between the "mic mute" button light being on and power going to the mics.

The customer can set the device to provide both audio and visual indication when it "wakes up" and begins streaming to the cloud. And, of course, the customer can also press the mic mute button to avoid accidental wake up.

Yes, the FBI could try the same approach with Amazon as they are trying with Apple. For all of our sake, let's hope that Apple wins.


> It would also be possible to take a look at the hardware design and determine the linkage between the "mic mute" button light being on and power going to the mics.

How would the mics listen for the wake word if they aren't always on?


There is a mic mute button that is able to turn off the mics, which then prevents the device from waking up, as it is not receiving audio signals to process and detect the wake word. When the button is activated (== the mics are off), there is a glowing red light illuminated inside the button.

My point was that you could check to see if the linkage between that red indicator light and the power going to the mics was in software or hardware.

This is analogous to the warning light that many laptops have for when the built-in webcam is on.


> b) if the FBI is successful in compelling Apple to develop a backdoor for the iPhone there's nothing stopping them from compelling Amazon to do the same with Echo.

No backdoor needed if the information is sent to Amazon. All that is needed is a court order for Amazon to hand it over.


Sure, but all you'd get are commands you give Alexa ("Alexa, turn off the lights", "Alexa, what's the weather today"), which I suppose could be interesting to law enforcement, but certainly not as interesting as the "full-take" of an always-on wiretap.

I'm suggesting in order for the FBI to use Echo (or any other internet connected device that has a microphone) as a wiretap, the FBI could try to compel the manufacturer to write, sign, and push an update that causes the device to transmit audio to the FBI at any point.

That would have seemed a little far fetched in the past, but the current FBI/Apple situation could set a precedent.


The answer is obviously 'yes'. If there is a way for Amazon to listen to conversations then a court can compel them to give the FBI access.


I'm not terribly worried about various ways companies expose me to govt surveillance that requires a court order.

I do worry about said court orders being rubber stamps, and about surveillance that DOESN'T require a court order.

Otherwise we can make no technological advancement.


I love "smart" devices, but hate "devices that needlessly insist on connecting to the Internet".

One of the worst offenders is Dropcam. They have a super camera, easy to set up and use. Great picture quality. Would be an awesome baby monitor or "closed circuit TV replacement". But why the goddamn hell does it need to connect to the Internet? Why is the only option available to needlessly stream video out of my home network to the cloud, only so that I can then stream it back into my home network for viewing??? WTF? That's both a waste of outbound bandwidth and a waste of inbound bandwidth. I should be able to put it on my network, switch off the cable modem, and still be able to view video locally. How hard is that? I could do that with a webcam and a really long USB cable!


Their business model depends on some percentage of their customers using the subscription service.

My guess is: if they offered the version you describe, they'd need to make it much more expensive. Which many consumers would find odd: the one with fewer features would cost much more. Granted, those consumers wouldn't be looking at the big picture...but I find many consumers don't. Up front costs matter a lot to consumers.


As dumb as it sounds, it is probably easier that way. Sometimes in LANs it is easier to get data out and then back in. For example, a lot of dorm networks don't support Chromecast devices because Chromecasts try to multicast on the LAN for discovery, but dorms have networking policies that prevent this.

A webcam that sends the data out to the internet then back would avoid the discovery issue by using an external webserver as a rendezvous point.

I don't think people spend a lot of time thinking about their home networking. You could imagine most people just plug in their home routers and it is a crapshoot whether or not the router will support the necessary functionality, whereas a router will always enable communication to the outside world (or people would return it ASAP).

With that said, this seems like a straightforward technical problem that may have technical solutions.


Ease of setup for regular Jane/Joe because they know shit all about router configuration. That's why devices just transfer everything over someone else's computer a.k.a. "teh cloud".


If you don't care about recording video or video recognition features, the cheapo Chinese cams on Amazon actually perform pretty well. For $80 you can get 720p video with IR lights, speakers, microphone & it can move around. Usually it doesn't zoom like a Dropcam can.

If you're willing to configure a NAS server somewhere, you can even record the video locally.


The video quality probably doesn't compare, but I've used an old iPhone with iPCamera installed (I'm sure Android equivalents exist) for this purpose, which simply hosts an MJPEG stream at a local IP address. It should be simple to start or stop recording the stream on any device that's connected.
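
For the recording side, a minimal sketch of pulling individual JPEG frames out of the bytes of an MJPEG stream, in Python. It assumes you already have the raw bytes in a buffer (how you fetch them depends on the app), and it just scans for the JPEG start and end markers; a JPEG with an embedded thumbnail would confuse it, so a real recorder should parse the multipart boundaries instead.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_mjpeg(buffer):
    """Return (complete_frames, leftover_bytes) from a chunk of MJPEG data."""
    frames = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            break
        end = buffer.find(EOI, start + 2)
        if end == -1:
            break  # frame not finished yet; keep it in the leftover
        frames.append(buffer[start:end + 2])
        pos = end + 2
    return frames, buffer[pos:]


# Fake stream: two whole frames and the start of a third.
stream = (
    b"--boundary\r\n" + SOI + b"frame1" + EOI +
    b"\r\n--boundary\r\n" + SOI + b"frame2" + EOI +
    b"\r\n--boundary\r\n" + SOI + b"par"
)
frames, rest = split_mjpeg(stream)
```

Each entry in `frames` can be written straight to a `.jpg` file; prepend `rest` to the next chunk you read so the partial frame gets completed.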


Alexa probably uses forms of machine learning and also queries lots of services to find the answers you need. Also it learns from every user and gets better for every user this way. That would be really hard to do with an offline device.


Yes, that is exactly how it works.

If you, as a customer, want to, you can go to Amazon.com and delete all your voice history (or any single interaction).


This is probably a function of the amount of bad news you've read over the years about people getting exploited, taken advantage of, spied on, etc. When you're a kid it doesn't even really seem like a thing.


When you're a kid, you generally assume people around you are all wonderful.

... Then you gain life experience.

/75% jokingly


We'll be dead soon. Enjoy the little things.


Nice try NSA.


Yet I'm guessing you carry a smart phone in your pocket almost everywhere you go.


But phones don't have a microphone on them do they? :-)


Uhm, what?


Sorry, I thought the smiley face would've been enough to give away the sarcasm. For some reason the /s felt like it removed the infinitesimal amount of comedy from my post.


(should I tell him, guys?)


Don't forget the harried parent of a child with low impulse control.

These things would be a lot less "Big Brother" for me if I had a mic key in my pocket that would only turn the mic on when I squeezed it.


Hey buddy want to bet that Amazon is using this massive collection of voice to text to sell to other companies like Apple and Google?


Riight, because Siri doesn't generate enough voice data for apple.


The enthusiastic kid would probably get distracted and discouraged when X can not do "What I really want, like Ironman." While the "cranky" old man has been mis-characterized as "cranky" because "cranky" is often confused with wisdom and experience.


When you realized that the government was making an all out assault on the most fundamental American rights and the civilian sector did absolutely nothing to assure your privacy and anonymity out of sheer greed and narrow minded foolishness that they would be undermining their own success.

I am sure you would not have a problem using these kinds of systems if it were assured that you could not be tracked or monitored because the devices and systems were secured in overlapping ways.


In hindsight it all sounds amazing, and even ignoring the spy possibilities, it gets old fast. I don't use Siri, and I can do a lot of this with it. But since I got the first Siri-enabled device, I've used it mostly just for joking around, and my daughter asks it for her hockey scores. That's the extent of it.


Because when we dreamed of this as kids, the thought of the corporations behind these technologies that harvest our data for their gain didn't come up.


Exactly when did UnconventionalButTotallyLogical = CrankyOldMan ?


The moment you clamored for MIT embedded Linux software and the "let's kill all the GPL it's bad for startups" meme came up.

So now this cool audio controlled personal assistant is just another gadget to buy more stuff from Amazon, instead of something you control.


Is this voice recognition stuff based on MIT-licensed open source speech recognition? I have a project that would benefit from good quality speech recognition.


Well, no, that's the other thing: "let's put everything in the cloud so nobody owns anything anymore!".

Voice recognition is done on some Amazon server. If it goes down or changes API in five years, it will render this thing a brick.


There is something delightfully ballsy about making this only available to users of Alexa Voice shopping:

"Echo Dot is available in limited quantities and exclusively for Prime members through Alexa Voice Shopping. To order your Echo Dot, use your Amazon Echo or Amazon Fire TV and just ask: "Alexa, order an Echo dot"

Also, this makes me sad. I'd kind of like to try this out, but I have no Alexa voice service currently (I don't think)


Even though I own an echo, I wanted to get in early. Here's a link until they remove it: http://www.amazon.com/gp/offer-listing/B00VKTZFB4/


That link still works for ordering :)


I don't imagine it will be like that forever. It's just a clever way to limit demand until they can ramp up manufacturing, or work out the bugs, or whatever their motivation is for keeping it in a limited release for now.


... and also a way to introduce the concept of shopping via Alexa (I would imagine one of AMZN's primary long term goals for the project)


Actually you can already shop via Echo/Alexa today. It's effectively limited to reorders and music for now.


I think it needs a base Amazon echo to work if I understand correctly.


No, it needs external speakers, unlike the original Echo. However, you only need an Echo to preorder a Dot, you don't need an Echo for a Dot to work.


FTA: Includes a built-in speaker so it can work on its own


Built in speaker is for alarms, not media, I think.


it does seem to exclude media.

> Built-in speaker for voice feedback when not connected to external speakers

> Includes a built-in speaker so it can work on its own as a smart alarm clock in the bedroom, an assistant in the kitchen, or anywhere you might want a voice-controlled computer


That's crazy, why do I want this without a speaker? The bluetooth speakers they recommend are all really expensive; a speaker + Echo Dot is more expensive than a regular Echo... why wouldn't I just get a second Echo?


You can plug it into a hifi system.


The speaker is for voice feedback only. Doesn't actually support music, news, audiobooks etc.


Do you have a source for that? I have an Echo in my living room, but I was thinking of picking up one of these for my bedroom. I don't really care about sound quality as I would just be using it for Philips Hue, weather, and news.


Sure, this was the link that was emailed to me from Amazon, which also included the following text:

> With its built-in speaker, you can place Dot in the bedroom and use it as a smart alarm clock that can also turn off your lights, or use Dot in the kitchen to easily set timers and add items to your shopping list using just your voice

http://www.amazon.com/b?ie=UTF8&node=14047587011&ref_=pe_184...

See the technical details:

> Built-in speaker for voice feedback when not connected to external speakers

My Echo news is a mix of Text2Speech and audio, so I'm not sure that it would work for News.


man that would suck. the computing internals of the echo are less impressive than a raspberry pi -0. The dot has bluetooth and apparently wifi to communicate with speakers and network devices. The real benefit of the echo over other homemade voice command devices like jasper(github.com/jasperproject) is the more proprietary far-field speaker array.


> The real benefit of the echo over other homemade voice command devices like jasper(github.com/jasperproject) is the more proprietary far-field speaker array.

Um, and the insane underlying voice API?


No it doesn't


I guess it's time to order an Echo.


Somewhat related, but if I don't subscribe to any of the services listed, this is a pretty useless product for me. I don't listen to internet radio, I don't stream music, I don't order delivery, I don't use uber, there's already 10 million ways to check the weather, and my life isn't busy enough to need a voice-activated calendar.

Is this the future of tech? Like do I need to have some kind of urban-go-getter lifestyle to find use in any of this? When can I get something useful, rather than "thing I already do, but in a new package"?


What would you find useful? You seem upset that a product was designed for a user that is not you, but that doesn't mean it doesn't have a use. Subscribing to music streaming services, ordering delivery, using Uber; these aren't incredibly uncommon things just because you don't use them. It is rare for new and exciting technology to just pop out of nowhere. Almost all new products are reiterations of previous products in new and interesting packages, it's just up to you to decide if it's worth moving to.


Totally fair point! But would you buy an Echo Dot if you only used Uber and didn't use any of the other services? Or if you used 1 or 2 of the services? How many of these services do you need to use before the functionality of Echo becomes apparent?

I want to be a fly-on-the-wall when someone sets one of these up in their home. I can't picture it fitting in with my lifestyle, so I'm curious to see how others would actually use it. Or would it just gather dust and become a conversation piece?


I find it fantastically useful for social gatherings in my small apartment. While cooking we listen to music from the Echo, and have equal control over the music selection (vs "Who has the iPhone? Can you turn it up? Oh, it needs unlocked") and timers for cooking. It could be far more powerful with playlist creation.

After that, it's Uber, schedule, and weather on my way out the door. As I leave I ask it to turn off the lights.

So I use at least 5 of its features (and stream Pandora/NPR on it, so 7?), and find it useful. I don't think I would miss it, but I do find myself wishing for it a bit when I'm at a friend's house that doesn't have one.


we've had an echo for about a year now and we love it.

by 'we', i mean my busy family of four. it acts as everything from shopping lists to homework timers to streaming pandora/spotify to telling jokes -- and more. we easily talk to her (she is basically part of the family) a dozen times a day.

i can totally see how someone who doesn't have all this commotion and such would think it useless. for us tho, it's not useless. it's both fun and functional.


Personally, I won one of these in a hackathon and never thought I would use it at all. But I set it up anyway and found it actually very handy. It gives me a news report while I'm cooking breakfast, timers for things, music. I have never used OK Google / Siri on my phone because if I get my phone out and unlock it I might as well just open the timer app or google the question at that point, but with the Echo, while I'm doing something, I can just talk and get information about different things.

Yes you can check the weather a million ways, but those usually require some kind of dedicated screen time, watching tv, loading up a website, checking an app on your phone. Whereas with the echo you just ask it while you are doing something else and it gives you the report.


Sometimes cool new technology just isn't for you.


My problem with Alexa is, I don't want to invest in a new ecosystem. I'm fine with Amazon being the hub that connects all of my services, but I don't want to use Amazon To-Do List, Amazon Prime Radio, Amazon Traffic, Amazon Sports, Amazon Calendar, Amazon Weather.

That being said, they announce partnerships with more and more services every month. Things are looking up.


It's not perfect, but it is linked now with Spotify, ESPN and other publishers, Google Calendar...

More importantly, they have done a good job (leagues better than the competing voice services) of opening their service to developers thru Alexa Skills, which has enabled hundreds of added features including things like ordering an Uber.


I just wish the Skills weren't behind that unnatural syntax.

E.g. Alexa, Ask recipes how do I make an omelet? instead of: Alexa, how do I make an omelet?

I imagine it's to prevent conflicts but I'd like the option to put some services in the default namespace as it were.
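
Roughly what I mean, as a Python sketch. This is not how the Alexa Skills routing actually works; the `route` function, the "ask X (to) Y" pattern, and the per-skill "can you handle this?" callbacks are all hypothetical.

```python
import re

# Explicit form: "ask <skill> [to] <request>"
ASK_PATTERN = re.compile(
    r"^ask (?P<skill>\w+) (?:to )?(?P<request>.+)$", re.IGNORECASE
)


def route(utterance, skills, defaults=()):
    """Return (handler_name, request_text) for an utterance.

    skills:   dict of skill name -> predicate saying whether it can handle a request
    defaults: skills the user has promoted into the default namespace, in priority order
    """
    text = utterance.strip()
    match = ASK_PATTERN.match(text)
    if match and match.group("skill").lower() in skills:
        return match.group("skill").lower(), match.group("request")
    for name in defaults:
        if skills[name](text):  # first promoted skill that claims the request wins
            return name, text
    return "builtin", text


skills = {
    "recipes": lambda text: text.lower().startswith("how do i make"),
    "uber": lambda text: "ride" in text.lower(),
}

explicit = route("Ask recipes how do I make an omelet?", skills)
promoted = route("How do I make an omelet?", skills, defaults=("recipes",))
unrouted = route("How do I make an omelet?", skills)
```

The priority order in `defaults` is where the conflict problem shows up: two promoted skills that both claim "how do I make..." need a tie-break, which is probably why the explicit syntax exists.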


I use my Echo exclusively with non-Amazon services: Google Calendar, Spotify, Philips Hue.

So the product is quite open. That being said, the third-party experience could be smoother - it is a minor pain to have to specify "with Spotify" every time I want the Echo to play music.

Overall I'm happy with my purchase, though.


I'd be happy for more full-service ecosystems to choose from. The more that users get fragmented between these ecosystems, the more each ecosystem is incentivized to open up.


I like Echo because (after setup) I don't have to use my phone with it. And at this point, there's little reason not to be in Prime. $99/year is pretty affordable for most of America.


There is an open API, so this situation will improve over time.

Alexa and AWS Lambda are two of the things I'm most interested in these days (disparate, I know) but they're also things without an open source equivalent. I'd love to see that change.



There is an API where you can build your own integrations (skills).


Yep. I'm already invested in Apple products, for better or worse. If Apple comes out with something like this for Siri I'll snatch it up. I actually kind of resent the fact that my shopping experience with Amazon has gotten worse in the past couple years while they push their original series, streaming services, and devices (many of which suck).


If it was working with Google Music, I would have bought it in a heartbeat.


Just ordered a Dot -- what is the Tap? They added that to the page, too, but no info. Is it just the next gen Echo?

http://www.theverge.com/2016/3/3/11148776/amazon-echo-tap-sp...

Ahh -- the Tap is a portable device with wifi speaker.

(Probably wouldn't call an audio monitoring box the "tap")


Looks like a portable speaker with Alexa? http://www.amazon.com/Amazon-PW3840KL-Tap/dp/B00VXS8E8S


How much was it? I can't find pricing info anywhere.


What? The price of the Echo Dot ($89.99) is clearly at the top of the article.


Weird; when I click it (I'm on mobile), it takes me to a special part of the Amazon app, and there's no price, and it says it can only be ordered by voice by people with existing Amazon hardware that can do that.


Dot and Tap are two different products.

Dot - $90 external-speaker-port echo.

Tap - $130 battery powered speaker, wifi, Bluetooth with echo, for portable use. (I'll get one if it works great in hotel rooms; otherwise won't.)


Wow, what a coincidence. I just did a setup like this with Amazon Echo and Sonos, by "hacking" the Amazon Echo to do audio-out.

I wrote up a little post on it here: https://medium.com/@MathiasHansen/hacking-an-amazon-echo-and...

Obviously, actually having bluetooth speakers with the Echo Dot is a much better solution, but after using the Sonos setup for 3-4 weeks I must say that it works surprisingly well, and despite the audio hack the sound quality is excellent on my Play 1's.


Meh... the problem with Bluetooth speakers is that many of them don't handle the always-on use-case.

My soundbar would work well, but Alexa would get muzzled every time I turned on the TV to watch something. On the other hand, my portable bluetooth speaker will run out of battery if left on its charger.

The AUX connection is almost a better option, but then am I supposed to leave my amp turned on all the time? There's also the same problem where Alexa loses her voice when I switch the amp over to the Bluray player.


Be forewarned - if I am invited into your home for any reason, and I see an Alexa device, I will vocally add a large shopping list of nonsense to your Amazon cart :)


Just wait until my hot new pop single "Alexa, order more toys" becomes popular with the kids.


I prefer to set alarms for 3 am. I also have an Echo, so I'm waiting for one of my two victims to retaliate.


"Alexa, order 12 gallons of milk"


55 gallon drum of personal lubricant, please and thank you. Wait, make that 2 drums.


Serious question: is it feasible to implement a kind of loose voice 'fingerprint' to prevent this kind of thing? Will/could Alexa know who's talking to it?


I really want an 'Alexa, stop listening' command. There's a button on the top that mutes the mic and puts a red ring around them, but when I have people over, it's not a great environment to use voice commands anyways.

'Everyone be quiet so I can shout across the room to change my music'


A workaround would be to mute the device itself, and then use the remote (which has its own mic, and works well in noisy environments since you just hold it closer to your mouth).


But then I just have to carry a remote while I'm having a party.


It's not a bad idea. If you're hosting, it lets you change the music without interrupting your guests.


Yes, definitely. It adds complexity in that now you have another source of both Type I and Type II errors (failing to wake up, waking when it shouldn't). Voice ID itself is far from settled science to do well, so it would be a tradeoff.
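
To make the tradeoff concrete, here is a minimal sketch of how the two error rates move against each other as the acceptance threshold changes. The similarity scores are made-up numbers for illustration, not measurements from any real voice ID system, and `error_rates` is a hypothetical helper.

```python
def error_rates(owner_scores, guest_scores, threshold):
    """Accept a speaker when score >= threshold.

    Returns (false_reject_rate, false_accept_rate):
    owners wrongly refused, and guests wrongly accepted.
    """
    false_rejects = sum(1 for s in owner_scores if s < threshold)
    false_accepts = sum(1 for s in guest_scores if s >= threshold)
    return false_rejects / len(owner_scores), false_accepts / len(guest_scores)


# Invented similarity scores in [0, 1] for the owner and for house guests.
owner = [0.91, 0.84, 0.62, 0.77]
guests = [0.30, 0.55, 0.71, 0.48]

for threshold in (0.5, 0.7, 0.8):
    print(threshold, error_rates(owner, guests, threshold))
```

With these numbers a strict threshold of 0.8 lets no guest through but refuses the owner half the time, while a lax 0.5 never refuses the owner but accepts half the guests.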


Theoretically yes you can fingerprint voices. The questions are:

1) Can you do this on the Alexa servers efficiently

2) Do you want to? Seems like setting it up could be a hassle. Right now there is zero friction and it just works.


It's not necessary anyways, as you have to provide a pin to actually order things.


Short answer: yes.


Long answer? i.e. is it possible out of the box or is it possible in principle, if anyone actually builds it?


Alexa, how do I do <terrorist-associated> activity. Oh and please order a mile of bubble wrap.


Incredibly, for an Amazon product, Alexa is terrible at buying things. You can only order things you've bought before, as far as I can tell, and even then, only some things, selected by a filter I don't understand.


Will this be linked together with my Echo? One thing I do quite often, since my Echo is in my kitchen, is use it to set a timer. I'd like to be able to go to my office upstairs and ask it how much time is left. Today, I don't think that's possible even with a second Echo.


I have two echoes now. Timers are separate, backend content is synced. You could use the Amazon dev kit to make a universal timer. (That is a good use case)

The Alexa iOS app has a good drop down to manage each device separately.
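
A rough sketch of what I mean by a universal timer: keep the deadlines in one shared store instead of on each device. This is just the data model, assuming the store would live in a skill's backend; `SharedTimers` is a hypothetical class and none of this uses real Alexa APIs.

```python
import time


class SharedTimers:
    """Timers keyed by name and shared by every device (hypothetical design)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the demo below is deterministic
        self._deadlines = {}     # timer name -> absolute deadline in seconds

    def start(self, name, seconds):
        self._deadlines[name] = self._clock() + seconds

    def remaining(self, name):
        """Seconds left, 0.0 once expired, or None for an unknown timer."""
        deadline = self._deadlines.get(name)
        if deadline is None:
            return None
        return max(0.0, deadline - self._clock())


# Demo with a fake clock standing in for real time.
now = [0.0]
timers = SharedTimers(clock=lambda: now[0])
timers.start("pasta", 600)          # set from the kitchen device
now[0] = 240.0                      # four minutes later
print(timers.remaining("pasta"))    # asked from the office device
```

Because the store holds absolute deadlines rather than countdowns, any device can answer "how much time is left" without talking to the device that started the timer.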


When I first saw the Dot I was very excited, thinking it was exactly this, but it appears it is just an Echo with a lower quality speaker and a simple cable (and Bluetooth) you can connect to your own speakers. A bit of a bummer; synced timers and playing the same music through a couple of Echos in different rooms would have been a better use case for me, but perhaps I'm the weird one.


Amazon was the only Big Four company silent on the data privacy lawsuit with Apple. Why would I place one of their always-listening products in my living room?



Thanks for the update.



Only for US customers...

"Requirements

* A U.S. Amazon account

* A U.S. shipping address (50 United States and the District of Columbia only)

* An annual Amazon Prime membership or 30-day Amazon Prime free trial

* A payment method issued by a U.S. bank with a U.S. billing address in your 1-Click settings

* A device with access to the Alexa Voice Service (such as Amazon Echo)"


I'm American. This makes sense: if anyone was going to order what seems like a range extender for a device that just brings you stuff you were too lazy to type, it would be Americans.

Google Glass problem. The interface is me yelling publicly. So not super sure that is going to be adopted well.


I use them in my home. Being able to ask it to set a cooking timer while my hands are full is pretty awesome.

Echo is one of those things where it became magically awesome by being somewhat more accurate than I'd expect. Also, Amazon is updating the service back ends, and it is now extensible.


It's you yelling in your home. It's fine, your dog won't judge you.


I would love to have something similar as open source software. How can I trust this device if I can't examine the code used for hotword recognition?

Also, it would be great to be able to put the software on different hardware - something with digital audio output for example. The concept of Alexa is amazing, but distributing it as proprietary software limits its potential.



Thank you, I didn't know about those projects.


They do have something called Alexa Voice Services (AVS), which provides the underlying technology. You could take advantage of that and know that you are only sending along data when you want to.


I'm not entirely clear on the difference between the regular Echo and the Echo Dot. It appears you have to have an original Echo in order to purchase a Dot. Is this simply an extension that proxies all of the requests back to the original Echo?


It's the Echo using your own speaker (it has a tiny one still). The "ordering through your existing FireTV/Echo" is just a stupid marketing ploy. As far as we know right now it does not talk to other Echos on your network (no proxying/grid/mesh/etc).



Which is probably a link sent to you from your Echo, if you search it on amazon you just get this page http://smile.amazon.com/b/?node=14047587011


I think it's really "this is open beta/supply constraint, this makes sure only people already invested can get it". It's cute but pragmatic

disclaimer: work at Amazon, nowhere near echo


Ah, I didn't notice that it hooks up to existing speakers. I'm curious how many people will use that, however. My anecdotal experience is that most of the people who actually own an Echo are fairly tech illiterate and benefit from it being a self-contained package, but that may change if the Echo API is extended.


I was just thinking "I want echo with a speaker port" 3 days ago.

The thing I really really want from them is echo in the car.


You could put a Dot in your car. It's USB powered and you can plug it into your stereo with an AUX cable or some such. You would just need to tether your phone's wifi or have some other in-car wifi solution.


Lots of issues with hacking something together like that, though -- I really want something which ties into vehicle sensors, that can handle calls/mute/etc., mixing nav + voice + music dynamically, etc.

At that point it's basically worth building a car computer. Possibly using Alexa for the voice, but if I'm doing a car computer, I think a hack to work with Cortana or Google or Siri might be easier.

It's amazing no one has done a good job of this yet -- it's been feasible for 10y, and commercially viable for 5y for big companies, or 2-3y for startups just using existing hw. I think it's because not enough good product/dev/etc. people have 1h+ car commutes.


I would be super interested in Echo in my car, but mostly because I don't have one of the fancy new cars that connects to my phone over bluetooth for that sort of stuff.


Me too -- I love my car (06 Audi) but the electronics were designed by a car company in 2000-2004 and thus very far out of date.

The right choice is probably a replacement nav and head unit. I'd rather have an old but nice car with great nav/ ent/etc, than a new car. I don't think I'm that unusual. I'd like to upgrade the electronics every couple years; happy to keep a car for 10-20y. Somehow those should mesh, but don't.


Ford announced an Alexa integration at CES this year.

http://www.geekwire.com/2016/ford-working-on-amazon-echo-int...


>It appears you have to have an original Echo in order to purchase a Dot.

Where do you get that? As far as I can tell, this looks like an Echo with minimal speakers.

[Edit: Ah. It's only available through voice shopping on either an Echo or Fire TV--though that would seem to be a marketing gimmick as opposed to a technical restriction]


> Echo Dot is available in limited quantities and exclusively for Prime members through Alexa Voice Shopping. To order your Echo Dot, use your Amazon Echo or Amazon Fire TV and just ask:

I guess technically you don't have to already have an Echo, but it is only available through voice shopping.



Thanks a lot! I knew somebody on HN would figure out how to order it without an echo.


Yeah, it looks like it's intended to kinda be an expansion to your existing system. So in a larger house where you can't talk to your Echo from everywhere, you could put a bunch of Echo Dots in different rooms. It looks like it's probably missing the audio quality speakers, I'd guess?


I love my Echo! I probably use it 15-25 times a day.

1) Acts as my alarm

2) Turn on my favorite radio station while I make breakfast.

3) Timers for cooking breakfast.

4) Listen to flash news

5) Alarm again if I need a nap.

6) Timers for lunch meal

7) Add item to shopping list.

8) Add todo items.

9) Plays Spotify while I work on my computer from across the room.

10) More flash news (it's really quite extensive)

11) more naps

12) dinner timer

13) news

14) word definitions

15) Tell it to stop when it starts talking in the middle of a conversation (a bit annoying).

16) more todos

17) Order more dog treats

18) Play bedtime music

Worth every penny. Where did the strange sense of "everyone is spying on you" come from? A bloated sense of self importance?


Wow, you need a reality check here with your "A bloated sense of self importance?". Here is some history for you....

My Dad has written a book about Native Americans in the Pacific Northwest. Part of his research turned up personal letters from an officer in the US Cavalry long ago (many officers). These letters were very personal, and only ever meant to be read by his wife. Unfortunately, these letters were passed down in the family many times until recently, when a family member got fed up with this box of letters and donated it to the University of Washington, where my Dad found the letters relevant to his research, along with others of a personal nature, as I explained.

You can't even begin to imagine what devices (production, backup, test, hacked versions, amazon, nsa etc) that your voice is sitting on now and what those devices and interfaces will look like 100 years from now and who or what will be using them, heck, even 10 years from now is a mystery.

So don't become famous, run for office or try to be big corp CEO or even use any social network because one day something you said while your echo was recording will bite you or your grandkids in the ass!

I would love to use a service like Echo, it looks slick, but if I can't verify the source code or trust some community who has, then it will never be in my house.


> So don't become famous, run for office or try to be big corp CEO

I don't necessarily disagree, but the vast majority of people will do none of these things mentioned.

The real issue is that of a person's private life seeping into all of their interactions with society. A person could be easily controlled even in private settings if a misstep could land them without a job, ruin a marriage, or cost a person their freedom.

With that being said, the majority of people don't care about privacy. Almost all of us are oversharing (although the demographic on HN are likely more privacy-conscious than most). Either we're all going to get bitten in the ass, or somehow we'll adapt as a society to accept others more deeply (as the alternative is mutual destruction).

I'm quite privacy conscious myself, but when do our privacy-first habits make us bigger targets than others who don't have them?


>>Where did the strange sense of "everyone is spying on you" come from? A bloated sense of self importance?

You underestimate how far this can go!

Essentially, this could work against you in a million ways not even imaginable now.

Come to think of it, what if you are denied health insurance on grounds that this gadget was eavesdropping on your health conditions? Or some marketing company spamming you with ads on topics you talk about frequently at home. Or listening to the intimate moments between you and your partner. One could list a gazillion conditions in which an evil mind could use this to their advantage.


Exactly. And this is why surveillance is so difficult to fight. The actions are far removed from the consequences, and the public just isn't very good at long-term planning. But we can't wait for the negative effects to show, because by then it'll be too late.


Where did the strange sense of "everyone is spying on you" come from? A bloated sense of self importance?

Beyond what others have written, there's also a matter of principle. I don't think I'll be personally and directly affected by it, and so I'm kind of sloppy with my use of online services, but I don't want to live in a world where these devices are everywhere, because they are dangerous when ubiquitous and hence unavoidable.

The phrase "vote with your dollars/euros/etc" may be often misapplied, but there's some truth to it, and the corollary is that every time you buy something, you're also making a wider impact on society regarding what is acceptable.


I would not be surprised if it turns out that Alexa is the biggest thing they've ever done, including AWS.


Zero chance of that.

AWS will likely be a $100-$150 billion market value business in five years, with $6-$8 billion in operating income. They're tracking to $3.x billion in operating income in the next four quarters. It'll be valued as highly as Intel and Oracle.

A device that tells you the weather, orders an Uber, or orders more low margin merchandise off of Amazon, is not going to generate that kind of massive financial return. You can look at every lucrative business Echo could touch, and there's no scenario under which it could extract a large amount of monetary value. Ads? Not a chance. Sales referrals? No, the high margin stuff people want to visually browse for. Services? It could be 50 times larger than Angie's List and still not match AWS. Ordering Ubers? Ordering food? Ordering movie tickets? Relatively small sales, small percentage cut businesses.


No wireless. Less space than a Nomad. Lame.


I agree with you; however, I believe Alexa's value will be that it learns about you over time, making Amazon's services more "sticky" to the end-user, and making Amazon a more valuable marketplace to suppliers.

For instance, in the UK, Amazon has very recently partnered with an actual supermarket to sell some groceries[1]. If agents really catch on, what would a business pay to be the default milk provider when someone or their fridge says "I need more milk"?

[1] http://www.bbc.co.uk/news/business-35684829


You need to think outside the box. I'm sure you are completely wrong.


Alexa requires AWS to run. It's a symbiotic relationship, and Alexa is the consumer-facing AI extension. They've marketed it better than Watson and Siri so far, giving it new hardware to live in and opening it up to developers. But without AWS, there is no Alexa.


> They've marketed it better than Watson and Siri

It's hard to do worse than a platform that simply does not let third-parties in (Siri). I honestly don't understand why Apple is so opposed to the concept.


>they've marketed it better than Siri

really?


Yeah, their super bowl commercial with Alec Baldwin was a huge hit. And everyone who uses amazon sees the thing plastered across the front page every time they use the website. They're also much more active on social media, and have (presumably) paid to get #AmazonEcho trending now on Twitter.


Well, marketed to who is the question. To consumers I think Siri is advertised better (certainly, the vast majority of my non-tech friends know about it, Alexa not so much) but to developers Alexa is, well, actually open, so the advertising writes itself.


While voice assistants are likely to become a big industry, I don't see such a limited solution as Alexa stealing any large spotlight. They are too heavily locked down to be able to gain any traction in the big picture.


They are connecting more and more services, you can train your own "skills" and I am pretty sure they will open the platform up at some point. Could be huge. Remember, the first iPhone was totally locked down as well.


The iPhone is still locked down, and it still sucks.


If by "locked down" you mean able to develop and deploy your own software onto[1], then sure.

Given what's going on with the FBI, I'm starting to see the absurd security of the platform as more a positive than a negative.

[1]: http://www.pcworld.com/article/2933052/apple-frees-casual-io...


> If by "locked down" you mean able to develop and deploy your own software onto[1], then sure.

That doesn't allow publishing anywhere, which is still behind a paywall and still tightly controlled. It also still requires you to use a Mac, from what I can tell.

> Given what's going on with the FBI, I'm starting to see the absurd security of the platform as more a positive than a negative.

Oh yes, the wonderful mixup between "security" and security. "Security" is just DRM and Tivoization by another name. Actual security would mean a device that doesn't come with horrible RCE vulnerabilities out of the box, which Apple doesn't exactly have a stellar reputation for, as well as allowing the user to choose things like what data applications have access to. The two have absolutely nothing in common.


That doesn't allow publishing anywhere

Which isn't the point; the point is you can run what you want on your own device now regardless of whether Apple likes it or not.

The tight control is a feature, not a bug. I'd rather put up with this slight nuisance than have the adware, malware infested dump that is the Play Store. Random hackers and advertisers are a much more clear and present threat than anything Apple can do to me.

I've had less and less reasons to jailbreak over the past few releases, and with good reason, a jailbreak both lessens your security and functions as an exploit all on its own.


>> That doesn't allow publishing anywhere

> Which isn't the point; the point is you can run what you want on your own device now regardless of whether Apple likes it or not.

So they've lightened a tiny bit for PR purposes, while still not giving the average user any practical freedom.

> The tight control is a feature, not a bug. I'd rather put up with this slight nuisance than have the adware, malware infested dump that is the Play Store. Random hackers and advertisers are a much more clear and present threat than anything Apple can do to me.

The Play Store is by no means perfect, but it's never been that awful. And besides, I'm fine with Apple exercising reasonable control over their own App Store, as long as sideloading is reasonably simple.

> I've had less and less reasons to jailbreak over the past few releases, and with good reason, a jailbreak both lessens your security and functions as an exploit all on its own.

The exploit is there whether you use it or not, the only difference is whether it's you or malware authors who gain anything from it.


My point is that the first iPhone did not have any 3rd party apps at all, and I'd argue it did change the world of computing in the years after that. If you do not acknowledge that, you are blinded by hate for Apple.


Still too expensive, imo. I've read a lot about "Alexa" and Echo... and beside the privacy issues, in many cases the Echo quickly becomes an expensive speaker (after the kids and everyone else gets tired of asking "Alexa" questions).

$89 is not in my impulse buy price range. I may be in the minority on that though...


I sorta agree. I got the original Echo for $100 when they had a special deal for Prime members. The timer is handy. The shopping list is handy. It's occasionally vaguely useful to ask it questions about the weather or other things--though it's not like my phone is that far away. I do use it for Amazon Music when I can't be bothered finding something to play on my stereo.

Potentially, the ability to interface with home automation devices will make it more useful but I'm honestly not sure how much of that stuff I will ever use.

I'm happy enough that I bought it but I probably wouldn't buy more to put in other rooms.

[Edit: I think if I lived in a small place and didn't have another music source I'd find it more generally useful.]


I have a $4 multi-function timer that we use in the kitchen and for "turns" on the trampoline in the backyard. I still don't use all the different features available on that silly thing. My GE gas range also has a timer, same with the microwave that sits above it. I bet my fridge has a timer too... lol. I have a literal crap-ton of devices that I barely use to their full extent. There just aren't enough hours in the day.

I still use pen and paper to keep notes/lists. I actually have a Bullet Journal... so maybe I am not the target demo.


The thing is, you can ask Cortana or Google Now or Siri about the weather, shopping (at least Cortana has reminders, not sure about the other two) and the other things you mention.

So, at the moment it feels like a redundant device that one has to purchase in order to do the same thing I can do with the phone and/or computer that I already own.

I mean, I find it kind of cool (except for the creepy "I'm listening to what you say", which applies to all assistants anyway), but I find it hard to find its place in the world as things are right now. Perhaps it's that I live in a small studio and my laptop or phone or tablet are always at hand.


I don't really disagree which is why I'm pretty ambivalent. The original Echo does have a decent Bluetooth speaker and I find the voice recognition a lot better than Siri's. And it's available in my kitchen/dining area when I need to add something to a shopping list or ask a question with greasy hands. But I certainly wouldn't try to convince someone that it's a "must have."


The single greatest feature of Echo that I use (and too few others use) is turning on my Philips Hue lights. You can give each individual light a name, and give groups of lights their own names. Very convenient to turn on / off lights from anywhere in range of the mic.

I use it everyday multiple times a day. If anyone finds network connected lights useful, then they'll find controlling them through Echo doubly useful as it obviates the need to pick up your smartphone.


Not to pick on you :-), but I have these things called light switches in my house that work pretty well for turning on and off lights. I confess to not seeing much attraction to smart lightbulbs with names.

(To be fair, if I had a lot of lights not connected to switches as was the case when I moved into my current house, I'd probably have put in smart lightbulbs rather than doing as much rewiring as I did.)


Light switches are great, but there is something really nice about lying in bed and dimming the lights by voice.

Or right before going to sleep, you remember you left the living room light on, so you say, "Alexa, turn off the living room light", and watch as the glow under the door disappears.

It's a luxury, but it's a lot of fun if you are lazy.

I also use the Echo to adjust my Nest thermostat, control my entire home theater (with 6 different devices), control my Tempur-Pedic adjustable bed, and even remote start my car.

I find the possibilities of voice control and home automation to be intoxicating, and hacking around with the Echo is sort of one of my hobbies right now.


How do you use it to control your home theatre?


Alexa has come in handy when I forget to turn my computer desk's light off -- it's connected to a power strip -- so I can just yell downstairs and have her switch it off for me.

I can also set individual lights or groups of lights to turn on during certain events, like returning home and it's after sunset and the lights aren't already on, but that's more of a whole-home-automation thing rather than Alexa. :)


They can try to take light switches from my cold, dead hands!

Light bulbs are literally the last thing on earth that I want to research for security exploits before I purchase.


Some of the Skills that have been added lately are also useful. Alexa can read you Kindle books, for example, which is soothingly robotic.


It's painful to have it read back answers. The weather, for example, takes one quick glance on my phone but forever for Alexa to read.


I was thinking this too. I'd also like to have a screen, for example on my fridge, where Echo can post visual replies.


Yeah, almost the whole page for the Amazon Tap is just listing its musical abilities. And the price comparison chart at the end is to other bluetooth speakers.


Echo Dot ($89.99) is available exclusively for Prime Members through Alexa Voice Shopping. To order your Echo Dot, use your Echo or Fire TV and just ask: “Alexa, order Echo Dot.”


I noticed that, so I assume they're targeting this as an accessory for existing Echo owners (additional rooms, etc).

But here's what confuses me: The Dot SEEMS like it would work extremely well without owning an Echo. The two don't seem to integrate together; the Dot just uses an external speaker instead of an internal one.

So maybe this is just a promo available to Echo owners and everyone will be able to buy it later for a higher price? But the page could be clearer about what their intentions are and how they justify the Echo-only buying option.


The Dot does have an internal speaker - just a tiny crap one.


I nearly renewed my Prime membership on reading the first part but then stopped when I read the second part. Holy market segmentation, batman - why actively repel new customers and only make this available to those mini-me-philes who already have an Echo?



Ah, thank you. You have likely won Amazon a Prime renewal :). EDIT: gosh, didn't even require Prime membership to order - thanks again.


Man... I had my audrey doing this in the '90s. I can't believe I missed the boat and somebody else is making a bajillion dollars. It's time to search through the archives of all the cool stuff we did 20 years ago and put it in a shiny new wrapper.


Sometimes I think about this. The old cool stuff not only can have a shiny new wrapper but a whole new set of modern technology that can finally turn them into a successful "new" product.


x-10 FTW!


I still have a couple of X10 lights controlled wirelessly in my house. When I moved in, many of the lights in the house weren't wired to switches but just had pull chains. Over time, most of the house has been rewired and switches added but I still have a couple of lights that haven't been connected to switches and I still use X10 for them.


Years ago... mid to late 90s... I had everything X-10'ed up in my house. What I didn't know is that my house alarm was also X-10 capable.

One night, I couldn't enter the code in time and you could see what house was alarming for miles around. Inside and outside the house, everything that could blink was blinking - to go along with the blaring sound.

I cancelled the alarm service (and kept the alarm) soon after because 1) it would freak anybody breaking into my house out, and 2) if the cops couldn't figure out that there was something going on at my house without somebody having to call them, they just weren't doing their job.


"To order your Echo Dot, use your Amazon Echo or Amazon Fire TV and just ask..." An Expensive marketing campaign to sell Echo and Fire TV?


Maybe a bit more. As an Echo user I've never ordered anything from my Echo... however once I do, I'll probably do it again. The Dot could be a gateway drug :D


Hmm. The Dot might be a good addition, but it's too expensive. I want to put several mic & speaker combos around my house, but I don't want to pay $90 per room. Something in the $25-$40 range would do much better, even if it was a simple relay to the main Echo.


My FireTV was also silently upgraded to Alexa recently and it's fun to play with.

Is it possible for me to upload my own content, say an audio book or some music I own, so I can use Alexa as a voice command to fetch my own data too, be it on the cloud or my local NAS/DLNA box?


The short answer is "NO": https://forums.developer.amazon.com/forums/thread.jspa?threa...

Before it supports those features Alexa is more of a toy, I played with it for a few minutes then never used it again.


You can upload your own music into the amazon music library, and then play it via the echo: http://www.amazon.com/gp/help/customer/display.html?nodeId=2...

I assume an audio book could be uploaded as music, but I'm not sure if you can flag it as a book.


My dream is to have some kind of Android TV box that does this.


For voice activated solutions the hard part is the front end, i.e. voice recognition, which is what Alexa is strong at. Once this is covered, it's relatively easy to cover the rest. Really the core competency of Alexa is its excellent voice recognition performance, which is still hard to find elsewhere these days.


I'd love to see a nicely curated "voice recognition benchmark" to assess claims like this. Do you know of one?


https://www.quora.com/Speech-Recognition/Which-is-the-best-o...

There are a few open source alternatives. Kaldi is new to me.


This one looks promising too, although very early days. https://github.com/srvk/eesen

I think we'll see something state of the art (that runs on mobile devices) pretty soon in the open source world.

You also need a good microphone setup to get good quality speech in.

Both Echo and Dot have a 7 mic array with beamforming, which helps a lot with far field speech recognition.
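
To illustrate why a mic array helps with far field speech, here is a minimal delay-and-sum sketch. It is not Amazon's implementation: the function name, the integer sample delays, and the three toy channels are all assumptions made for illustration. The idea is that each mic hears the same sound slightly later, so shifting the channels back into alignment and averaging them reinforces the speech while uncorrelated noise tends to average out.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average them.

    channels: list of equal-rate sample lists, one per microphone (assumed).
    delays:   samples by which each mic lags the reference mic (assumed known).
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    # Usable length once each channel's leading `delay` samples are skipped.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        return []
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]


pulse = [0.0, 1.0, 0.0, 0.0]
mic_a = pulse + [0.0, 0.0]     # hears the pulse first
mic_b = [0.0] + pulse + [0.0]  # hears it one sample later
mic_c = [0.0, 0.0] + pulse     # hears it two samples later

aligned = delay_and_sum([mic_a, mic_b, mic_c], [0, 1, 2])
unaligned = delay_and_sum([mic_a, mic_b, mic_c], [0, 0, 0])
```

With the correct delays the pulse adds up to its full height, while the unaligned average smears it across three samples at a third of the amplitude. A real array would also have to estimate the delays from the direction of the speaker, which this sketch skips.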


FireTV


My problem with Alexa is, I don't want a far field cloud based voice recognition device within my reach.

I'm fine with a device doing the voice recognition on premise/on device with the same functionality.


Versus a close field cloud based device in your pocket? I'd be more comfortable knowing that Alexa is truly not listening unless the blue light is on.

But I hear your point. I think we're all getting lulled into just giving up on totally ruling out our most paranoid considerations. Not that it isn't quite rational to be paranoid given the constant barrage of proof of device exploitation and mass surveillance.


Classic hub/spoke model

Echo = hub, too expensive and large to buy 10 for every room in the house, used for receiving, processing, routing info from spokes and cloud

Echo dot = spoke, microphone and AI functionality at a lower price point, distributes connectivity network throughout the entire house so that you don't have to walk from your kitchen to your living room to order new paper towels from Amazon


Not hub/spoke at all. The Dot and the Echo are separate standalone products. The only reason you're required to own an Echo to get a Dot is that it's a "limited supply" product, and they want to make sure that only loyal customers get to review the new thing in order to seed some positive reviews.


Plus Amazon Tap, so now you have a lot more choices (or confusion, depending on your PoV)


I want this technology, but I don't want to send this info to Amazon. Guess I have to continue on my own half-assed implementation.


How are these 2 "new" products different from the normal echo?


One does not feature a good speaker and is intended to connect to your existing speaker system. The other is portable.


What is the difference between Echo, Tap and Dot? It is confusing me a bit.

Dot: has no speakers? Requires Bluetooth pairing. Requires an Echo to work?

Tap: has wireless speakers with a built-in battery. Also seems to have a mic. Do I need an Echo to make this work? Can the Tap work with the Dot?


I don't own one of these devices, yet I'm curious, can you "modify the device's name"? I mean, what if someone in the household has the name Alexa. No, not you Alexa, the other Alexa. Alexa do your homework. Alexa take out the garbage.


Yes, by default you can call it Amazon instead. I have this problem because I have a cousin named Alexis and my Echo gets confused when she comes over. But I still keep it at Alexa.


Good to know, thanks.


When I saw Amazon 'Tap' I was hoping to see a Star Trek communicator[0]

[0] https://en.wikipedia.org/wiki/Communicator_(Star_Trek)


This should already be possible for a 3rd party to build with Alexa Voice Service. Would be really neat to see.

https://developer.amazon.com/public/solutions/alexa/alexa-vo...


Any reason why Echoes couldn't communicate with other Echoes? My friend and I own Echoes. I could say "Alexa, call Joe" and Joe and I could talk to each other through the Echoes over the internet.


Or even "Alexa, eavesdrop on Joe".


What's stopping people from accidentally ordering things with this thing? Could I go into somebody's house that has one of these devices set up and say "Alexa, order some breast clamps"?


Yes. Yes you can. Hilarity ensues.

The "security" is that you have to trust the people you invite into your house.

It's not great security. :)

(OK, that's not entirely true. As far as I can tell, you can only reorder things they've already bought via the website.)


Make a webpage that automatically starts playing an audio clip of someone saying those things, trick people into visiting the page and hope their laptop isn't muted.


You can add a PIN code to the voice purchasing process.

http://www.howtogeek.com/237386/how-to-enable-disable-and-pi...


Any reason Echoes couldn't communicate with each other? Say my friend and I each own an Echo. I could say "Alexa, call Joe", and we could talk to each other through the Echoes over the internet.


This is a better product than the original. They added one of the most requested features (audio out) and didn't remove anything important (unless you don't have a better plug-in speaker system).

The biggest oversight is now the fact that it can't work together with an existing Echo: Amazon is making us order these _using_ an Echo... but the two devices don't communicate at all and require individual wake words. I wanted this as an added mic for my existing system, not as a new independent system.

Big step in the right direction though.


Connecting them will be as simple as a future software update. Amazon's challenge right now is to get the hardware in place before its competitors (Nest, Apple) - and it seems to be pressing hard to get a wide range of devices in every room of the house.


I agree with you on this, but generally they'd have an easier job of it if they worked as a connected mesh rather than independent controllers. I can't say for certain but it doesn't seem like the added engineering time would be that much greater.


Can anyone else order this through their Fire TV? I'm just getting "Your search did not match anything in our catalog."

I could also be doing this wrong, as I literally unboxed my Fire TV just for this. I'm using the companion iOS app to access the microphone, but selected the phrase on the Fire TV.

The voice rec also sucked. I had to say the damn sentence like 9 times in an unnatural way. I hope that's not indicative of this experience I'm wanting to order...


> Echo Dot ($89.99) is available exclusively for Prime Members through Alexa Voice Shopping.

Huh? Why would they prevent new customers from ordering this?


I presume because they've had trouble keeping the original Echo in stock. And they want Prime members more than Echo Dot owners.


It's not just Prime, you need one of the other Alexa products to order it.


>> you need one of the other Alexa products to order it.

It's likely that the Dot uses the Echo as a parent device to do all the processing, such as sending the voice requests to the servers and back, and the Dot just works as a microphone slave device with some basic synchronization to determine which device heard the wake word first. If the Dot hears it first, it sends the voice data to the Echo, and the Echo does its normal thing and sends the response back to the Dot. (At least that was the intent when I worked on early versions of the project.)
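
The "which device heard the wake word first" step described above could be sketched roughly like this. This is only an illustration, not how Amazon actually does it: `WakeEvent`, `pick_responder`, the device labels, and the assumption that all devices report detection times on a shared clock are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WakeEvent:
    device: str       # hypothetical device label, e.g. "dot-kitchen"
    heard_at_ms: int  # detection time in ms, assumed to be on a shared clock


def pick_responder(events):
    """Return the device that heard the wake word first, or None if no events."""
    if not events:
        return None
    # min() keeps the first of equally early events, so ties go to list order.
    return min(events, key=lambda e: e.heard_at_ms).device


events = [
    WakeEvent("echo-living-room", heard_at_ms=1020),
    WakeEvent("dot-kitchen", heard_at_ms=1004),
]
print(pick_responder(events))  # dot-kitchen
```

In practice the hard part is the shared clock and the network latency between devices, which this sketch simply assumes away.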


The product page doesn't mention Echo as a requirement, AFAICT.


As usual, all goodies are US only :(

I want Alexa for my home automation, and I don't mind speaking English to her. But tough luck in Switzerland.


In one example: "Alexa, adjust my home thermostat to 74 degrees"

It would kill some people if used here in Europe (because we would rather adjust our thermostats in radians).

More seriously, are there any protections against dangerous orders? (e.g. your kid ordering 42 tons of sweets on Amazon)

"I'm sorry Dave, I'm afraid I can't do that"


I would buy the Tap if it was always on listening while on the cradle but then push button while portable. Doesn't look like it works that way from the description. I get that it takes too much battery to have 7 always listening microphones on, but while on the cradle, this should be a non-issue.


"If you have more than one Echo or Echo Dot, you can set a different wake word for each—you can pick "Amazon", "Alexa" or "Echo" as the wake word."

So they haven't solved the "I have multiple Echos in my house" problem yet.


They raised your options by 50%! (It used to just be "Amazon" or "Alexa").

But in all seriousness, I have no idea why they haven't enabled at least a larger list of wake words.


I'd love to read a few stories about how people use their Alexas in meaningful ways.


The Dot sounds great, but I cringed when I read about the Tap. It's increasingly common for people to play cell phone audio in enclosed places without consideration for others. The Tap seems to be designed to make it even easier to do so.


You mean like ALL portable bluetooth speakers?


Note: this refers to the "Alexa" voice assistant, not Alexa the domain ranking company (also owned by Amazon)

http://i.imgur.com/B6dsMNm.png


I absolutely adore my Echo. But I live in a small apartment, so I really don't see a need to buy a Dot as a second Echo device, even if the size and price make it a more attractive option than the original Echo.


Actually, I think this is very useful for two things:
- Setting a timer for cooking
- Listening to music

I'm sceptical about getting other skills. "Alexa, ask MyApp to do something"... it's very long and annoying.

But I strongly believe they will improve that.


I'll always be a bit bitter toward the Echo project. I had a really great manager transfer to that project when I worked at Amazon. It's part of the reason I left. Glad to see them do well though.


I'm wondering why it took the product being from Amazon for geeks to finally be ok with a device that silently listens to everything you say in your home and sends that data to Amazon's servers.


This is FUD, it doesn't do this. Only data recorded after you say "Alexa" or the like is sent up. There is also a mute button.


It's not FUD. We've had countless stories scare mongering about smart TVs that "may be spying on you"

If this device was from Samsung I think the giddiness over it would be a lot more tempered.


I think the number 1 use for voice control is the car. The current options (Apple CarPlay/Android Auto) are good, but I would be interested in a better experience. I would like Amazon Alexa to work in the car.


If Amazon gains by providing this service to prime members then why don't they have a voice control app for iOS/Android to connect with Alexa? (not just the setup Alexa app)


Would be awesome if I could connect and control my Sonos from it.



You're a true hero for sharing this.


This solves a huge pain point I have with my Alexa. That being said, it will still understand any man's voice in my home better than my own. Decisions decisions.


This, a sort of "extender", is what I've been wishing for since the original Echo came out. Ordering tonight (since I can't do it from work).


Are there any open source projects trying to emulate the cloud-based voice recognition that Amazon/Google/etc are doing for Alexa/OK Google?


Kaldi. "Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0."

http://kaldi.sourceforge.net


TensorFlow, open sourced by Google, would allow you to implement the latest in voice recognition research relatively quickly. Recently they released TensorFlow Serving which lets you run your tensorflow models on the cloud.


Amazon is offering a free t-shirt to people adding new Alexa skills... that's the bar for adding to their ecosystem now: a t-shirt.


I like this idea very much. By making it cheaper and smaller, Echo can easily become the ears of any electronic in the house.


"To order your Echo Dot, use your Amazon Echo or Amazon Fire TV and just ask: Alexa, order an Echo Dot."


How is this any more invasive than what FB/Google knows about us?


Device with a microphone always on from an NSA-affiliated company.

Interesting.


Why don't Amazon have an Alexa iOS/Android app?


Well, Evi is still available on the Play store: https://play.google.com/store/apps/details?id=com.trueknowle...



No, when you launch the app you have to put in your Alexa ID. You need at least an Amazon Fire TV to get the ID.


"This app is incompatible with all of your devices."

Looks like it only works with "Alexa" devices


They do.


Everything a (smart) phone should be able to do, no?



