It was so confusing figuring out what this service is supposed to do. Had to look up the documentation. In summary, from what I can gather
1. It doesn't do any speech recognition (speech -> text), so not sure why they put Siri in the title. It is also not clear how they can 'hijack' the text from Siri to do this analysis. The ASR engines they talk about (CMU, OpenEars) have pretty horrible accuracy (compared to Siri or Google's voice recognition).
2. Looks like they do some form of text normalization/correction, again not clear how they do it.
3. The actual service they provide is a form of named entity recognition (confusingly named 'intent', which clashes with the Android intent mechanism in their examples).
4. Also they let you define your own entities to match. You can train them using a drop-down menu. Not sure how you can train hundreds of examples using point and click.
Given this service was for developers with an interest in NLP, it would have been good if they didn’t hide behind a snow job title like “Siri as a service”.
> 1. It doesn't do any speech recognition (speech -> text), so not sure why they put Siri in the title. It is also not clear how they can 'hijack' the text from Siri to do this analysis. The ASR engines they talk about (CMU, OpenEars) have pretty horrible accuracy (compared to Siri or Google's voice recognition).
Currently most Wit users use Google or Nuance with great success. You can even use Android's offline speech rec.
That being said, CMU and OpenEars work well, as long as you provide them with good language models (which you can't do if you hack a quick project). Our plan is for Wit to automatically generate the right language models from your instance configuration.
> 2. Looks like they do some form of text normalization/correction, again not clear how they do it.
> 3. The actual service they provide is a form of named entity recognition (confusingly named 'intent', which clashes with the Android intent mechanism in their examples).
We abstract the full NLP stack for the developer. How we do it is not really what matters to our developers, as long as it works :) Actually we use a combination of many different NLP and machine learning techniques.
> 4. Also they let you define your own entities to match. You can train them using a drop-down menu. Not sure how you can train hundreds of examples using point and click.
You don't need to train hundreds of examples. Plus, our users are not NLP/ML experts and they prefer a graphical UI. But it's true, it could still be more efficient; we have good features on the roadmap for that :)
Alchemy is great as a set of NLP tools, some of them quite academic, but it's not designed from scratch to solve the problem we're trying to solve: enable the masses of developers to easily add a natural language interface to their app.
I was initially thinking that this is just a SaaS2SaaS wrapper around Google/Apple speech recognition. Hey, tbh, wouldn't that be a clever way to market? hehe, that reminded me of SiriProxy.
But to give the authors their credit back, it's not what I guessed. It's much more a GUI around a complex toolset that would otherwise require you to dig deep into muddy waters, bad docs, etc. So yes, it makes life easier, and it makes sense to use this in your app. I have not evaluated the quality of their service yet, but it's a startup; it's not going to stop improving (hopefully) :)
(Google/Apple are essentially powered by "Nuance", but with different qualities of training-data.)
More NLP/AI startups and more collaboration on HN please!
2 cents: I hope people don't sell their first working prototype to the Google/Apple/Microsoft empire, but try to get big with friendly startup collaboration and with the help of investors/angels.
Maybe I'm out of line, but if you're planning on tearing into what's wrong with something, try to offer something positive as well. Your feedback was great and constructive, it's just not very nice.
I'll be honest, HN has a tendency (myself included) to have a first natural reaction of "how can I criticize this?" But just because something isn't faster than enterprise, or not-as-scalable, or not made in your framework of choice doesn't mean it's worthless. I think this project is amazing. Great job and I can't wait to see this mature.
Is there any way to view this page with the effects turned off? With all the text constantly appearing and disappearing, I haven't yet made it to the end of a sentence, and therefore can't form an opinion about it.
I think there was a picture of a robot on the screen for a few seconds, but that's all I remember.
Better, but it still seems to have lots of things happening on timers. So I still have things I'm trying to read disappearing out from under me.
I imagine as the developer you don't notice it. But as somebody trying to read a page, it's really jarring to have that happen. Enough so that I give up trying because I just want it to stop doing that to my eyes.
Any chance you could turn it off completely and just put some arrow icons on there?
Same here. I find "siri as a service" an interesting project. But not interesting enough to cope with a page that blends in and out content and makes my head spin.
You should make it clearer that you don't actually handle voice recognition. When I read: "Developers use Wit to easily build a voice interface for their app." I expect you to handle things from start to finish.
Also, let me try it! It's frustrating because the UI looks like you can experiment, but it's only an animated demo (or am I missing something??). In particular, the mic icon is used to record on Google, but here it doesn't seem to do anything?
> You should make it clearer that you don't actually handle voice recognition.
You're right, we'll make it clearer on the landing page. A full out-of-the-box integration with some voice recognition engines (we love CMU Sphinx, open source) is on our roadmap.
> Also, let me try it!
We purposely didn't provide an "end-user" demo (something that would look like chatting with Siri) because we want to focus first on the developer experience, when they configure Wit to understand their own end users' intents. You can request an invite and try this in less than 5 minutes.
Requiring an invite seems like such a high barrier to entry if you actually want people to sign up. I usually skip any such thing, because instant gratification is great, and who knows when the invite will come in and whether I'll still care once it does.
Seeing this message, I bit the bullet and requested an invite anyway, and have seen no action in the couple of hours since... thus validating my initial reluctance.
But as a bootstrapped startup, we have to make tradeoffs as our budget is limited. We have to accept invitations gradually today to keep our servers alive. We should be able to accept everybody within a few days at most. Sorry for the inconvenience.
I hadn't thought of the load. If it's there because you need to rate-limit users, then I completely understand. Would be ideal to take everyone instantly, but the world isn't always ideal.
The word "Siri" doesn't belong in the title or the article, unless a Trabant advertisement has the right to mention Mercedes-Benz in its promotional text. The project does a primitive kind of voice recognition, but it doesn't use Siri.
On this topic, I invite people to try out my non-prototype, non-project toy that uses Google's support for HTML5 speech recognition. It's pretty funny how wrong things go when you try to say something even a bit out of the ordinary:
If I say, "Now is the time for all good men to come to the aid of their country," an old teletype test sentence, the Google recognizer always nails it. If I say, "I hit an uncharted rock and my boat is being repaired," things go hilariously wrong, and every time differently.
I think in the future, when computers are 100 times more intelligent than they are now, we'll laugh at these examples. But no one should doubt the difficulty of interpreting continuous speech without prior training for a given speaker. It's no wonder that speech interpretation on telephones tends to be limited to understanding a handful of possible responses: "Yes", "No", "Let me speak to a human!"
I like the concept a lot. I'm going to have to read more about it. One thing that I'm unclear about is if this does voice->text, or if the developer does that and Wit handles translation of that into actions.
Just a heads up, but Get Started on the pricing page does nothing. It's a natural progression for me to go home page->pricing->OK, looks good, let's get started.
Wit takes the output of the voice recognition engine as input. It's quite robust to voice recognition errors. Most devs use Google's engine or the open source CMU Sphinx engine.
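Roughly, the client-side flow is: do speech-to-text with whatever engine you like, then send the transcript to Wit over HTTP and get back structured intent/entities. A minimal sketch (the response fields below are simplified for illustration, not the exact schema):

    // The app (or its server) does speech-to-text itself, then sends only the
    // resulting text to Wit for intent/entity extraction.
    interface NluOutcome {
      intent: string;                    // e.g. "set_alarm"
      entities: Record<string, unknown>; // extracted slots
      confidence: number;                // 0..1 score for this interpretation (simplified)
    }

    async function understand(transcript: string, token: string): Promise<NluOutcome> {
      const res = await fetch(
        `https://api.wit.ai/message?q=${encodeURIComponent(transcript)}`,
        { headers: { Authorization: `Bearer ${token}` } }
      );
      if (!res.ok) throw new Error(`NLU request failed: ${res.status}`);
      return (await res.json()) as NluOutcome;
    }

    // Usage: understand("wake me up at 7", WIT_TOKEN).then(o => console.log(o.intent, o.entities));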
This is amazingly timely for me. I've been building my own version of Jarvis using speakeasy-nlp (a Node NLP library) and Chrome's built-in support for HTML5 webkitSpeechRecognition:
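A rough sketch of that flow, in case it's useful to anyone: Chrome's webkitSpeechRecognition does the speech-to-text, and each final transcript gets handed to the NLP step (speakeasy-nlp, Wit, whatever). The handleCommand callback is just a placeholder, not part of any library:

    // webkitSpeechRecognition is Chrome-only and non-standard, hence the manual declaration.
    declare const webkitSpeechRecognition: any;

    function listen(handleCommand: (text: string) => void): void {
      const recognition = new webkitSpeechRecognition();
      recognition.continuous = true;      // keep listening across utterances
      recognition.interimResults = false; // only deliver final transcripts
      recognition.lang = "en-US";

      recognition.onresult = (event: any) => {
        // Each result holds one or more alternatives; take the top one.
        const last = event.results[event.results.length - 1];
        handleCommand(last[0].transcript.trim());
      };

      recognition.start();
    }

    // Usage: listen(text => console.log("heard:", text));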
Hey everyone. Wit guy here. We've been working on Wit the past few months and we think it's time to get your feedback. I'm happy to answer any questions you have.
Bringing Natural Language Understanding to the masses of developers is hard and we still have a lot of work ahead of us. Please don't hesitate to reach out to us!
Nice concept. I just came back on here to let you know that I don't know what is happening on that page, but I left it open about 45 minutes ago and noticed my fan kicked in a lot. It was that page I left open: it ended up taking 25% CPU. You didn't work on the iTunes software, did you??
Interesting! Nothing happened after I registered with my GitHub account, though (using Opera).
I also wonder how this compares to http://www.maluuba.com/ ?
Wit is 100% open and flexible, you can create any intent you need for your app, you're not limited to a static set of domains/actions.
EDIT: @ragebol we are very interested in ROS and robotics, don't hesitate to get in touch with me arthur at wit dot ai. In the future we would like to provide an off-the-shelf human/robot communication module for developers.
For the robot I'm working on, we're using http://wiki.ros.org/wire to make probabilistic world models.
If the API returned not just a confidence score for one possible interpretation, but multiple interpretations with varying confidence scores, you could determine which one makes the most sense given your world model.
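Something like this is what I have in mind: take an n-best list of interpretations and re-rank it against the world model. (The types and the worldModelPrior function are made up for illustration; neither Wit nor ROS exposes this exact API.)

    interface Interpretation {
      intent: string;
      entities: Record<string, unknown>;
      confidence: number; // score from the NLU service, 0..1
    }

    // Combine the service's confidence with a plausibility score from the robot's
    // own world model, and keep the interpretation with the highest joint score.
    function pickBest(
      nBest: Interpretation[],
      worldModelPrior: (i: Interpretation) => number
    ): Interpretation | undefined {
      return nBest
        .map(i => ({ i, score: i.confidence * worldModelPrior(i) }))
        .sort((a, b) => b.score - a.score)
        .map(x => x.i)[0];
    }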
Awesome! I made a wrapper around Maluuba for ROS before (https://github.com/yol/maluuba_ros), and maybe I'll get around to a Wit wrapper for ROS as well.
With Maluuba, we can't make a command like "Introduce yourself" or "Grab that can for me" because of the limited set of categories. Wit should be able to handle those as well, from the looks of it.
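To make that concrete, here's a hypothetical sketch of what defining such a custom intent could look like as data, with example expressions and marked-up entity spans (a made-up structure for illustration, not Wit's actual configuration format):

    interface EntitySpan { entity: string; value: string; start: number; end: number; }
    interface Expression { text: string; entities: EntitySpan[]; }
    interface IntentConfig { name: string; expressions: Expression[]; }

    // A custom robot command, learned from a handful of example expressions.
    const grabObject: IntentConfig = {
      name: "grab_object",
      expressions: [
        { text: "Grab that can for me",
          entities: [{ entity: "object", value: "can", start: 10, end: 13 }] },
        { text: "Pick up the red cup",
          entities: [{ entity: "object", value: "red cup", start: 12, end: 19 }] },
      ],
    };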
We (Maluuba) are actually working on improving nAPI and now have the ability to define your own domains and actions. It's not public yet, but we're planning on releasing it mid November.
How "open" is Wit? Open source or open APIs allowing users to modify your algorithms/training data? I assume there would be a significant cost to being able to run our own instance of Wit on-premise, right?
I applied for alpha access using the github username "marks"
Openness is one of our core values.
We're inspired by companies like GitHub.
- Regarding data, we encourage users to share their data, making “public” Wit instances free of charge (à la GitHub). We’re also working on standard formats for NLP/ML data (models, sets, etc).
- Regarding open-source, we plan to release our algorithms and infrastructure piecemeal (à la Prismatic). We’ll announce our open source plans in the near future.
Your message isn't clear. AFAIK there is no official way to interact with Siri or Google voice rec.
It seems like Wit will take the text that has already been translated from a user's voice to text and make it easily accessible to my application. But how does Wit access the text generated from a Siri request in the first place, for example? Does Wit have some other way of getting at this data that has already been converted from voice to text by Siri or Google or some other speech-to-text engine?
> AFAIK there is no official way to interact with Siri or Google voice rec.
Actually there are ways. On Android devices, voice rec is available to devs (even offline if the user enabled it!). We have a simple tutorial about how to integrate on Android https://wit.ai/docs/android-tutorial
Right now on iOS you have two options (none of them involves Siri, which is kept closed by Apple):
1/ Do the voice rec server-side (Siri does that)
2/ Use OpenEars to do it client-side
Server-side, you have many voice rec options, including open source CMU Sphinx.
Providing a fully-integrated solution with speech rec out of the box is on our roadmap.
Looks like you focus on search, summary, entity and sentiment extraction with a rule-based approach.
Wit's focus is to power human/machine interfaces, and our priority is to provide developers with a 100% configurable solution, with no prior assumptions about their domain. And we don't believe in rules; we chose a machine learning approach.
No, PleaseAPI is converting natural language to database-like structures/commands with no need to have a human come up with every way you could say something, because the vocabulary is built in.
Unlike Wit, it also offers the option to use the APIs that are already integrated or Bring Your Own Backend, so that you can have a mix of info/responses from your own system, or leverage what is already there.
It will be on Mashape shortly, the Natural Language Part of Speech Tagger just went live Friday. Documentation takes time after we write code and is less fun to write. :-)
This would be great for open source projects, but I feel like I would trip over a very large pile of patents if I tried to build a product around it. I don't have any relevant experience myself though, so it's just a feeling.
The pricing model doesn't scale realistically and would require a subscription service for users. An app with 1M+ installs could make 1M+ calls per day, making this service cost $24k/month.
After reading http://octodex.github.com/faq.html, we thought it was okay to put this image given that we advertise and reference GitHub a lot, we heavily integrate with it and we love Octocat!
The landing page looks like a minefield of legal issues. Marketing everything explicitly with the references to Siri is asking for a lawsuit from Apple.
We share the same vision that voice becomes the key human/machine interface, especially for the upcoming generation of wearable devices, home automation, etc.
I don't know if Ask Ziggy is 100% self-service for the developers. That's a key requirement for us.
Not at this point. Feel free to sign up for the beta on our site and take it for a spin, I'll make sure to get you your credentials quickly.
PS: by the way, big fan of historio.us...
- Spanish (Mexican, Castellano, others?)
- Chinese (Mandarin, Cantonese)
- Hindi
These would be logical next steps with some important commonalities: broad base of native speakers, high importance in the US market (maybe less so for Hindi), and very important dialect differences. Mandarin, Cantonese, Hindi, and Russian could also force the issue of non-Romanized character sets.
Dutch... But I guess the odds are low for that, because of the smaller user base.
There are some companies focusing on care robotics emerging in the Netherlands though, which could use a service like this.
It's obvious to most people (regardless of whether you think that free software is an important principle) that the techniques described in that section will result in you making MUCH LESS money, at least in the current context of these people selling their API.
Ignoring any open vs. closed source arguments, this kind of thing would work much better as library than some third party service. There's no way I'd make an application that depended on a third party service over the internet for basic UI interaction.
Is this different from Alchemy (or many others) because this is open source(?) http://www.alchemyapi.com/products/features/entity-extractio...