
I've been uploading the easiest photos I can find to the visual recognition demo[1], and it's yet to get one right.

For example, I searched Google for "photo of girl", and found this image which seems very easy:

http://www.wagggsworld.org/shared/uploads/img/rachel-s-p-pho...

Watson says:

    Color		71%
    Human		67%
    Photo		65%
    Dog			59%
    Person		57%
    Placental_Mammal	56%
    Animal		50%
    Long_Jump		50%
Huh?

This isn't me cherry picking bad results; aside from their demos I'm not finding any photos that are accurately classified. I even tried a headshot of a person isolated on a white background, and Watson told me I uploaded a photo of "shoes".

Seriously - how is this data useful? What could I build with this level of accuracy?

Watson team - do you agree? Is this product about to get a lot better, soon, or is this considered "pretty good"?

[1] http://visual-recognition-demo.mybluemix.net/



The top 3 classes in your example are actually correct - it is a color photo of a human. But we expect it to get much better over time. Only real world usage will allow us to make real improvement - and that's why we are eager to release early.

We also believe that the first applications (e.g., classifying animals, plants, or landmarks in dedicated apps) will have narrower use cases that give better accuracy.


The top 3 may be correct, but they aren't very useful. What could I do with this information? What feature could I build?

Also, the other results are very wrong. (E.g., Watson is more confident that this is a dog than a person, and I have no idea where it got "Long Jump" from.) This makes it hard for me to trust Watson.

Is the recommendation that I incorporate a "confidence in Watson" metric, and ignore most of the results?

What confidence from Watson would you say indicates an answer that is probably accurate? And how confident are you that Watson's self-reported confidence is accurate?


I tend to disagree. Assuming they are correct on a larger corpus, you can start doing things like "only do face matching on pictures with people in them" and weed out photos in a batch that don't have those three properties.
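A minimal sketch of that pre-filter idea. Everything here is hypothetical: the `classify(path) -> {label: confidence}` signature is an assumption for illustration, not the real Watson API.

```python
def photos_worth_face_matching(paths, classify, threshold=0.6):
    """classify(path) -> {label: confidence}. Keep only images whose
    'Person' or 'Human' score clears the threshold, so the expensive
    face matcher never runs on photos of trees, goats, etc."""
    keep = []
    for path in paths:
        scores = classify(path)  # one classifier call per image
        if max(scores.get("Person", 0.0), scores.get("Human", 0.0)) >= threshold:
            keep.append(path)
    return keep
```

The threshold is a knob you'd tune per application: lower it and you waste face-matcher cycles on false positives; raise it and you drop real portraits.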

Watson is a training API rather than, say, the more fanciful emergent-AI type of API. The more data, the better it gets. Google's voice recognition isn't good because someone coded magic constants for various accents; it's good because Google fed it millions of samples of spoken words and corrects it when it gets them wrong.


Thanks for your comment. This makes sense - I would use Watson to determine which photos have humans at all, and then run those through, e.g., my facial recognition software. But Watson would keep me from having to waste resources looking for faces in photos of trees, for example.

I'm not in this field, so I'm having trouble understanding what use cases / consumer facing features this API unlocks. Your comment is very helpful in that regard.


It's actually very useful if it can detect with reasonable confidence that there is a person in a picture.

One example: at Kiva we require borrowers to have a picture of themselves posted for their loan. But sometimes we get pictures of things like goats or cows instead (those are kind of nice too, but we've gotta follow policy). Currently this is something we have to review manually, but if we could automate that review step it would save a lot of time (especially if at some point it could also count the number of humans in a photo).


Check out Clarifai. They have an image recognition API. It might be able to help detect people in the photo.


Why is the confidence that it is a human higher than the confidence that it is a placental mammal, and the confidence that it is a placental mammal higher than the confidence that it is an animal? More specific descriptions must have lower confidence.

Or is Watson not confident that humans are placental mammals and that placental mammals are animals?
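That consistency constraint is easy to check mechanically. A sketch (the score dict and the list of specific/general pairs are assumptions for illustration, not anything Watson provides):

```python
def hierarchy_violations(scores, chains):
    """chains: list of (more_specific, more_general) label pairs.
    Returns the pairs where the specific label scored strictly higher
    than its superclass, which a probabilistically consistent
    classifier should never produce."""
    return [(s, g) for s, g in chains
            if s in scores and g in scores and scores[s] > scores[g]]
```

Running it on the scores from the original comment (Human 67%, Placental_Mammal 56%, Animal 50%) flags both links of the chain as violations.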


[deleted]


I just tried with a Taj Mahal picture and it works.

http://goo.gl/C8cLWp


I just tried a photo of the Kremlin, and got Cargo Ship (and ironically, Taj Mahal).

http://easycaptures.com/fs/uploaded/736/8308577082.jpg


The top 7 classes are correct: outdoor color photo of a landmark and historical site with vehicles in the front ;-)


The problem with AI systems has almost always been that they tend to be both right and wrong in ways that humans would never be.

Watson gives high confidence to it being a color photo of a human (which is a Person, and an Animal). Which is right. But the only part that a human would ever really care about is that there's another human in the picture.

It gets things wrong with a reasonable confidence for Dog, Placental_Mammal and Long_Jump...importantly, these are wrong in ways that humans would never get wrong.

Just as important are the omissions. A human would probably describe this as a picture of a girl or young woman, laughing or smiling, with curly brown hair wearing a scarf -- and maybe some other incidental information.

Of that description, Watson only got the superclass of one part correct (Human, Person) and didn't provide any of the other parts.

AI fundamentally "thinks" differently than a human, and that makes it hard for humans to use AI as a cognitive enhancement tool the way we use calculators, books, writing, etc. We don't trust what an AI is doing or the answers it provides because AIs tend to give answers that are right but irrelevant, or weirdly wrong, or that omit obvious and necessary information a human would want.

If humans ever encounter aliens, it's likely that their mode of thinking will be just as different. So bridging that gap, and figuring out how to make AI like this useful, could be a worthwhile endeavor.


One thing a machine learning system can do that no single human can is ingest lots of data. For example, on some tasks where I have tried to compare human vs. machine speech recognition performance, the machine actually does better because it may, say, know a singer's name that an individual human may not recognize.


I gave it a picture of a cat (http://upload.wikimedia.org/wikipedia/commons/2/22/Turkish_V...) and got:

    Photo               75%
    Shoes               69%
    Nature_Scene        69%
    Meat_Eater          63%
    Object              63%
    Mammal              63%
    Vertebrate          63%
    Cat                 63%
    Indoors             62%
    Room                60%
    Person              58%
    Color               57%
    Judo                54%
    Person_View         53%
    Human               51%
    Leisure_Activity    50%

If you give the classifier a hint (animal), it gives: Meat_Eater 63% Mammal 63% Vertebrate 63% Cat 63%

So, clearly needs work as a general classifier, but still potentially useful.
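The hint mechanism described above amounts to restricting the output to a subset of labels. A sketch under that assumption (the `restrict_to_hint` name and the flat score dict are hypothetical, not the demo's actual interface):

```python
def restrict_to_hint(scores, allowed):
    """Keep only the labels in the hinted subset, mimicking how
    narrowing the label space (e.g. to animal classes) turns a noisy
    general classifier into a more usable specialized one."""
    return {label: conf for label, conf in scores.items() if label in allowed}
```

Applied to the cat results above with an animal-only hint, this drops Shoes, Judo, and the rest, leaving exactly the four labels the commenter got back.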


Compare that to Clarifai: http://i.imgur.com/BsWdpUA.jpg

    portrait, youth, fashion, facial expression, women,
    european, girl, model, female, actress



