I believe this is done to get answers for unsolved captchas. For example, say I have a million photos of streets full of cars, buses, motorcycles, streetlights, and crosswalks that I want to add to my captcha database. I don't want to categorize them all myself, and I want the answers to reflect what the average person will identify, not what I or a machine will identify.

So, I send everyone two captchas. One has a known answer and must be answered correctly to access the service. The second captcha's answer isn't known yet, so it doesn't matter what the user selects. However, when they get the known one right, I log their answer for the unknown captcha. Once I have a large enough sample, I take the most common answer for the unknown captcha and can start using it for verification.
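A minimal sketch of the scheme described above, purely illustrative and not how any real captcha provider is known to work. The class name, the thresholds (`min_votes`, `min_agreement`), and the image IDs are all hypothetical. An answer can be any hashable value, e.g. a `frozenset` of selected tile indices for an image-grid captcha.

```python
from collections import Counter, defaultdict


class ConsensusLabeler:
    """Hypothetical sketch: a known 'control' captcha gates access, and the
    user's answer to an unknown captcha is recorded only when the control
    was answered correctly."""

    def __init__(self, min_votes=5, min_agreement=0.6):
        self.min_votes = min_votes          # assumed sample size before deciding
        self.min_agreement = min_agreement  # assumed share the top answer needs
        self.known = {}                     # image_id -> accepted answer
        self.votes = defaultdict(Counter)   # unknown image_id -> answer counts

    def submit(self, control_id, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes (i.e. the control answer is correct)."""
        passed = self.known.get(control_id) == control_answer
        # Only trust the unknown answer from users who passed the control.
        if passed and unknown_id not in self.known:
            self.votes[unknown_id][unknown_answer] += 1
            self._maybe_promote(unknown_id)
        return passed

    def _maybe_promote(self, unknown_id):
        """Promote the top answer to 'known' once enough users agree on it."""
        counts = self.votes[unknown_id]
        total = sum(counts.values())
        if total < self.min_votes:
            return
        answer, top = counts.most_common(1)[0]
        if top / total >= self.min_agreement:
            self.known[unknown_id] = answer
            del self.votes[unknown_id]


# Usage with made-up image IDs and tile selections.
labeler = ConsensusLabeler()
labeler.known["street_001"] = frozenset({1, 4, 7})

# Four users pass the control and agree on tiles {2, 5}; one picks {2, 5, 8}.
for _ in range(4):
    labeler.submit("street_001", frozenset({1, 4, 7}), "street_042", frozenset({2, 5}))
labeler.submit("street_001", frozenset({1, 4, 7}), "street_042", frozenset({2, 5, 8}))

# A user who fails the control is rejected and their other answer is ignored.
failed = labeler.submit("street_001", frozenset({1}), "street_099", frozenset({3}))
```

With these thresholds, `street_042` is promoted after five votes because 4 of 5 (80%) agree, so it can then serve as a control itself. The agreement threshold is one simple way to avoid promoting images that people genuinely disagree about.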



I always assumed that's how it works, so I would answer the first one correctly and click randomly on the second. I did this because I was uninterested, though I doubt it is still that simple.

I wonder what the minimum number of labels per image is to ensure clean data.
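One rough way to think about it, under strong simplifying assumptions that real captcha traffic won't meet: treat each tile as a yes/no question, assume every labeler is independently correct with the same probability `p`, and take a strict majority vote over an odd number of labels `n`. The majority is then right with probability

$$P(n, p) = \sum_{k=\lfloor n/2 \rfloor + 1}^{n} \binom{n}{k} p^k (1-p)^{n-k}.$$

The sketch below (function names are made up for illustration) finds the smallest odd `n` that reaches a target accuracy. If `p <= 0.5`, which is roughly what random clickers contribute on a yes/no tile, no number of labels helps and it returns `None`.

```python
from math import comb
from typing import Optional


def majority_correct_prob(n: int, p: float) -> float:
    """Probability that a strict majority of n independent labelers,
    each correct with probability p, picks the right binary label."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def min_labels(p: float, target: float, max_n: int = 101) -> Optional[int]:
    """Smallest odd n (odd avoids ties) whose majority vote reaches target."""
    for n in range(1, max_n + 1, 2):
        if majority_correct_prob(n, p) >= target:
            return n
    return None
```

For example, with `p = 0.8` a single label is right 80% of the time, three labels give about 89.6%, and `min_labels(0.8, 0.95)` returns 7 (about 96.7%). Labelers who aren't independent or who vary in skill would change these numbers, so treat them as illustrative only.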
