
I did basically the same thing as you for a newspaper project in which I collected 9,000 selfies (see http://vk.nl/selfies).

It's a lot of manual work, but OpenCV saves you a lot of time. Unfortunately I can't share the code, but here's what I did:

* Get all Instagram photos with the '#selfie' tag.

* Run them all through OpenCV's haarcascade_frontalface_alt2 cascade, using the values 1.3 and 5 for the detectMultiScale() method.

* Check that there's exactly one face in the image and that it's wider than 20% of the image width.
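For anyone who wants to try this, here's a minimal Python sketch of the detection and filtering steps. It is not the original code. It assumes the opencv-python package, which bundles the cascade XML files under `cv2.data.haarcascades`, and it assumes the 1.3 and 5 values are detectMultiScale()'s `scaleFactor` and `minNeighbors` arguments. The function names `keep_selfie` and `detect_faces` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (x, y, w, h) rectangle, as returned by CascadeClassifier.detectMultiScale()
Box = Tuple[int, int, int, int]


def keep_selfie(faces: Sequence[Box], image_width: int, min_ratio: float = 0.2) -> bool:
    """Hypothetical filter: keep an image only if exactly one face was found
    and that face is wider than min_ratio (20% by default) of the image width."""
    if len(faces) != 1:
        return False
    _, _, face_width, _ = faces[0]
    return face_width > min_ratio * image_width


def detect_faces(path: str, cascade_path: Optional[str] = None) -> Tuple[List[Box], int]:
    """Hypothetical helper: run the alt2 frontal-face cascade on one image file.
    Returns the detected boxes and the image width in pixels."""
    import cv2  # imported here so keep_selfie() works without OpenCV installed

    if cascade_path is None:
        # opencv-python ships the Haar cascade XML files in this directory
        cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_alt2.xml"
    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError(f"could not load cascade: {cascade_path}")

    image = cv2.imread(path)
    if image is None:
        raise IOError(f"could not read image: {path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Assumption: 1.3 is scaleFactor and 5 is minNeighbors
    faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    boxes = [tuple(int(v) for v in face) for face in faces]
    return boxes, image.shape[1]
```

Usage is `boxes, width = detect_faces(path)` followed by `keep_selfie(boxes, width)` for each downloaded photo. With thousands of images you'd probably want to load the cascade once outside the loop rather than per call, and the images that pass the filter still need a manual review.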

Even after that I still had to go through the images manually; I'd guess around 10% were still false positives.




