It's a lot of manual work, but using OpenCV saves you a lot of time. Unfortunately I can't share the code, but here's what I did:
* Get all Instagram photos with the '#selfie' tag
* Run them all through OpenCV's haarcascade_frontalface_alt2 cascade. I used 1.3 and 5 as the scaleFactor and minNeighbors values for the detectMultiScale() method.
* Check that there's only one face in the image, and that it's wider than 20% of the image width.
Even after that, I still had to go through the images manually. I'd guess around 10% were still false positives.
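The filtering steps above can be sketched roughly like this. It's not my original code, just a minimal sketch that assumes the '#selfie' photos are already downloaded to local files (fetching them from Instagram isn't shown) and that you're using the pip `opencv-python` package, which ships the cascade XML files under `cv2.data.haarcascades`. The function names are made up for illustration.

```python
import os

# Thresholds from the steps above.
SCALE_FACTOR = 1.3    # detectMultiScale() scaleFactor
MIN_NEIGHBORS = 5     # detectMultiScale() minNeighbors
MIN_FACE_RATIO = 0.2  # face must be wider than 20% of the image width


def is_selfie_candidate(faces, image_width):
    """Keep an image only if exactly one face was detected and that
    face is wider than MIN_FACE_RATIO of the image width."""
    if len(faces) != 1:
        return False
    _x, _y, w, _h = faces[0]
    return w > MIN_FACE_RATIO * image_width


def filter_selfies(image_paths):
    """Return the local image paths that pass the automatic filter.
    These still need a manual review for false positives."""
    import cv2  # imported here so is_selfie_candidate() works without OpenCV

    # Assumes opencv-python, which bundles the Haar cascades in cv2.data.
    cascade_path = os.path.join(cv2.data.haarcascades,
                                "haarcascade_frontalface_alt2.xml")
    cascade = cv2.CascadeClassifier(cascade_path)  # load once, reuse per image

    kept = []
    for path in image_paths:
        image = cv2.imread(path)
        if image is None:  # unreadable or missing file
            continue
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=SCALE_FACTOR,
                                         minNeighbors=MIN_NEIGHBORS)
        if is_selfie_candidate(faces, image.shape[1]):  # shape[1] is width
            kept.append(path)
    return kept
```

Splitting the face-count/size check into its own function makes it easy to tweak the 20% threshold without rerunning detection.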