The reason is that the people capable of doing that know it's not a good thing.
The data can be used for re-identification attacks against user privacy.
Some good advice:
1. Use Python + NLTK
2. Read papers and posters from current conferences such as ACL or even WWW
3. Start an incremental project and try to at least get a poster published at a tier-1 conference
4. If you have good grades + work experience, go to the CMU Advanced Lang. Tech Master's program
Based on your theory of genetic difference, this isn't necessarily sound advice. "White" and "Indian" are not distinct genetic groups. The genetic distance between a Swede and a Czech (both white) is bigger than the distance between a "white" Southern European and someone from Northern India.
From personal experience, let's just say dating an Indian girl as a white man is non-trivial. (Assuming Indian means from-India, not just Indian descent.)