Here are some of the sources I've seen when working on that topic [1]: census data, phone books, academic bibliographic databases (Web of Science, PubMed), lists of athletes at the Olympic Games, Wikipedia, LinkedIn, ...
Oh yeah! Onomastics (the study of names) is very cool, and a nice data point to add to personalized data. I did a project with it a while ago to study discrimination in France (https://namograph.antonomase.fr/), and it worked pretty well as long as you have large samples and stick to comparing distributions.
Location: Berlin, Germany.
Remote: Yes
Willing to relocate: Why not
Technologies: People, Python
Résumé/CV: https://www.antonomase.fr/
Email: antoine dot mazieres at gmail dot com
Neuroscience-inspired machine learning is what's happening now. Other approaches draw on logic-based and evolutionary inspirations. A nice piece commenting on some of that: https://neurovenge.antonomase.fr/
Hey mazr, I created a new GitHub project at https://github.com/harigov/newsalyzer and a corresponding Gitter chat group at https://gitter.im/newsalyzer. Feel free to contact me through my id @ gmail so that we can start the conversation. I think your expertise could be of real help in building this.