Hacker News new | past | comments | ask | show | jobs | submit login

Sorry, I don't have time to see the video right now, but I am wondering if it isn't possible for you to use something like Hadoop/Hive/Presto to simply get a list of all users in Paris on demand.



Hive and Hadoop are offline -- it can take ~45 minutes to execute a query on our entire user table (even longer if it involves joins) and certain times of the day its slower (during work hours usually). Not only that, but once the query executes some engineer has to go copy and paste into a script that would likely run on one machine.

Doing this in a distributed async job fashion allowed for a lot more flexibility. Even better, we can even change the geographic area as the algorithm runs and those changes are reflected immediately.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: