
Very cool.

10M venues is a lot, and a number that's really hard to believe.

For example, Yelp lists ~13k restaurants in NYC (3.5k in Chicago, 4.5k in SF)

In the top 10 categories for venues, Yelp NYC has 40k VENUES (restaurants, shopping, food, health, spas, nightlife, and some more).

Either 4square has 250 cities like NYC, a fundamentally different definition of what a venue is, or Yelp is SERIOUSLY missing a lot of places (like on the order of getting only 10% of the venues in each city).
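For anyone who wants to check the back-of-envelope math, here is a minimal sketch. It uses only the figures quoted above (10M claimed venues, ~40k Yelp venues in NYC); the `implied_cities` helper and the idea of scaling by a coverage fraction are my own illustration, not anything Foursquare or Yelp publishes.

```python
# Figures quoted in this comment; nothing here is measured.
FOURSQUARE_VENUES = 10_000_000  # claimed total
YELP_NYC_VENUES = 40_000        # Yelp NYC, top 10 categories

# How many NYC-sized cities would the claimed total correspond to,
# if Yelp's NYC count were complete?
nyc_equivalents = FOURSQUARE_VENUES / YELP_NYC_VENUES


def implied_cities(yelp_coverage: float) -> float:
    """NYC-sized cities needed if Yelp lists only `yelp_coverage` of real venues.

    Hypothetical helper: assumes every city looks like NYC.
    """
    true_nyc_venues = YELP_NYC_VENUES / yelp_coverage
    return FOURSQUARE_VENUES / true_nyc_venues


print(nyc_equivalents)       # full Yelp coverage assumed
print(implied_cities(0.10))  # Yelp assumed to see only 10% of venues
```

With full coverage this comes out to 250 NYC-sized cities. If Yelp only saw 10% of venues, it would still take about 25.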

People have written about the data sparsity problem in the Yelp dataset, as compared to Netflix, when using CF techniques (http://www.stanford.edu/class/cs229/proj2009/Fennell.pdf). I'm very interested to hear what people think of the 4square implementation.

I'm skeptical, but I really hope it works, because I have very average food far too often on recommendations from friends with dissimilar palates...



Foursquare has way more places in their database, based on my usage of both Yelp's and Foursquare's iPhone apps.

Yelp is based on reviews of businesses, whereas Foursquare is based on checking in to places. Therefore, Foursquare's domain is much larger.

I'm not sure if Foursquare or Yelp add places to their database or if it's 100% user-driven. In Yelp's case, it takes more effort to add a place because you need to (or at least you feel like you should) also write a review. On Foursquare, you can just add your home. Or "RainApocolypse 2011," of which I am two days away from becoming the mayor.


Reduced friction for adding places is a plausible reason for why 4square might have more venues, but what I was getting at is the order of magnitude of the difference.

10 MILLION places is a ton. Like I said above, that's like 250 NYCs. Even given 4square's reduced friction to add venues, the fact that they've probably launched in more cities, and the inclusion of arbitrary venues ("Hey I checked in at the tree that's 10 paces west of my house!"), that's still a ton.

I don't think 4square is lying. Now I'm just wondering how useful those venues are. (I am also pretending that Yelp has better than 10% coverage of venues that users care about). Incidentally, either way, you've got to think that 10mm venues has to wreck the data sparsity problem as well.
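To make the sparsity point concrete, here is a rough sketch, reading "wreck" as "make worse". Density is just observed (user, venue) pairs divided by all possible pairs. Only the 10M venue figure comes from this thread; the user and check-in counts are hypothetical placeholders, not real Foursquare numbers.

```python
def density(num_pairs: int, num_users: int, num_venues: int) -> float:
    """Fraction of the user x venue matrix that has an observed entry."""
    return num_pairs / (num_users * num_venues)


# Hypothetical counts, chosen only to show the scaling.
USERS = 1_000_000
OBSERVED_PAIRS = 100_000_000  # distinct (user, venue) check-in pairs

d_small = density(OBSERVED_PAIRS, USERS, 1_000_000)   # 1M venues
d_large = density(OBSERVED_PAIRS, USERS, 10_000_000)  # 10M venues

# Same users and check-ins, 10x the venues: the matrix is ~10x sparser.
print(d_small / d_large)
```

Under those assumptions, adding venues that nobody else ever checks in to only thins out the matrix a CF model has to work with.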

For reference, the venue counts of 3 large American cities:

- NYC: 47228
- SF: 38656
- Chi: 19079

I'd be surprised if Yelp had even 1.5mm venues. Would love if someone could corroborate or dispute this. While there is certainly a disparity between Foursquare and Yelp, I'd imagine that the "useful" venues (those that people want recommendations for) aren't your home or "RainApocolypse 2011." Adding those into the dataset actually makes 4square's job _harder_ with respect to teasing meaningful data out of it.

If my phone ever recommends your house though, I just may come a knockin'.


Yelp adds places to their database, and they have had mixed success with Mechanical Turk.

http://engineeringblog.yelp.com/2011/02/towards-building-a-h...



