
Hi! I'm the one that loaded this dataset into BigQuery. Feel free to ask any questions :).

The notebook with sample queries and visualizations:

https://github.com/fhoffa/notebooks/blob/master/analyzing%20...



Great stuff! Looks like the whole dataset was imported 2 days ago. If you can stream it daily, I can use a table decorator and avoid processing all the records.
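To illustrate the idea: in BigQuery legacy SQL, a range decorator of the form `[table@-N-]` restricts a query to data added in the last N milliseconds. A minimal sketch, assuming a hypothetical table name and a one-day window (neither comes from this thread):

```python
def range_decorated_table(table: str, window_ms: int) -> str:
    """Build a legacy SQL table reference limited to data added
    in the last `window_ms` milliseconds (relative range decorator)."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return f"[{table}@-{window_ms}-]"


ONE_DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical table name, for illustration only.
HYPOTHETICAL_TABLE = "my-project:my_dataset.hn_items"

query = f"SELECT COUNT(*) FROM {range_decorated_table(HYPOTHETICAL_TABLE, ONE_DAY_MS)}"
print(query)
```

This only helps if new rows keep arriving; with a one-off bulk load, every row falls in the same window.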


There are items in there that are deleted/[dead] on HN (and not very recent). How come?


You can see dead things on HN if you turn on "showdead" in your account settings.


What is the "comment_ranking" data you mention in the notebook?


Hacker News chose to hide comment scores some time ago, but I still wanted a way to rank comments. The good news is that the API gives you a "kids" column that lists comments in the order they should be displayed - that's how I find the top comment for each post (as shown in the linked notebook).
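As a rough sketch of that idea (not the notebook's actual code): if an item is represented as a dict with a "kids" list in display order, a comment's rank is just its position in that list. The function names and the sample item below are made up for illustration.

```python
from typing import Optional


def rank_comments(item: dict) -> dict:
    """Map each direct child comment id to its 1-based display rank."""
    return {kid: rank for rank, kid in enumerate(item.get("kids") or [], start=1)}


def top_comment_id(item: dict) -> Optional[int]:
    """Return the id of the first-displayed child comment, or None if there are none."""
    kids = item.get("kids") or []
    return kids[0] if kids else None


# Made-up item; real items come from the Hacker News API.
sample_story = {"id": 1, "type": "story", "kids": [30, 10, 20]}

print(rank_comments(sample_story))
print(top_comment_id(sample_story))
```

Note that this ranks only the direct children of one item; replies further down the tree have their own "kids" lists.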


Is that column independent of "gravity"?


I guess it incorporates gravity - it's whatever order Hacker News tells its clients to display the comments in.



