minimaxir/hacker-news-undocumented

BigQuery dataset no longer updated

jstrieb opened this issue · 1 comment

Hey @minimaxir! In this repo, you mention:

If you want to gather [a] large amount of Hacker News data for data analysis/machine learning, you should use the Hacker News dataset on BigQuery, which is updated daily and is much more pragmatic to use than manually scraping data from the Hacker News API.
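For context, the manual scraping that the README advises against looks roughly like the sketch below. It uses the official Hacker News API's `maxitem` and `item/<id>` endpoints and needs one HTTP request per item, which is why BigQuery was the pragmatic choice for bulk analysis. The function names and the "crawl everything newer than the last ID seen" strategy are illustrative assumptions, not code from either repo.

```python
import json
import urllib.request

# Base URL of the official Hacker News API (Firebase-hosted).
HN_API = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Return the API URL for a single item (story, comment, poll, ...)."""
    return f"{HN_API}/item/{item_id}.json"


def new_item_ids(last_seen_id: int, max_item_id: int) -> range:
    """IDs created since the last crawl; HN item IDs are assigned sequentially."""
    return range(last_seen_id + 1, max_item_id + 1)


def fetch_json(url: str):
    """Fetch and decode one JSON document (one HTTP request per call)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def crawl_since(last_seen_id: int):
    """Hypothetical incremental crawler: yield every item newer than last_seen_id."""
    max_id = fetch_json(f"{HN_API}/maxitem.json")
    for item_id in new_item_ids(last_seen_id, max_id):
        item = fetch_json(item_url(item_id))
        if item is not None:  # missing items may come back as JSON null
            yield item
```

A full historical backfill this way means tens of millions of sequential requests, whereas an incremental crawl from a saved `last_seen_id` only touches items created since the previous run.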

Also, in minimaxir/get-all-hacker-news-submissions-comments, you recommend:

UPDATE August 7th, 2017: All Hacker News submissions are now available on BigQuery, and the dataset is updated daily. If you are scraping Hacker News data at scale, it may be more efficient to use BigQuery instead.

Just wanted to let you know that the dataset on BigQuery seems not to have been updated since November 2022. There is an issue for this in Google's issue tracker, but there doesn't seem to have been any action on it (and there is no ETA) in the several months since it was forwarded to the BigQuery team. Not sure whether you want to update your repos to reflect this, but I figured I'd let you know.
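One way to confirm the staleness is to ask BigQuery for the newest item in the public table and compare it with the current time. The sketch below assumes the `bigquery-public-data.hacker_news.full` table and its `timestamp` column; the two-day threshold in `is_stale` is an arbitrary, hypothetical tolerance for a dataset advertised as updated daily.

```python
from datetime import datetime, timedelta, timezone

# Standard SQL: newest item creation time in the public HN table (assumed schema).
LATEST_ITEM_SQL = """
SELECT MAX(`timestamp`) AS latest
FROM `bigquery-public-data.hacker_news.full`
"""


def is_stale(latest: datetime, now: datetime,
             max_age: timedelta = timedelta(days=2)) -> bool:
    """True if the newest item is older than max_age (hypothetical tolerance)."""
    return now - latest > max_age


# Usage with the google-cloud-bigquery client (requires credentials; not run here):
#   from google.cloud import bigquery
#   row = next(iter(bigquery.Client().query(LATEST_ITEM_SQL).result()))
#   print(is_stale(row.latest, datetime.now(timezone.utc)))
```

Both datetimes must be timezone-aware (BigQuery returns `TIMESTAMP` values in UTC), otherwise the subtraction raises a `TypeError`.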

I noticed because I made a Firefox extension that shows HN posts for the current page, using Bloom filters built from the BigQuery HN data. Now that the BigQuery data is no longer being updated, the Bloom filters in my repo are out of date, and I'm looking for a sustainable alternative...
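A Bloom filter lets an extension answer "has this URL been posted to HN?" locally, without shipping the full URL list: membership checks never give false negatives, but may give false positives at a tunable rate. Below is a minimal sketch of the data structure, assuming URLs as keys, the textbook sizing formulas, and SHA-256 double hashing. It is not the extension's actual implementation, whose hash scheme and parameters may differ.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.m = max(8, math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit indices from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

Because the filter only knows the keys that were added when it was built, every new HN submission requires adding to or rebuilding the filter from a fresh data source. That is why a stale upstream dataset leaves the filters out of date.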

Added.