A Ruby on Rails app that stores Hacker News items that have appeared on the front page, and exposes a few JSON API endpoints that let users search for terms, domains, and users to see how popular they have been on the HN front page over time.
Click here for a live dashboard that uses this API
HN only provides the exact list of front page items for dates since 11/11/2014, so anything before then is an estimate. For earlier dates, I used a heuristic of sorting by score and taking the top 115 items on weekdays, 80 on weekends, subject to a minimum of 3 points. This definitely isn’t perfect, for example:
- it excludes job posts before 11/11/2014 since they always have 1 point
- items with high scores don’t always get to the front page
- it’s possible that HN has changed its algorithm over time to promote faster or slower front page turnover
But it should be a decent approximation, and the code could also be modified to use other heuristics. Fetching all job posts from before 11/11/2014 via the HN API would probably also be an improvement.
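As a rough illustration, the pre-11/11/2014 heuristic described above could be sketched like this. The `FrontPageCandidate` struct, the constant names, and the `estimated_front_page` method are hypothetical and not taken from the app's code; only the thresholds (115, 80, and 3) come from the description.

```ruby
require "date"

# Thresholds from the heuristic described above.
WEEKDAY_LIMIT = 115
WEEKEND_LIMIT = 80
MIN_SCORE = 3

# Hypothetical stand-in for a stored HN item.
FrontPageCandidate = Struct.new(:id, :score)

# Estimate which of a day's items made the front page:
# drop items below the minimum score, sort by score descending,
# and keep the top N, where N depends on weekday vs. weekend.
def estimated_front_page(items, date)
  limit = (date.saturday? || date.sunday?) ? WEEKEND_LIMIT : WEEKDAY_LIMIT

  items
    .select { |item| item.score >= MIN_SCORE }
    .sort_by { |item| -item.score }
    .first(limit)
end
```

Swapping in a different heuristic would mostly mean changing the limits or the sort key in this one method.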
There are 3 files of interest:
- app/lib/hn_client.rb - code to collect front page data via the HN website and API
- app/models/hn_item.rb - code that uses the HnClient to store the appropriate records in a PostgreSQL database
- app/lib/hn_trends_calculator.rb - code to calculate trends over time and top items for given search terms. The trends endpoint returns 4 metrics for each term/date:
  - Fraction of all front page items
  - Number of all front page items
  - Fraction of total front page score, i.e. the total score of items matching the search term divided by the total score of all front page items
  - Front page score
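To make the four metrics concrete, here is a minimal sketch of how they might be computed for one term on one date. It assumes the count and fraction metrics refer to the front page items that match the search term, which is an interpretation of the list above. The `ScoredItem` struct, the `trend_metrics` method, and the hash keys are hypothetical names, not the app's actual API.

```ruby
# Hypothetical stand-in for a front page item on a given date.
ScoredItem = Struct.new(:title, :score)

# front_page_items: all front page items for the date
# matching_items:   the subset that matches the search term
def trend_metrics(front_page_items, matching_items)
  total_count = front_page_items.size
  total_score = front_page_items.sum(&:score)
  match_count = matching_items.size
  match_score = matching_items.sum(&:score)

  {
    # Guard against division by zero on dates with no data.
    fraction_of_items: total_count.zero? ? 0.0 : match_count.to_f / total_count,
    number_of_items: match_count,
    fraction_of_score: total_score.zero? ? 0.0 : match_score.to_f / total_score,
    score: match_score
  }
end
```

For example, if the front page has three items scoring 10, 30, and 60, and the first two match the term, the sketch gives 2 matching items, a matching score of 40, and a score fraction of 40 / 100.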
The trends calculator supports searching titles, domains (with or without subdomains), and usernames. When searching by title, there are 3 different search styles:
1. Web search uses PostgreSQL full text search, in particular the websearch_to_tsquery() function and GIN indexes. By default the tsv column uses the simple text search configuration
2. Case-insensitive exact title match uses the ~* PostgreSQL regular expression operator, combined with a trigram index
3. Case-sensitive exact title match is the same as #2, but uses the ~ regex operator instead of ~*
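The three title search styles roughly correspond to three different SQL conditions. The sketch below is illustrative only: the `title_search_condition` method and the style symbols are hypothetical, the `title` column name is an assumption (only the tsv column is named above), and the query is passed through unchanged because it is not stated here whether the app escapes regex metacharacters.

```ruby
# Hypothetical helper: returns [sql_fragment, bind_value] for a title search.
# Column names are assumptions: `tsv` is mentioned above, `title` is a guess.
def title_search_condition(style, query)
  case style
  when :websearch
    # Full text search against the tsv column, 'simple' configuration.
    ["tsv @@ websearch_to_tsquery('simple', ?)", query]
  when :case_insensitive
    # ~* is PostgreSQL's case-insensitive regex match operator.
    ["title ~* ?", query]
  when :case_sensitive
    # ~ is the case-sensitive variant of the same operator.
    ["title ~ ?", query]
  else
    raise ArgumentError, "unknown search style: #{style.inspect}"
  end
end
```

In a Rails app, a pair like this could be passed to ActiveRecord as `where(sql, value)`, so the search term is bound as a parameter rather than interpolated into the SQL string.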
Requires PostgreSQL 11+, since websearch_to_tsquery() was added in version 11