This takes an existing source RSS feed for Hacker News and runs readability on each item, creating a new RSS feed with the contents of the item directly embedded in the RSS feed.
The (only?) use of this RSS feed is with Feedly, since I use it :-)
I've seen that some items show up and then disappear from the source feed. Also, I put the number of Hacker News points in the title. Sometimes, the source feed mentions a post later with higher points. That means another title, and Feedly treats that as a new post entry. That leeds to duplicates in the feed as viewed in Feedly.
To work around the first issue: We first read the source feed. For each item
we check if it is in the database already, and if it isn't call readability for
this article, and put the final result in the database. Afterwards, we SELECT
all articles from the last 24 hours and create a feed with them. This has the
advantage that if a feed disappears in the source, it won't in this output.
Duplicates are avoided.
To avoid duplicates when "points" are updated, I've tried using a good ID for each article.
The thing is that this uses quite a lot of RAM. On lip.morch.com with 4GB of RAM, this would sometimes OOM. So we run it on base.
You'll need these two files, that aren't saved in the git repo (or create new ones in github):
- 'data/hn100-sshkey'
- 'data/hn100-sshkey.pub'
And
mkdir -p data
git git@github.com:pmorch/HackerNews100.git data/github-output
Build docker image:
sudo docker build -t hacker-news-rss .
...or to run it without docker:
# nix-shell uses shell.nix that installs nodejs and sqlite
nix-shell
npm install
sqlite3 data/articles.db 'SELECT COUNT(1) from articles;'
sqlite3 data/articles.db 'DELETE FROM articles; VACUUM;'
or simply
rm data/articles.db
sudo docker run --rm --name hacker-news-rss -v $PWD/data:/hacker/data hacker-news-rss
...or to run it without docker:
./runRss.sh
sudo cp hackerNewsCronD /etc/cron.d
sudo chown root: /etc/cron.d/hackerNewsCronD