This section covers the motivation behind the project. For implementation details, see the tech stack section below.
Some stories on HN are frustrating and time-consuming for dubious value. I believe there are other people who would also like to see less of certain types of content, hence Suckless HN.
- **Why doesn't this instead exist as an app into which I log in as an HN user and which hides stories on my behalf?** As a user, I wouldn't log into a 3rd party app. As a developer, I don't want to manage user credentials.
- **Can I have custom filters configurable from a UI?** Out of scope. Create an issue or submit a PR if there's a filter you wish to use.
- **Will you change a filter I use without my knowledge?** I am reluctant to change the logic of a filter once it's published. However, if it absolutely needs to happen, you'll be informed by a short update notice at the bottom of the page.
- **Why not ML?** I prefer a set of transparent and readable rules deciding what I don't see. Plus, it's easier.
A filter is given the story data and flags the story if it matches. Feel free to create an issue for any missing but useful filter.
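For illustration, a filter might boil down to a function like this (a minimal sketch with hypothetical names, not the project's actual API; the rule shown is the `amfg` filter described further below):

```rust
/// Hypothetical story data as downloaded from the HN item endpoint.
pub struct Story {
    pub id: u64,
    pub title: String,
    pub url: Option<String>,
}

/// Sketch of the `amfg` filter: flag titles which mention big tech.
pub fn amfg(story: &Story) -> bool {
    const NEEDLES: [&str; 4] = ["google", "facebook", "apple", "microsoft"];
    let title = story.title.to_lowercase();
    NEEDLES.iter().any(|needle| title.contains(needle))
}
```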
Each filter has two landing pages: one with only the stories which were flagged, and one with everything but. This is decided by two modifiers: `+` and `-`. For example, to only see stories from large newspapers visit https://sucklesshn.porkbrain.com/+bignews. To get HN without large newspapers visit https://sucklesshn.porkbrain.com/-bignews.
There are also groups of filters. For example, https://sucklesshn.porkbrain.com/-amfg-bignews filters out large newspapers and all mentions of big tech. This also happens to be the default view on the homepage. The `-` modifier in a group is conjunctive, i.e. only stories which didn't pass any of the filters are shown. The `+` modifier is disjunctive, i.e. stories which passed any of the filters are shown. For example, sucklesshn.porkbrain.com/+askhn+showhn shows "Show HN" or "Ask HN" stories.
List of implemented filters:

- `+bignews`/`-bignews` flags URLs from large news sites: Bloomberg, VICE, The Guardian, WSJ, CNBC, BBC, Forbes, Spectator, LA Times, The Hill and NY Times. More large news sites may be added later. Any general news website which has ~60 submissions (2 pages) in the past year falls into this category, as measured by the HN search query https://hn.algolia.com/?dateRange=pastYear&page=2&prefix=true&sort=byPopularity&type=story&query=${DOMAIN} (see the sketch below this list).
- `+amfg`/`-amfg` flags titles which mention "Google", "Facebook", "Apple" or "Microsoft". No more endless Google-bashing comment binging at 3 AM. Most of the time these submissions are scandalous and the comment sections low-entropy but addictive.
- special: the `+all` front page includes all HN top stories.
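A sketch of how the `bignews` URL check might work (assuming the `url` crate; the domain spellings are my guesses, not the project's actual list):

```rust
use url::Url;

/// Hypothetical list of large news domains (spellings are guesses).
const BIG_NEWS: &[&str] = &[
    "bloomberg.com", "vice.com", "theguardian.com", "wsj.com",
    "cnbc.com", "bbc.com", "forbes.com", "spectator.co.uk",
    "latimes.com", "thehill.com", "nytimes.com",
];

/// Flags a story whose url points at one of the large news domains,
/// including their subdomains (e.g. www.nytimes.com).
pub fn bignews(story_url: &str) -> bool {
    let Ok(url) = Url::parse(story_url) else {
        return false;
    };
    let Some(host) = url.host_str() else {
        return false;
    };
    BIG_NEWS
        .iter()
        .any(|domain| host == *domain || host.ends_with(&format!(".{domain}")))
}
```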
List of filter groups:

Filter names in a group are sorted alphabetically in ascending order (hence `-amfg-bignews`, not `-bignews-amfg`).
The binary is executed periodically (roughly every 30 min). Each generated page is an S3 object, therefore we don't need to provision a web server.
An `sqlite` database stores the ids of top HN posts that have already been downloaded, plus some other data (timestamp of insertion, submission title, url, and which filters it passed).
The endpoint to query top stories on HN is https://hacker-news.firebaseio.com/v0/topstories.json. We download stories which we haven't checked before. The data about a story is available via the item endpoint.
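A minimal sketch of that flow (assuming the `reqwest` crate with its `blocking` and `json` features plus `serde_json`; the real binary may be structured differently):

```rust
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Ids of the current top stories, best first.
    let ids: Vec<u64> = reqwest::blocking::get(
        "https://hacker-news.firebaseio.com/v0/topstories.json",
    )?
    .json()?;

    // Data about a single story via the item endpoint.
    let item_url = format!(
        "https://hacker-news.firebaseio.com/v0/item/{}.json",
        ids[0]
    );
    let story: Value = reqwest::blocking::get(&item_url)?.json()?;
    println!("{}: {}", ids[0], story["title"]);

    Ok(())
}
```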
We check each new story against the Suckless HN filters before inserting it into the database table `stories`. The flags for each filter are persisted in the `story_filters` table.
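The schema might look roughly like this (my guess based on the columns mentioned above, not the project's actual schema; shown with the `rusqlite` crate):

```rust
use rusqlite::Connection;

fn main() -> rusqlite::Result<()> {
    let db = Connection::open("sucklesshn.db")?;

    // Hypothetical schema inferred from the description above.
    db.execute_batch(
        "CREATE TABLE IF NOT EXISTS stories (
            id INTEGER PRIMARY KEY,      -- HN item id
            title TEXT NOT NULL,         -- submission title
            url TEXT,                    -- submission url, if any
            created_at INTEGER NOT NULL  -- timestamp of insertion
        );
        CREATE TABLE IF NOT EXISTS story_filters (
            story_id INTEGER NOT NULL REFERENCES stories (id),
            filter TEXT NOT NULL,        -- e.g. 'amfg' or 'bignews'
            flagged INTEGER NOT NULL     -- 1 if the filter flagged the story
        );",
    )?;

    Ok(())
}
```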
The final step is generating new HTML for the sucklesshn.porkbrain.com front pages and uploading it into an S3 bucket. The S3 bucket is behind a CloudFront distribution to which the sucklesshn.porkbrain.com DNS zone records point. We set up different combinations of filters and upload those combinations as different S3 objects. The objects are all of `Content-type: text/html`, however they don't have an `.html` extension.
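Uploading one rendered page might look like this (a sketch assuming the `aws-config`, `aws-sdk-s3` and `tokio` crates; the bucket name is illustrative and the project may use a different S3 client):

```rust
use aws_sdk_s3::{primitives::ByteStream, Client};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = aws_config::load_from_env().await;
    let s3 = Client::new(&config);

    let html = String::from("<!DOCTYPE html>...");

    // The object key matches the URL path (e.g. "+bignews") and has no
    // .html extension, yet the object is still served as text/html.
    s3.put_object()
        .bucket("sucklesshn.porkbrain.com")
        .key("+bignews")
        .content_type("text/html")
        .body(ByteStream::from(html.into_bytes()))
        .send()
        .await?;

    Ok(())
}
```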
We handle rate limiting by simply skipping the submission. Since we poll for missing stories periodically, they will be fetched eventually.
We also don't need to check all top stories: we can slice the list from the top stories endpoint and only download the first ~50 entries (see the sketch below).
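Both points together, as a sketch (`already_stored` is a hypothetical helper that would query the sqlite database):

```rust
/// Hypothetical helper: is this story id already in the database?
fn already_stored(_id: u64) -> bool {
    // ... would query the `stories` table by id ...
    false
}

/// Keep the first ~50 top story ids and drop the ones we've already
/// checked. A story skipped due to rate limiting isn't stored, so it
/// is simply picked up again on a later run.
fn ids_to_fetch(top_story_ids: Vec<u64>) -> Vec<u64> {
    top_story_ids
        .into_iter()
        .take(50)
        .filter(|&id| !already_stored(id))
        .collect()
}
```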
The Wayback Machine has some kind of rate limiting which fails concurrent requests, so we run Wayback Machine GET requests sequentially.
We leverage the Wayback Machine API to provide users with a link to the latest archived snapshot at the time of the submission.
Please donate to keep the Wayback Machine awesome.
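The lookup could go against the public Wayback availability API, roughly like so (a sketch, again assuming `reqwest` with the `blocking` and `json` features; I'm not certain this is the exact endpoint the project uses):

```rust
use serde_json::Value;

/// Returns the url of the archived snapshot closest to the given
/// timestamp (YYYYMMDDhhmmss), if the Wayback Machine has one.
fn closest_snapshot(url: &str, timestamp: &str) -> Option<String> {
    let endpoint = format!(
        "https://archive.org/wayback/available?url={url}&timestamp={timestamp}"
    );
    let body: Value = reqwest::blocking::get(&endpoint).ok()?.json().ok()?;
    body["archived_snapshots"]["closest"]["url"]
        .as_str()
        .map(str::to_owned)
}
```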
I run the binary on my k8s homelab cluster as a cron job. Originally, it ran as a cron job on my Raspberry Pi 4, which is now a node in the cluster. I still build this project for ARM. See the `k8s` directory for more docs about how this project runs in the cluster.
I use a build script to build and test this project. First, you'll need to install `cross`:

```bash
cargo install --git https://github.com/anupdhml/cross.git --branch master
```

We use a custom image for compilation to support [OpenSSL][cross-opensll].
Next, either use the build script or compile directly for `armv7-unknown-linux-gnueabihf`:

```bash
cross build --target armv7-unknown-linux-gnueabihf --release
```
Locally, I build the Docker image with the binary and push it to Docker Hub, from where my k8s cluster pulls it.
See the `.env.example` file for the environment variables the binary expects.