Sick of the "computer algorithms" that Google News uses, I wanted to build a more open news reader for personal use, with the added benefit of zero tracking and no cookies.
The theory is that, given a pool of article titles, a major event is covered by many outlets under similar titles, so the titles that cluster together mark the headlines.
To compare titles, each one is embedded with distilbert-base-uncased to capture its meaning; k-means then finds the cluster centres, and headlines are ranked by their distance to those centres.
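The clustering and ranking step can be sketched in plain NumPy. This is a minimal illustration, not the project's actual code: it assumes the title embeddings have already been computed (for example from distilbert-base-uncased), and it assumes Euclidean distance, plain Lloyd's k-means, and a ranking that puts titles closest to their own cluster centre first. The helper names `kmeans` and `rank_titles` are hypothetical.

```python
import numpy as np


def _assign(x, centres):
    # Squared Euclidean distance from every point to every centre: shape (n, k)
    d = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means on the rows of x (requires k <= len(x))."""
    rng = np.random.default_rng(seed)
    # Start from k distinct, randomly chosen points
    centres = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _assign(x, centres)
        # Move each centre to the mean of its points; keep it if the cluster is empty
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return centres, _assign(x, centres)


def rank_titles(titles, embeddings, k):
    """Hypothetical ranking: titles nearest their cluster centre come first."""
    x = np.asarray(embeddings, dtype=float)
    centres, labels = kmeans(x, k)
    dist = np.linalg.norm(x - centres[labels], axis=1)
    return [titles[i] for i in np.argsort(dist)]
```

With real 768-dimensional DistilBERT vectors the code is unchanged; only the choice of `k` (and whether clusters are also ranked by size) would need tuning, and the real project may make different choices.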
Visit a version of it (running on a Raspberry Pi 3) here!
An RSS feed is available at https://semanticnews.dedyn.io/feed/rss, and all the other endpoints are supported as well.
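As a sketch of consuming that feed from a script, the standard library is enough. This assumes the endpoint serves standard RSS 2.0 (`<rss><channel><item><title>`), which is what rfeed generates; the function names are hypothetical, and the project itself uses feedparser for its own sources.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss_titles(xml_text):
    """Return the <title> of every <item> in an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(xml_text)
    return [item.findtext("title", default="").strip() for item in root.iter("item")]


def fetch_titles(url="https://semanticnews.dedyn.io/feed/rss", timeout=10):
    """Download the feed and return its item titles (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_rss_titles(resp.read())


if __name__ == "__main__":
    for title in fetch_titles():
        print(title)
```

When running locally, the same function should work against `http://127.0.0.1:8080/feed/rss`, assuming the local server exposes the same path as the hosted one.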
You will need python3 (tested on 3.9.2) and the following libraries:
pip3 install fastapi uvicorn rfeed feedparser numpy transformers torch onnxruntime
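If the server fails to start, a quick way to confirm the dependencies are importable is a small check script like the one below. It is a hypothetical helper, not part of the repo; it relies on each pip package above sharing its name with its import name, which holds for this list.

```python
import importlib.util
import sys

# Import names of the packages from the pip3 install line
REQUIRED = ["fastapi", "uvicorn", "rfeed", "feedparser",
            "numpy", "transformers", "torch", "onnxruntime"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(the README was tested on 3.9.2)")
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All dependencies found.")
```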
Then just download and unzip the repo, and start the local server:
python3 main.py
Once it is running, visit http://127.0.0.1:8080 in your browser!
Note: the first startup will be very slow, because the BERT model has to be downloaded and converted to ONNX (so it can run on a Pi), and the initial pool of articles has to be fetched and vectorised.
Got another RSS source you'd like to see added? Chuck in a pull request for sources.py and I'll see to it.
Remember to give this repo a ⭐ if you found it useful.