Matilda is a Telegram bot, written in Python 3, that scrapes news articles. I wrote it to get a better understanding of Python, and it is purely for educational purposes.
Matilda is still in development. I only have time to work on it on weekends, so progress might be a little slow.
- Straits Times
- ChannelNewsAsia
- TodayOnline (Beta)
Matilda is licensed under the Affero General Public License Version 3.
A sample version of this bot is currently running on Telegram, under @matilda_jk_bot.
- Thanks to LFlare for giving me the idea, and letting me take a look at his source code when I was stuck.
- Python-Telegram-Bot for making a wonderful wrapper, and having an excellent community who are willing to devote time to assist others.
- Sumy for building a wonderful Python-based text summarizer.
- BeautifulSoup4 for an easy-to-use web scraper.
- PhantomJS for scraping JS-based sites.
You can open an issue here to contact me regarding bugs.
- /cmd (full command list)
- /aboutme (about Matilda)
- /supported (supported sites)
- /mode (Switches Matilda between Full and Truncated)
- /new (Latest 5 articles from ST/Today/CNA)
- /rand (5 random articles from ST/Today/CNA)
- /search (Searches for ST/Today/CNA articles)
- /today (Scrapes Today articles)
- /cna (Scrapes CNA articles)
- /st (Scrapes Straits Times articles)
- /cna_search (Searches for CNA Articles)
- /cna_new (Latest five CNA Articles)
- /st_search (Searches for ST Articles)
- /st_new (Latest five ST Articles)
- /st_rand (Randomly generates 5 articles from StraitsTimes)
- /cna_rand (Randomly generates 5 articles from CNA)
- /subscribe (Subscribes to updates; chats are subscribed by default)
- /unsub (Unsubscribes from updates)
- /mega (Sends a message to all chats that the bot has previously been used in. To use, add your user id to tokens.py)
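The admin check that /mega relies on can be sketched roughly as follows. This is only an illustration, not Matilda's actual code: `ADMIN_IDS`, `broadcast`, and the `send_message` callback are hypothetical names, and the real bot may store chats and admins differently.

```python
# Hypothetical list of admin user ids, standing in for the one kept in the config file.
ADMIN_IDS = {123456789}


def broadcast(sender_id, text, chat_ids, send_message, admin_ids=ADMIN_IDS):
    """Send `text` to every chat in `chat_ids` if the sender is an admin.

    `send_message` is any callable taking (chat_id, text).
    Returns the number of chats that were messaged.
    """
    if sender_id not in admin_ids:
        # Non-admins are ignored, so nothing is sent.
        return 0
    sent = 0
    for chat_id in chat_ids:
        send_message(chat_id, text)
        sent += 1
    return sent
```

In a real handler, `send_message` would typically be the bot object's own `send_message` method from python-telegram-bot, and `chat_ids` would come from the database.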
If you have the article URL, you can simply run /st or /cna.
If not, you can use the search feature, either to search for specific keywords that appear in the article title or to get the latest 5 articles.
Only 5 articles are returned because the sample version of this bot is not running on a very powerful server, and I do not wish to overload it.
From there, you can use the inline buttons generated by the bot to read the article from the comfort of your Telegram chat.
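The title search and the 5-article cap described above could look something like this. It is a minimal sketch under my own assumptions: `search_titles` is a hypothetical helper, and articles are assumed to be dicts with a `title` key, sorted newest first.

```python
def search_titles(articles, query, limit=5):
    """Return up to `limit` articles whose title contains every keyword in `query`.

    Matching is case-insensitive. An empty query matches everything,
    so it simply returns the first `limit` articles.
    """
    keywords = query.lower().split()
    matches = []
    for article in articles:
        title = article["title"].lower()
        if all(word in title for word in keywords):
            matches.append(article)
            if len(matches) == limit:
                # Stop early to keep the load on the server small.
                break
    return matches
```

Because an empty query matches every title, the same helper also covers the "latest 5 articles" case when the input list is already ordered by date.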
Install the following Python libraries:
- python-telegram-bot
- Beautiful Soup 4
- Requests
- Python String Utilities
- dateutil
- PyMySQL
- Sumy
- Selenium
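To confirm that everything in the list above is installed, a small check like the one below may help. The PyPI package names and import names in the mapping are my best understanding of each library, not something taken from Matilda's source, so adjust them if your installation differs.

```python
from importlib.util import find_spec

# Assumed mapping of PyPI package name -> importable module name.
REQUIRED_MODULES = {
    "python-telegram-bot": "telegram",
    "beautifulsoup4": "bs4",
    "requests": "requests",
    "python-string-utils": "string_utils",
    "python-dateutil": "dateutil",
    "PyMySQL": "pymysql",
    "sumy": "sumy",
    "selenium": "selenium",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose module cannot be found on this system."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Anything reported as missing can then be installed with pip before starting the bot.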
Download PhantomJS and place it in the same directory. This is required for TodayOnline.
Run the scripts found in the Matilda-tools folder. More information is available there. This will enable you to grab new articles as they come out.
Update token.py with your bot's API token, MySQL information, and the list of admin user ids.
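As a rough idea of what the file needs to hold, the settings could be laid out like this. The variable names and placeholder values below are assumptions for illustration only; keep whatever names the file shipped with the repository actually uses.

```python
# Hypothetical layout of the settings file; names and values are placeholders.

# API token issued for your bot by Telegram's @BotFather.
BOT_TOKEN = "123456:REPLACE-WITH-YOUR-TOKEN"

# MySQL connection details used to store articles and chats.
MYSQL = {
    "host": "localhost",
    "user": "matilda",
    "password": "change-me",
    "database": "matilda",
}

# Telegram user ids that are allowed to run admin-only commands.
ADMIN_IDS = [123456789]
```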
Start your bot with:
python3 matilda.py
If you are running Matilda on Linux, you may also want to use this command to ensure that Matilda keeps running after you exit the terminal:
sudo nohup python3 matilda.py > /home/matilda-live/error.log 2>&1 &