My Papa asked me how do I get local newspaper online. And this script was created. B:) I thought of creating a cron job in my android but apparently it requires the android system to be rooted. Used Qpython Application on Android to run this script.
However I have Added Cron job on my ubuntu
that executes the script once a day. I have not fixed a particular time rather I have made it conditional.
- This script basically dowloads the individual pdf file of the news paper present on the website
http://epaper.jagran.com/homepage.aspx
andhttp://epaper.livehindustan.com/
- It then merges the individual pdfs.
- After the pdf merging it sends an email with an attachment ( e-paper ) to the email id.
- As my main motto was to run it on android I have separated what lib are to be imported and what cmd are needed to install them in QPython ( Android App ).
- whenever my system starts the script first checks the txt file.
- If the logged date is of today than the script doesn't execute
- Else the script is executed and after completon it logs todays date in the txt file
- First it checks if tor browser is started for the day
checkStartTorStatus.txt
. - Same check is done with
checkCronStatus.txt
andcheckCronStatusH.txt
.
- I was blown to see the bytes of the pdf file downloaded from
http://epaper.livehindustan.com/
website. The file size is over 10 Mb. In some cases it exceeds 20Mb. Yes one pdf file. So just do the math for 20 pages of news paper. - I initially used System cmd to compress file but it was not very efficient
CompressFile.py
. - So I choose to use online website
http://pdfcompressor.com/
. Its amazingly awesome. This site compressed file by 90%. - checkout the file
PdfCompressor.py
.
- These tools are no less than Bramhastra Haahha!
- TOR is just necessary while using PYTHON
- Use Firefox webdriver and changed the network preferences. See file
TorFirefox.py
.
- This file tracks any changes in the directory.
- Used this to make sure that unzipping of file is done only after file is completely downloaded.
- check the file
watchdogg.py
- Used
tqdm
to view the dowload progress. checkdownload.py
.
- install Qpython
- open terminal
- execute
pip.main(['install', 'bs4'])
- execute
pip.main(['install', 'PyPDF2'])
- I have attached a sample of execution in Qpython. However I have changed the code for setting up conditional cron in Ubuntu.
- I have added much more libraries untill now( 5th May 2018). The very first commit was focused on Qpython. Now its totally dekstop-mode script( I mean it has librabries than can be installed in Linux systems conveniently)
- change the config file
- add
to
idfrom
id - run
python Dainik_e_paper.py
( this runs both the fileDainik_e_paper.py
andhindustan.py
)
- pyvirtualdisplay
- BeautifulSoup
- urllib2
- selenium
- tqdm
- math
- PyPDF2
- requests
- smtplib, email
- wget
- TorFirefox
- multiprocessing
- watchdog
- you might have to turn on "Allow less secure apps for your emailid"
- SMTPAuthenticationError : Use below link to check enabling access for less secure app on gmail
- https://stackoverflow.com/questions/16512592/login-credentials-not-working-with-gmail-smtp
- I have added useful links that assisted me in some of the files.