A toolkit for daily arXiv paper reading. The script crawls arXiv papers in custom areas every day and displays their key information.
data/
: Directory for crawled data. All data files are named by datetime. The data format looks like:

    "0": {
        "abstract": "",
        "authors": "",
        "pdf_link": "",
        "submitted_data": "",
        "title": ""
    }
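For quick inspection of a crawled file, a minimal Python sketch like the one below works with this format. The filename is a made-up example; actual files are named by the datetime of the crawl.

```python
import json

# Load one day's crawled data file (filename is a placeholder) and
# print the key fields of every paper entry.
with open("data/2024-01-01.json", "r", encoding="utf-8") as f:
    papers = json.load(f)

for idx, paper in papers.items():
    print(f"[{idx}] {paper['title']}")
    print(f"    authors:   {paper['authors']}")
    print(f"    submitted: {paper['submitted_data']}")
    print(f"    pdf:       {paper['pdf_link']}")
```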
src/
: Directory for code and scripts, containing:
static/
: Directory for static files.
templates/
: Directory for the .html interface.
configs.py
: File of customized urls and keywords to be crawled (see the sketch after this list).
daily_arxiv_spyder.py
: Code for the arXiv spider.
keep_recent_data.py
: Code that keeps only the last week of data.
log
: Crontab log file.
main.py
: Code for the FastAPI interface.
run.sh
: Script to run everything.
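As a rough illustration of what `configs.py` might contain, here is a minimal sketch. The variable names `URLS` and `KEYWORDS` are assumptions made for this example; check the actual file in the repo for the exact names and structure.

```python
# configs.py -- illustrative sketch only; the real file may differ.

# arXiv listing pages to crawl, one per area of interest.
URLS = [
    "https://arxiv.org/list/cs.CL/recent",
    "https://arxiv.org/list/cs.CV/recent",
]

# Keywords used to filter crawled titles/abstracts.
KEYWORDS = [
    "dialogue",
    "retrieval",
    "large language model",
]
```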
# 1. environment preparation
>>> git clone https://github.com/Aman-4-Real/arXiv_Daily.git
>>> cd src && pip install -r requirements.txt
# 2. define your own urls and keywords by modifying 'src/configs.py'
# 3. test the interface
>>> python main.py
# 4. set your paths in 'src/run.sh'
# 5. use the crontab command to run it periodically
>>> crontab -e
# add '0 8 * * * cd /YOUR_PATH/arxiv_daily/src/ && sh run.sh >> /YOUR_PATH/arxiv_daily/src/log' at the end to run at 8 a.m. every day and save the log
DONE & ENJOY!
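For a sense of what the daily crawl produces, below is a self-contained sketch that queries the public arXiv export API and fills the same fields as the data files described above. This is not the repo's actual `daily_arxiv_spyder.py` (which crawls the urls from `configs.py`); it only shows the shape of the output.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Illustrative crawl: fetch the 5 most recently submitted cs.CL papers from the
# public arXiv API and store them in the same dict-of-dicts format as data/.
QUERY = (
    "http://export.arxiv.org/api/query"
    "?search_query=cat:cs.CL&sortBy=submittedDate&sortOrder=descending&max_results=5"
)
NS = {"atom": "http://www.w3.org/2005/Atom"}

with urllib.request.urlopen(QUERY) as resp:
    feed = ET.fromstring(resp.read())

papers = {}
for i, entry in enumerate(feed.findall("atom:entry", NS)):
    # The PDF link is the <link> element whose title attribute is "pdf".
    pdf_link = next(
        (link.get("href") for link in entry.findall("atom:link", NS)
         if link.get("title") == "pdf"),
        "",
    )
    papers[str(i)] = {
        "abstract": entry.findtext("atom:summary", default="", namespaces=NS).strip(),
        "authors": ", ".join(
            a.findtext("atom:name", default="", namespaces=NS)
            for a in entry.findall("atom:author", NS)
        ),
        "pdf_link": pdf_link,
        "submitted_data": entry.findtext("atom:published", default="", namespaces=NS),
        "title": " ".join(
            entry.findtext("atom:title", default="", namespaces=NS).split()
        ),
    }

print(json.dumps(papers, indent=2))
```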