NOTE: THIS PROJECT IS STILL IN THE TESTING PHASE
The ArXiv Paper Parser is a Python-based tool designed to automate the process of fetching, filtering, and notifying users about new academic papers published on the arXiv platform, based on predefined search criteria. It focuses on providing updates for specific research fields and keywords, managing a list of known papers to avoid duplicates, and pushing notifications to a user's device.
- Customizable Searches: Users can specify fields and keywords to tailor the search to their interests.
- Duplicate Avoidance: The tool tracks previously fetched papers to prevent duplicate notifications.
- Notification System: Utilizes the Pushover API to send updates directly to the user's device.
- Search Functionality: Utilizes the `arxiv` Python package to query the arXiv API.
- Data Handling: Processes and stores data in text files, handling both new and existing entries.
- Notification Mechanism: Sends alerts through Pushover for new relevant papers.
- Dependencies: Install the required packages from `requirements.txt` and `requirements-dev.txt`, e.g. `pip install -r requirements.txt -r requirements-dev.txt`.
- Environment Variables: Create a `.env` file containing your Pushover application token and user key, one per line: `APP_TOKEN=<your-app-token>` and `USER_TOKEN=<your-user-token>`. The `.env` file must be created by the user following this example.
- Configuration: Adjust the search parameters in `src/main.py` as needed, for example `field="cat:cs.cv OR cat:eess.iv"` and `title_keyword="ti:low field MRI OR all:low field MRI OR ti:low field magnetic resonance imaging"`.
See the arXiv documentation for information on search categories and the general category taxonomy.
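The environment and configuration steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's code: `parse_env`, `load_env_file`, `build_query`, and `search_papers` are hypothetical names, the `.env` file is assumed to hold one `KEY=VALUE` pair per line (the project may instead rely on a library such as `python-dotenv`), and joining `field` and `title_keyword` with `AND` is an assumption about how `src/main.py` combines them.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Export .env values to os.environ without overwriting variables already set."""
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def build_query(field: str, title_keyword: str) -> str:
    """Join the category and keyword filters (assumed AND); parentheses keep each OR group intact."""
    return f"({field}) AND ({title_keyword})"


def search_papers(query: str, max_results: int = 20) -> list[tuple[str, str]]:
    """Return (title, link) pairs from arXiv, newest submissions first.

    Needs the third-party `arxiv` package; it is imported here so the
    helpers above can be used without it installed.
    """
    import arxiv

    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,  # SortCriterion.Relevance also exists
    )
    return [(r.title, r.entry_id) for r in arxiv.Client().results(search)]


# Example values taken from the configuration above.
field = "cat:cs.cv OR cat:eess.iv"
title_keyword = "ti:low field MRI OR all:low field MRI OR ti:low field magnetic resonance imaging"
query = build_query(field, title_keyword)
```

A run would then call `load_env_file()` followed by `search_papers(query)`. The tracking file names `Relevance.txt` and `SubmittedDate.txt` match the package's `Relevance` and `SubmittedDate` sort orders, which suggests one search per order, though that is an inference rather than documented behavior.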
Run the main script to start the process: `python src/main.py`
Ideally, schedule the script with `crontab` for daily execution (cron is also available on macOS).
The script performs the following steps:
- Reads the current status of papers from `Relevance.txt` and `SubmittedDate.txt`.
- Searches for new papers based on the specified criteria.
- Filters out known entries to avoid duplicates.
- Appends new titles to the respective files.
- Sends notifications about new papers (title and arXiv link).
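The read, filter, and append steps can be sketched with plain text files, assuming each tracking file stores one title per line. `read_known_titles`, `filter_new`, and `append_titles` are hypothetical helpers, not the project's functions, and collapsing whitespace in titles (arXiv titles may contain line breaks) is an illustrative choice.

```python
from pathlib import Path


def read_known_titles(path: Path) -> set[str]:
    """Return the titles already recorded in a tracking file (one per line)."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_new(titles: list[str], known: set[str]) -> list[str]:
    """Keep titles not seen before, preserving order and dropping repeats.

    Internal whitespace (including line breaks) is collapsed so a title
    always fits on a single line of the tracking file.
    """
    seen = set(known)
    new: list[str] = []
    for title in titles:
        normalized = " ".join(title.split())
        if normalized not in seen:
            new.append(normalized)
            seen.add(normalized)
    return new


def append_titles(path: Path, titles: list[str]) -> None:
    """Append new titles to the tracking file, one per line."""
    if titles:
        with path.open("a", encoding="utf-8") as handle:
            handle.writelines(f"{title}\n" for title in titles)
```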
- Search Criteria: Modify the `field` and `title_keyword` parameters in `src/main.py` to change the search focus.
- Notification Details: Adjust the message format and content in the `push_to_device` function in `src/utils.py`.
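For reference, a Pushover message is sent as an HTTPS POST to `https://api.pushover.net/1/messages.json` with the required `token`, `user`, and `message` fields and optional ones such as `title`, `url`, and `url_title`. The sketch below builds and sends such a request with the standard library; `build_message` and `send_notification` are hypothetical and do not reproduce the project's `push_to_device`.

```python
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_message(app_token: str, user_token: str, title: str, link: str) -> dict[str, str]:
    """Assemble Pushover form fields for one new paper (hypothetical message format)."""
    return {
        "token": app_token,   # application API token (APP_TOKEN)
        "user": user_token,   # user key (USER_TOKEN)
        "title": "New arXiv paper",
        "message": title,     # paper title as the notification body
        "url": link,          # supplementary link shown with the message
        "url_title": "Open on arXiv",
    }


def send_notification(fields: dict[str, str], timeout: float = 10.0) -> int:
    """POST the form-encoded fields to Pushover and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    data = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(PUSHOVER_URL, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In practice `app_token` and `user_token` would be read from the `APP_TOKEN` and `USER_TOKEN` environment variables, e.g. via `os.environ`.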
- `src/`: Contains the main script and utility functions.
- `notebooks/`: Jupyter notebooks for testing and development.
- `.env`: Environment file for storing API keys.
- `Relevance.txt` and `SubmittedDate.txt`: Text files for tracking known papers.
Contributions are welcome! Please fork the repository and submit pull requests with your proposed changes.
This project is open source and available under the GNU open source licence.