This project automatically generate Questions and Answers on a given arXiv ids. For now, the CLI tool only supports to grasp arXiv ids from Hugging Face 🤗 Daily Papers. Also, it is possible to directly generate on a set of arXiv ids.
You can see the generated QA dataset from chansung/auto-paper-qa2 repository. Also, you can see how these dataset could be used with PaperQA space application.
If you want to do prompt engineering, modify the prompts.toml file. There are two prompts to play with.
To generate QAs of arXiv papers on a specific date, run:
export GEMINI_API_KEY=<YOUR-GEMINI-API>
export HF_ACCESS_TOKEN=<YOUR-HF-ACCESS-TOKEN>
python app.py --target-date $current_date \
--gemini-api $GEMINI_API_KEY \
--hf-token $HF_ACCESS_TOKEN \
--hf-repo-id $hf_repo_id \
--hf-daily-papers
If you want to generate QAs of arXiv papers on the range of date, run:
export GEMINI_API_KEY=<YOUR-GEMINI-API>
export HF_ACCESS_TOKEN=<YOUR-HF-ACCESS-TOKEN>
export HF_DATASET_REPO_ID=<YOUR-HF-DATASET-REPO-ID>
./date_iterator.sh "2024-03-01" "2024-03-03" $HF_DATASET_REPO_ID
To generate QAs of arXiv papers on a list of arXiv IDs, run:
export GEMINI_API_KEY=<YOUR-GEMINI-API>
export HF_ACCESS_TOKEN=<YOUR-HF-ACCESS-TOKEN>
python app.py \
--gemini-api $GEMINI_API_KEY \
--hf-token $HF_ACCESS_TOKEN \
--hf-repo-id $hf_repo_id \
--arxiv-ids <arxiv-id> <arxiv-id> ...
This is a project built during the Gemini sprint held by Google's ML Developer Programs team. I am thankful to be granted good amount of GCP credits to finish up this project.