- Meet the data science cookiecutter requirements, in brief:
  - Install: `git-crypt` and `conda`
  - Have a Nesta AWS account configured with `awscli`
- Run `make install` to configure the development environment:
  - Set up the conda environment
  - Configure pre-commit
  - Configure metaflow to use AWS
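
Before running `make install`, it can help to confirm that the required command-line tools are on your `PATH`. The following is a minimal sketch, not part of the repository: `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and `aws` is assumed to be the executable installed by `awscli`. It only checks that the tools are installed, not that the AWS account is configured.

```python
import shutil

# Executables named in the requirements above.
# Assumption: `aws` is the command that awscli installs.
REQUIRED_TOOLS = ("git-crypt", "conda", "aws")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when everything is installed, or the names of the tools that still need installing.
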
Run `python ai_papers_with_code/pipeline/data/fetch_pwc_data.py` to fetch the Papers with Code datasets. The data is saved in `inputs/data`.
Run `python ai_papers_with_code/pipeline/data/fetch_arxiv.py` to fetch the arXiv tables from S3. The data is saved in `inputs/data`. NB: this requires AWS credentials.
TODO: Make this available to anyone
Run `python ai_papers_with_code/pipeline/data/scrape_publications.py` to scrape arXiv publications from DeepMind's and OpenAI's websites.
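
The data collection scripts can also be run one after another from a single Python entry point. This is a minimal sketch, assuming it is called from the repository root inside the project's conda environment. `PIPELINE_SCRIPTS`, `build_commands` and `run_pipeline` are hypothetical names that do not exist in the repository, and the order simply follows this README rather than a documented dependency between the scripts.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as listed in this README, relative to the repository root.
PIPELINE_SCRIPTS = (
    "ai_papers_with_code/pipeline/data/fetch_pwc_data.py",
    "ai_papers_with_code/pipeline/data/fetch_arxiv.py",  # requires AWS credentials
    "ai_papers_with_code/pipeline/data/scrape_publications.py",
)


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline script."""
    return [[python, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(repo_root=".", dry_run=False):
    """Run each script in order, stopping at the first failure."""
    commands = build_commands()
    for command in commands:
        print("Running:", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, cwd=Path(repo_root), check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` prints the commands without executing them, which is a cheap way to check the paths before fetching any data.
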
Use the getters in `ai_papers_with_code/getters/getters.py` to get the Papers with Code tables.
Technical and working style guidelines
Project based on Nesta's data science project template (Read the docs here).