Clone the repository:
git clone https://github.com/MDverse/mdws.git
Move to the new directory:
cd mdws
Install miniconda.
Install mamba:
conda install mamba -n base -c conda-forge
Create the mdws conda environment:
mamba env create -f binder/environment.yml
Load the mdws conda environment:
conda activate mdws
Note: you can also update the conda environment with:
mamba env update -f binder/environment.yml
To deactivate an active environment, use:
conda deactivate
Create a token here: https://zenodo.org/account/settings/applications/tokens/new/ and store it in the file .env:
ZENODO_TOKEN=YOUR-ZENODO-TOKEN
This file is ignored by git.
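For illustration, here is a minimal sketch of how a script could read the token back from the .env file using only the standard library. The helper name read_env_file is hypothetical, and the scraping scripts in this repository may load the token differently (for example with a dedicated package).

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    # Demo on a throwaway file so the real .env is never touched.
    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / ".env"
        env_path.write_text("ZENODO_TOKEN=YOUR-ZENODO-TOKEN\n", encoding="utf-8")
        token = read_env_file(env_path)["ZENODO_TOKEN"]
        print(f"Token loaded ({len(token)} characters)")
```

Printing only the token length avoids leaking the secret into logs.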
Scrape Zenodo for MD-related datasets and files:
python scripts/scrap_zenodo.py -q params/query.yml -o data
Scrape Zenodo with a small query, for development or demo purposes:
python scripts/scrap_zenodo.py -q params/query_dev.yml -o test
The scraping takes some time. A mechanism has been set up to avoid overloading the Zenodo API. Be patient.
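This README does not describe the mechanism itself. As an assumption, one common way to avoid overloading an API is to enforce a minimum delay between consecutive requests. The Throttle class below is a hypothetical sketch of that idea and is not taken from scrap_zenodo.py.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait() (hypothetical)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between two requests
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect min_interval; return the pause in seconds."""
        pause = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            pause = max(0.0, self.min_interval - elapsed)
            if pause > 0:
                self._sleep(pause)
        self._last_call = self._clock()
        return pause


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.01)
    for page in range(3):
        throttle.wait()  # call this right before each API request
        print(f"Requesting page {page}")
```

The clock and sleep parameters are injectable only so that the behaviour can be checked without actually waiting.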
Eventually, the scraper will produce three files: zenodo_datasets.tsv, zenodo_datasets_text.tsv and zenodo_files.tsv ✨
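To take a quick look at the output, the files can be opened with Python's csv module. This is a minimal sketch that assumes each file is tab-separated with a header line and was written to the data directory. The helper name load_tsv is hypothetical, and no column names are assumed.

```python
import csv
from pathlib import Path


def load_tsv(path):
    """Return (column_names, rows) for a tab-separated file with a header line."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        # fieldnames is None for an empty file, hence the fallback to [].
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    names = ("zenodo_datasets.tsv", "zenodo_datasets_text.tsv", "zenodo_files.tsv")
    for name in names:
        path = Path("data") / name
        if path.exists():
            columns, rows = load_tsv(path)
            print(f"{name}: {len(rows)} rows, columns: {columns}")
        else:
            print(f"{name}: not found (run the scraper first)")
```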
Scrape FigShare for MD-related datasets and files:
python scripts/scrap_figshare.py -q params/query.yml -o data
Scrape FigShare with a small query, for development or demo purposes:
python scripts/scrap_figshare.py -q params/query_dev.yml -o test
The scraping takes some time (20 to 120 min for the complete query). Be patient.
Eventually, the scraper will produce three files: figshare_datasets.tsv, figshare_datasets_text.tsv and figshare_files.tsv ✨
Run all Jupyter notebooks in batch mode:
jupyter nbconvert --to html --execute --allow-errors --output-dir results notebooks/analyze_zenodo.ipynb
jupyter nbconvert --to html --execute --allow-errors --output-dir results notebooks/zenodo_stats.ipynb
jupyter nbconvert --to html --execute --allow-errors --output-dir results notebooks/search_MD_in_pubmed.ipynb
cp notebooks/*.{svg,png} results/
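The three nbconvert calls differ only in the notebook name, so they can also be generated in a loop. The sketch below builds the same command lines as above and prints them. The function name nbconvert_command is hypothetical, and the actual execution is left commented out.

```python
import shlex

NOTEBOOKS = (
    "notebooks/analyze_zenodo.ipynb",
    "notebooks/zenodo_stats.ipynb",
    "notebooks/search_MD_in_pubmed.ipynb",
)


def nbconvert_command(notebook, output_dir="results"):
    """Build the nbconvert command line used above for one notebook."""
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--execute",
        "--allow-errors",
        "--output-dir", output_dir,
        notebook,
    ]


if __name__ == "__main__":
    for notebook in NOTEBOOKS:
        command = nbconvert_command(notebook)
        print(shlex.join(command))
        # To actually run it: subprocess.run(command, check=True)
```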
To download Gromacs mdp and gro files from Zenodo, run:
python scripts/download_files.py -i data/zenodo_files.tsv -o data/downloads/ -t mdp -t gro
This step takes a couple of hours to complete. Depending on the stability of your internet connection and the availability of the data repository servers, the download might fail with an error message similar to:
requests.exceptions.HTTPError: 429 Client Error: TOO MANY REQUESTS
or
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='zenodo.org', port=443)
Rerun the previous command to resume the download. Files already retrieved will not be downloaded again.
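Since rerunning the command resumes the download, the reruns can be automated. The sketch below is one possible approach and not part of this repository: a hypothetical run_with_retries helper that calls an action again after a failure and doubles the waiting time each time. The default delay values are assumptions.

```python
import time


def run_with_retries(action, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as error:
            # Give up and re-raise once the last allowed attempt has failed.
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.0f} s")
            sleep(delay)
```

For example, action could be a function that runs the download command with subprocess.run(..., check=True), which raises an exception when the script exits with an error.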
Expect between 10 and 15 GB of data.
Then parse the downloaded mdp and gro files:
python scripts/parse_mdp_files.py -i data/downloads -o data
python scripts/parse_gro_files.py -i data/downloads -r params/residue_names.yml -o data