In order to download datasets from pyEGA, several requirements must be met:
- Access granted to the dataset.
- A valid EGA account. (username and password)
The EGA account credentials must be inserted in the credentials.json
file:
```json
{
"username": "your_username",
"password": "your_password"
}
```
-
A
pyega3
client installed.# with pip pip install pyega3 # with conda conda install -c bioconda pyega3
-
The dataset accession number. This can be found in the EGA dataset page.
Once the requirements are met, the datasets can be downloaded using the download.sh
script. Ideally, this would be run as a SLURM job, but due to the amount of data to be downloaded, this must be run in the /dipc/
partition.
In order to run the script, fill in the requested information (credentials' file path, dataset accession number) and run the following command:
nohup bash download.sh &
The script will download the dataset to the /dipc/
partition, in the current working directory.
To check the download progress, run the following command:
tail -f nohup.out