The aim of this application is to allow the reproducibility of OpenTarget Platform data release pipeline. The input files are copied in a specific google storage bucket.
Currently, the application executes 6 steps and finally it generates a Yaml config file that can be used to run the the OT pipeline (https://github.com/opentargets/data_pipeline)
List of available steps:
- annotations
- annotations from buckets
- chemical probes
- known target safety
- ensembl
- ChEMBL
- evidences
The step 'evidences' uploads the last evidences from different providers and it generates a subset of these evidences using the file minimal_ensembl.txt
Below more details about how to execute the script.
- Conda
- git
Download Conda for Mac here:
https://www.anaconda.com/download/#macos.
[download Version 2.7 Command-Line Installer]
Conda: installation commands
bash path_where_downloaded_the_file/Anaconda2-5.3.0-MacOSX-x86_64.sh
source ~/.bashrc
conda update
conda update --prefix ~/anaconda2 anaconda
git clone https://github.com/opentargets/platform-input-support
cd platform-input-support
conda env create -f environment.yaml
conda activate platform-input-support-py2.7
conda skeleton pypi opentargets-urlzsource
conda build opentargets-urlzsource
conda build purge
conda install -y --use-local opentargets-urlzsource
python platform-input-support.py -h
conda activate platform-input-support-py2.7
cd your_path_application
python platform-input-support -h
usage: platform-input-support.py [-h] [-c CONFIG]
[-gkey GOOGLE_CREDENTIAL_KEY]
[-gb GOOGLE_BUCKET] [-o OUTPUT_DIR] [-t]
[-s SUFFIX] [-steps STEPS [STEPS ...]]
[-exclude EXCLUDE [EXCLUDE ...]] [--skip]
[-l] [--log-level LOG_LEVEL]
[--log-config LOG_CONFIG]
...
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to config file (YAML) [env var: CONFIG] (default:
None)
-gkey GOOGLE_CREDENTIAL_KEY, --google_credential_key GOOGLE_CREDENTIAL_KEY
The path were the JSON credential file is stored. [env
var: GOOGLE_APPLICATION_CREDENTIALS] (default: None)
-gb GOOGLE_BUCKET, --google_bucket GOOGLE_BUCKET
Copy the files from the output directory to a specific
google bucket (default: None)
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
By default, the files are generated in the root
directory [env var: OT_OUTPUT_DIR] (default: None)
-t, --thread Run the script with thread [env var: OT_THREAD]
(default: False)
-s SUFFIX, --suffix SUFFIX
The default suffix is yyyy-mm-dd [env var:
OT_SUFFIX_INPUT] (default: None)
-steps STEPS [STEPS ...]
Run a specific list of sections of the config file. Eg
annotations annotations_from_buckets (default: None)
-exclude EXCLUDE [EXCLUDE ...]
Exclude a specific list of sections of the config
file. Eg annotations annotations_from_buckets
(default: None)
--skip Skip the errors and just report them (default: False)
-l, --list_steps List of steps callable (default: False)
--log-level LOG_LEVEL
set the log level [env var: LOG_LEVEL] (default: INFO)
To copy the files in a specific google storage bucket valid credentials must be used. The required parameter -gkey (--google_credential_key) allows the specification of Google storage JSON credential. Eg.
python platform-input-support.py -gkey /path/open-targets-gac.json -gb bucket/object_path
or
python platform-input-support.py
--google_credential_key /path/open-targets-gac.json
--google_bucket ot-snapshots/es5-sufentanil/tmp
python platform-input-support.py
--skip
--google_credential_key /path/open-targets-gac.json
--google_bucket ot-snapshots/es5-sufentanil/tmp
or
python platform-input-support.py
--google_credential_key /path/open-targets-gac.json
--google_bucket ot-snapshots/es5-sufentanil/tmp
-steps annotations evidences
-exclude ChEMBL
or
python platform-input-support.py
-gkey /path/open-targets-gac.json
-gb bucket/object_path -steps ChEMBL
--log-level DEBUG > log.txt
The zip files generated might be corrupted. The follow command checks if the files are correct. sh check_corrupted_files.sh
Create a linux VM server and run the following commands
sudo apt update
sudo apt install git
sudo apt-get install bzip2
wget https://repo.anaconda.com/archive/Anaconda2-2018.12-Linux-x86_64.sh
bash Anaconda2-2018.12-Linux-x86_64.sh
source ~/.bashrc
conda update --prefix ~/anaconda2 anaconda
mkdir gitRepo
cd gitRepo
git clone https://github.com/opentargets/platform-input-support.git
cd platform-input-support
conda env create -f environment.yaml
conda activate platform-input-support-py2.7
python platform-input-support.py -l
Use nohup to avoid that the process hang up.
Eg.
nohup python platform-input-support.py
-gkey /path/open-targets-gac.json
-gb bucket/object_path -steps ChEMBL
--log-level DEBUG > log.txt &