Multi-seq-Data-Analysis-Post-Analysis

Installation

Clone the repo

git clone https://github.com/Sage-Bionetworks-Challenges/Multi-seq-Data-Analysis-Post-Analysis
cd Multi-seq-Data-Analysis-Post-Analysis

Create a conda environment using Python 3.9:

conda create --name synapse python=3.9 -y
conda activate synapse

Install Python dependencies
```
python -m pip install challengeutils==4.2.0
```
Check that synapseclient and challengeutils are installed:
```
synapse --version
challengeutils -v
```
Install R dependencies
```
R -e 'source("install.R")'
```
Note:
The task 2 analysis uses the bedr package, which has two prerequisites, bedops and tabix, that must be installed as well.
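
A quick way to confirm that these prerequisites are reachable is to look for them on the PATH. The snippet below is a minimal sketch: the function name `check_tools` is hypothetical, and the executable names `bedops` and `tabix` are assumed from the note above.

```shell
# Report which of the given command-line tools are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Executable names are assumptions based on the note above.
check_tools bedops tabix || echo "Install the missing tools before running the task 2 analysis."
```
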
Set up Synapse credentials, either via the CLI with the command below or by storing them manually in ~/.synapseConfig:

synapse login --rememberMe

Download all final submission results and the scores of each individual test case to the data/ folder:

Rscript submission/get_submissions.R

final_submissions_{task}.rds: Essential information about the final submissions, e.g. submission ID, team, and rank
final_scores_{task}.rds: All test case scores from the final submissions, including the test case name and the scores of the primary and secondary metrics

Download output files (imputed gene expression / called peaks) of all final submissions to data/model_output/

# replace {task} with 'task1' or 'task2'
Rscript submission/get_predictions_{task}.R
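
To fetch the predictions for both tasks in one go, the command above can be wrapped in a loop. This is only a sketch: the helper `run_predictions` is hypothetical, and its first argument is a prefix command so that `echo` gives a dry run that prints the commands instead of executing them.

```shell
# Run (or, with "echo" as the prefix, just print) the download command for each task.
# $1: prefix command, e.g. "echo" for a dry run or "env" to actually execute
# remaining arguments: task names
run_predictions() {
  runner=$1
  shift
  for task in "$@"; do
    "$runner" Rscript "submission/get_predictions_${task}.R"
  done
}

# Dry run: prints the two Rscript commands without executing them.
run_predictions echo task1 task2
```

Replacing `echo` with `env` in the last line would execute the scripts for real.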

Warning: For Task 1, the output (imputation) of each submission is large (~30 GB). Make sure enough disk space is available before downloading.
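
One way to check the available space before starting the Task 1 download is sketched below. The helper `free_gb` and the submission count are hypothetical; only the ~30 GB per submission figure comes from the warning above.

```shell
# Print the free space, in whole GB, on the filesystem that holds the given path.
free_gb() {
  # -P forces one-line POSIX output, -k reports 1024-byte blocks; field 4 is "Available".
  df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Hypothetical value: adjust to the actual number of final submissions.
n_submissions=10
required_gb=$((n_submissions * 30))
available_gb=$(free_gb .)

if [ "$available_gb" -lt "$required_gb" ]; then
  echo "Only ${available_gb} GB free; about ${required_gb} GB may be needed."
else
  echo "Enough space: ${available_gb} GB free, about ${required_gb} GB needed."
fi
```

Run it from the repository root so that `.` refers to the filesystem where data/model_output/ will be created.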

Report statistics about submissions

Rscript -e 'rmarkdown::render("stats/get_submission_stats.rmd")'