neurips2021_multimodal_topmethods
This repository is a collection of top methods submitted to the OpenProblems / NeurIPS 2021 competition for multimodal single-cell data integration (link).
Using the pipelines and code in this repository, you should be able to reproduce the scores obtained by each of the top submissions.
List of methods
Task | Team | Method | Authors |
---|---|---|---|
joint_embedding | Amateur | JAE | Qiao Liu, Wanwen Zeng, Chencheng Xu |
joint_embedding | Living-Systems-Lab | LSL_AE | Sumeer Khan, Robert Lehman, Xabier Martinez De Morentin, Aidyn Ubingazhibov, Minxing Pang |
joint_embedding | Guanlab-dengkw | method name | John Doe |
match_modality | GLUE | CLUE | Zhi-Jie Cao, Xin-Ming Tu, Chen-Rui Xia |
predict_modality | Cajal | Cajal | Anna Laddach, Roman Laddach, Michael Shapiro |
predict_modality | Novel | Novel | Gleb Ryazantsev, Nikolay Russkikh, Igor I |
predict_modality | Guanlab-dengkw | method name | John Doe |
predict_modality | LS_lab | method name | John Doe |
predict_modality | DSE | method name | Hongzhi Wen, Jiayuan Ding, Wei Jin, Xiaoyan Li, Zhaoheng Li, Haoyu Han, Yuying Xie, Jiliang Tang |
Dependencies
The dependencies of this repository are the same as those of the competition itself (link).
To run any of the methods, you need to download the required binaries and datasets and build the relevant docker containers first.
# download viash and nextflow
bin/init
# sync datasets to local
src/sync_datasets.sh
# build components and docker containers
bin/viash_build --max_threads 4
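Before running the steps above, it can help to confirm that the tools they rely on are installed. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the choice of `docker` and `java` is an assumption (Docker for the containers, Java because Nextflow runs on the JVM). Check the competition's own dependency list for the authoritative requirements.

```shell
# Hypothetical pre-flight check (not part of the repository): print the name
# of every given command that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed prerequisites: Docker for the containers, Java for Nextflow.
missing="$(missing_tools docker java)"
if [ -n "$missing" ]; then
  # Left unquoted on purpose so that the names are joined on one line.
  echo "Missing prerequisites:" $missing >&2
fi
```

If nothing is printed, both commands were found and the setup steps can be attempted.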
You can then rerun components by running the bash script located in the respective folders.
For example:
$ src/predict_modality/methods/cajal/test.sh
...
N E X T F L O W ~ version 21.04.1
Pulling openproblems-bio/neurips2021_multimodal_viash ...
Launching `openproblems-bio/neurips2021_multimodal_viash` [trusting_woese] - revision: a28e0c22c5 [1.4.0]
executor > local (5)
[4e/f4b91d] process > get_id_predictions (4) [100%] 4 of 4, cached: 3 ✔
[94/ffca9e] process > get_id_solutions (2) [100%] 4 of 4, cached: 4 ✔
[b3/92d144] process > bind_tsv_rows:bind_tsv_rows_process (meta_metric) [100%] 1 of 1, cached: 1 ✔
[44/667c85] process > mse:mse_process (openproblems_bmmc_cite_phase2_PM_gex2adt) [100%] 4 of 4, cached: 3 ✔
[4f/8061cc] process > correlation:correlation_process (openproblems_bmmc_cite_phase2_PM_gex2adt) [100%] 4 of 4, cached: 3 ✔
[ea/988a0f] process > check_format:check_format_process (openproblems_bmmc_cite_phase2_PM_gex2adt) [100%] 4 of 4, cached: 3 ✔
[dc/393dca] process > final_scores:final_scores_process (output) [100%] 1 of 1 ✔
[{"method_id":"cajal","ADT2GEX":0.3273,"ATAC2GEX":0.2172,"GEX2ADT":0.4613,"GEX2ATAC":0.178,"Overall":0.2959}]
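The last line of the log is a JSON array holding the scores for the method. As a rough sketch, individual scores can be pulled out of that line with standard tools. `get_score` below is a hypothetical helper, not something the repository provides, and it assumes the single-object, single-line layout shown in the example; a real JSON parser such as `jq` would be more robust if it is available.

```shell
# The final line of the example run above, copied verbatim from the log.
scores='[{"method_id":"cajal","ADT2GEX":0.3273,"ATAC2GEX":0.2172,"GEX2ADT":0.4613,"GEX2ATAC":0.178,"Overall":0.2959}]'

# Hypothetical helper: print the numeric value stored under one key.
# Assumes the key occurs once and its value is a plain decimal number.
get_score() {
  local key="$1" json="$2"
  printf '%s\n' "$json" | sed -n "s/.*\"${key}\":\([0-9.]*\).*/\1/p"
}

get_score Overall "$scores"   # prints 0.2959
get_score GEX2ADT "$scores"   # prints 0.4613
```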
After the bash script has finished running, the output is stored in:
output/pretrain/<task-id>/<method-id>/<dataset-id>.output_pretrain: Pre-trained models (if required).
output/predictions/<task-id>/<method-id>/<dataset-id>: Predictions made by the method.
output/evaluation<method-id>/<dataset-id>: Evaluation metrics.
Where <task-id> is one of the three competition tasks (predict_modality, match_modality or joint_embedding).
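The output layout can be sketched as a small path builder. `output_paths` is a hypothetical helper introduced only for illustration. It covers the pretrain and predictions locations and leaves out the evaluation location. The dataset id in the usage line is taken from the process tags in the example log, on the assumption that those tags are dataset ids.

```shell
# Hypothetical helper (not part of the repository): print where the
# pre-trained models and predictions for one run are expected to be stored.
output_paths() {
  local task_id="$1" method_id="$2" dataset_id="$3"
  # Only the three competition tasks are accepted.
  case "$task_id" in
    predict_modality|match_modality|joint_embedding) ;;
    *) echo "unknown task: $task_id" >&2; return 1 ;;
  esac
  echo "output/pretrain/${task_id}/${method_id}/${dataset_id}.output_pretrain"
  echo "output/predictions/${task_id}/${method_id}/${dataset_id}"
}

# Example: the Cajal run shown earlier (dataset id assumed from the log tags).
output_paths predict_modality cajal openproblems_bmmc_cite_phase2_PM_gex2adt
```

The helper only builds path strings; it does not check that anything exists at those locations.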