neurips2021_multimodal_topmethods

This repository collects the top methods submitted to the OpenProblems / NeurIPS 2021 competition for multimodal single-cell data integration.

The pipelines and code in this repository should let you reproduce the scores obtained by each of the top submissions.

List of methods

Task              Team                Method       Authors
joint_embedding   Amateur             JAE          Qiao Liu, Wanwen Zeng, Chencheng Xu
joint_embedding   Living-Systems-Lab  LSL_AE       Sumeer Khan, Robert Lehman, Xabier Martinez De Morentin, Aidyn Ubingazhibov, Minxing Pang
joint_embedding   Guanlab-dengkw      method name  John Doe
match_modality    GLUE                CLUE         Zhi-Jie Cao, Xin-Ming Tu, Chen-Rui Xia
predict_modality  Cajal               Cajal        Anna Laddach, Roman Laddach, Michael Shapiro
predict_modality  Novel               Novel        Gleb Ryazantsev, Nikolay Russkikh, Igor I
predict_modality  Guanlab-dengkw      method name  John Doe
predict_modality  LS_lab              method name  John Doe
predict_modality  DSE                 method name  Hongzhi Wen, Jiayuan Ding, Wei Jin, Xiaoyan Li, Zhaoheng Li, Haoyu Han, Yuying Xie, Jiliang Tang

Dependencies

The dependencies of this repository are the same as those of the competition itself.

To run any of the methods, you need to download the required binaries and datasets and build the relevant docker containers first.

# download viash and nextflow
bin/init

# sync datasets to local
src/sync_datasets.sh

# build components and docker containers
bin/viash_build --max_threads 4

You can then rerun a component by running the bash script located in its folder.

For example:

$ src/predict_modality/methods/cajal/test.sh

...

N E X T F L O W  ~  version 21.04.1
Pulling openproblems-bio/neurips2021_multimodal_viash ...
Launching `openproblems-bio/neurips2021_multimodal_viash` [trusting_woese] - revision: a28e0c22c5 [1.4.0]
executor >  local (5)
[4e/f4b91d] process > get_id_predictions (4)                                                            [100%] 4 of 4, cached: 3 ✔
[94/ffca9e] process > get_id_solutions (2)                                                              [100%] 4 of 4, cached: 4 ✔
[b3/92d144] process > bind_tsv_rows:bind_tsv_rows_process (meta_metric)                                 [100%] 1 of 1, cached: 1 ✔
[44/667c85] process > mse:mse_process (openproblems_bmmc_cite_phase2_PM_gex2adt)                        [100%] 4 of 4, cached: 3 ✔
[4f/8061cc] process > correlation:correlation_process (openproblems_bmmc_cite_phase2_PM_gex2adt)        [100%] 4 of 4, cached: 3 ✔
[ea/988a0f] process > check_format:check_format_process (openproblems_bmmc_cite_phase2_PM_gex2adt)      [100%] 4 of 4, cached: 3 ✔
[dc/393dca] process > final_scores:final_scores_process (output)                                        [100%] 1 of 1 ✔

[{"method_id":"cajal","ADT2GEX":0.3273,"ATAC2GEX":0.2172,"GEX2ADT":0.4613,"GEX2ATAC":0.178,"Overall":0.2959}]
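The final line of the run is a JSON array with one score per subtask. As a minimal sketch, the helper below pulls a single field out of that line using only grep and cut. The function name get_score is hypothetical and not part of this repository, and the sketch assumes each score is a plain decimal number, as in the sample output above, and that your grep supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper (not part of this repository):
# read script output on stdin and print the value of one score field.
# Assumes the value is a plain decimal number, as in the sample line above.
get_score() {
  grep -o "\"$1\":[0-9.]*" | cut -d: -f2
}

# Sample line copied from the output shown above.
sample='[{"method_id":"cajal","ADT2GEX":0.3273,"ATAC2GEX":0.2172,"GEX2ADT":0.4613,"GEX2ATAC":0.178,"Overall":0.2959}]'

printf '%s\n' "$sample" | get_score Overall   # prints 0.2959
```

For anything beyond a quick look, a real JSON parser is the safer choice, since this pattern match would not handle nested objects or numbers in exponent notation.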

Once the bash script has finished running, its output is stored in:

  • output/pretrain/<task-id>/<method-id>/<dataset-id>.output_pretrain: Pre-trained models (if required).
  • output/predictions/<task-id>/<method-id>/<dataset-id>: Predictions made by the method.
  • output/evaluation<method-id>/<dataset-id>: Evaluation metrics.

Where <task-id> is one of the three competition tasks (predict_modality, match_modality or joint_embedding).
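
To make the layout above concrete, here is a small sketch that expands the pretrain and predictions path templates for one run. The function name output_paths and the dataset id my_dataset are placeholders chosen for illustration, not names defined by this repository. Only those two templates are covered.

```shell
# Hypothetical helper (not part of this repository):
# print the pretrain and predictions locations for one task/method/dataset.
# The body runs in a subshell, so the variables do not leak into the caller.
output_paths() (
  task_id="$1"
  method_id="$2"
  dataset_id="$3"
  printf '%s\n' \
    "output/pretrain/${task_id}/${method_id}/${dataset_id}.output_pretrain" \
    "output/predictions/${task_id}/${method_id}/${dataset_id}"
)

# "my_dataset" is a placeholder dataset id.
output_paths predict_modality cajal my_dataset
```

Whether the printed locations are files or directories depends on the method, so check with ls before reading from them.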