Multi-task benchmarking of single-cell multimodal omics integration methods
Single-cell multimodal omics technologies have enabled the profiling of complex biological systems at a resolution and scale that were previously unattainable. These technologies have driven the rapid development of data integration methods, creating a critical need for their systematic categorisation, evaluation, and benchmarking. Selecting the most suitable integration approach is challenging because the choice depends on the tasks relevant to the study goals and on the combination of modalities and batches present in the data at hand. Understanding how well each method performs across multiple tasks (dimension reduction, batch correction, cell type classification and clustering, imputation, feature selection, and spatial registration), and under which combinations of modalities and batches, helps guide this decision. Through a systematic categorisation and comprehensive benchmarking of current methods, this study aims to provide a much-needed guideline for choosing the most appropriate method for single-cell multimodal omics data analysis.
In this benchmark, we evaluated 34 integration methods across the four data integration categories on 53 datasets, using an Ubuntu system with a single RTX 3090 GPU. Specifically, we included 17 vertical integration methods, 11 diagonal integration tools, 7 mosaic integration tools, and 14 cross integration tools; several tools appear in more than one category. Each tool's environment was set up following its own tutorial. The tools compared are:
Vertical Integration:
- totalVI v1.1.2
- sciPENN v1.0.0
- Concerto Github Version: ab1fc7f
- scMSI Github Version: dffcbb2
- Matilda Github Version: 7d71480
- MOFA+ v1.6.0
- Multigrate v0.0.2
- UINMF v2.0.1
- scMoMaT v0.2.2
- Seurat_WNN v5.0.2
- scMM Github Version: c5c8579
- scMDC Github Version: 43b0c3a
- moETM Github Version: ad89fe2
- VIMCCA Github Version: 0.5.5
- iPOLNG v0.0.2
- MIRA v2.1.0
- UnitedNet Github Version: 3689da8
- scMVP Github Version: fc61e4d
Diagonal Integration:
- scBridge Github Version: ff17561
- Portal v1.0.2
- SCALEX v1.0.2
- VIPCCA v0.2.7
- Seurat v3 (Seurat v5.0.2)
- MultiMAP v0.0.1
- Seurat v5 (Seurat v5.0.2)
- sciCAN Github Version: ad71bba
- Conos v1.4.6
- iNMF v2.0.1
- online iNMF v2.0.1
- scJoint Github Version: cbbfa5d
- GLUE Github Version: 192bb6e
Mosaic Integration:
- MultiVI v1.1.2
- scMoMaT v0.2.2
- StabMap v0.1.8
- Cobolt v1.0.1
- UINMF v2.0.1
- Multigrate v0.0.2
- SMILE Github Version: a2e2ca6
- totalVI v1.1.2
- sciPENN v1.0.0
- scMM Github Version: c5c8579
- moETM Github Version: ad89fe2
- UnitedNet Github Version: 3689da8
Cross Integration:
- totalVI v1.1.2
- scMoMaT v0.2.2
- UnitedNet Github Version: 3689da8
- sciPENN v1.0.0
- Concerto Github Version: ab1fc7f
- scMDC Github Version: 43b0c3a
- StabMap v0.1.8
- UINMF v2.0.1
- scMM Github Version: c5c8579
- MOFA+ v1.6.0
- Multigrate v0.0.2
- PASTE v1.4.0
- PASTE2 Github Version: a419f02
- SPIRAL v1.0
- GPSA v0.8
Note that installation time varies from tool to tool. For more details on each method, refer to its original publication.
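Because results can depend on the exact tool versions listed above, it may help to confirm that a local Python environment matches them before rerunning a pipeline. The following is a minimal sketch using Python's standard `importlib.metadata`; the helper name `check_versions` and the distribution names in the example mapping (for instance, that totalVI and MultiVI ship in the `scvi-tools` package) are assumptions for illustration, not part of this repository. R-based tools such as Seurat would need a separate check in R.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def check_versions(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Compare installed distribution versions with expected ones.

    Returns {name: (expected, installed)} for every distribution whose installed
    version differs from the expected one; installed is None if it is missing.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # distribution is not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical mapping of PyPI distribution names to versions listed in
    # this README; verify each name against the tool's own documentation.
    pinned = {
        "scvi-tools": "1.1.2",  # assumed to provide totalVI and MultiVI
        "sciPENN": "1.0.0",     # assumed PyPI name for sciPENN
    }
    for name, (wanted, found) in check_versions(pinned).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every listed distribution is installed at the expected version.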
All evaluation pipelines are in the 'metrics' folder, and example datasets are in the 'example_data' folder. The spatial registration data must be downloaded separately from link and placed in the 'example_data/spatial/' folder.
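The evaluation scripts themselves live in the metrics folder and are not reproduced here. As an illustration of the kind of score a clustering evaluation computes, the following minimal pure-Python sketch implements the adjusted Rand index (ARI), a standard measure of agreement between reference cell type labels and predicted clusters. Whether, and how, the metrics folder uses ARI is an assumption; the function name `adjusted_rand_index` is hypothetical. scikit-learn's `adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))  # contingency table n_ij
    row_sums = Counter(labels_true)  # cells per reference label
    col_sums = Counter(labels_pred)  # cells per predicted cluster
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in row_sums.values())
    sum_b = sum(comb(c, 2) for c in col_sums.values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0  # chance agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings put all cells in one cluster).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Label names do not matter, only the grouping of cells.
print(adjusted_rand_index(["T", "T", "B", "B"], [1, 1, 0, 0]))
```

An ARI of 1 indicates perfect agreement with the reference labels, values near 0 indicate chance-level agreement, and negative values indicate worse-than-chance agreement.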
This project is licensed under the Apache 2.0 License.