scMultiBench

Multi-task benchmarking of single-cell multimodal omics integration methods

Single-cell multimodal omics technologies make it possible to profile complex biological systems at a resolution and scale that were previously unattainable. These technologies have driven the rapid development of data integration methods, creating a critical need to categorise, evaluate, and benchmark them systematically. Selecting the most suitable integration approach is challenging, because the right choice depends on the tasks relevant to the study goals and on the combination of modalities and batches in the data at hand. This decision can be guided by knowing two things: how well each method performs multiple tasks, and for which combinations of modalities and batches. The tasks are dimension reduction, batch correction, cell type classification and clustering, imputation, feature selection, and spatial registration. This study aims to provide much-needed guidelines for choosing the most appropriate method for single-cell multimodal omics data analysis through a systematic categorisation and comprehensive benchmarking of current methods.

Integration Tools

In this benchmark, we evaluated 34 integration methods across the four data integration categories on 53 datasets, using an Ubuntu system with a single RTX 3090 GPU. Specifically, the benchmark includes 17 vertical integration methods, 11 diagonal integration methods, 7 mosaic integration methods, and 14 cross integration methods. Each tool's environment was installed following its own tutorial. The tools compared are:

Vertical Integration:

Diagonal Integration:

Mosaic Integration:

Cross Integration:

Note that installation time varies from tool to tool. For more detailed information, refer to the original publication.
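The four categories differ in how modalities and batches are laid out in the data. The sketch below encodes one common reading of them, which is an assumption rather than the benchmark's formal definition:

- **Vertical:** one batch with several modalities measured in the same cells.
- **Cross:** several batches that share the same multimodal layout.
- **Diagonal:** batches that each profile a single modality, not all the same one.
- **Mosaic:** batches with differing modality combinations, at least one of them multimodal.

The function name `integration_category` and its input (one set of modality names per batch) are hypothetical and not code from this repository. Consult the publication for the exact criteria.

```python
from typing import Iterable, Set


def integration_category(batches: Iterable[Set[str]]) -> str:
    """Hypothetical helper: map a data layout to one of the four categories.

    Each element of ``batches`` is the set of modalities profiled in one
    batch, e.g. {"RNA", "ADT"}. The rules are an assumed reading of the
    category names, not the benchmark's own implementation.
    """
    layouts = [frozenset(b) for b in batches]
    if not layouts or any(not b for b in layouts):
        raise ValueError("every batch must profile at least one modality")

    if len(layouts) == 1:
        # One batch with several modalities measured in the same cells.
        if len(layouts[0]) >= 2:
            return "vertical"
        raise ValueError("a single unimodal batch needs no integration")

    if len(set(layouts)) == 1:
        # Several batches that all share the same modality layout.
        if len(layouts[0]) >= 2:
            return "cross"
        raise ValueError("unimodal batch correction is outside these four categories")

    if all(len(b) == 1 for b in layouts):
        # Every batch is unimodal and not all batches profile the same modality.
        return "diagonal"

    # Differing modality combinations with at least one multimodal batch.
    return "mosaic"
```

For instance, a single CITE-seq batch ({"RNA", "ADT"}) is classified as vertical. An RNA+ATAC batch combined with an RNA-only batch and an ATAC-only batch is classified as mosaic.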

Evaluation Pipeline

All evaluation pipelines are in the 'metrics' folder, and example datasets are stored in the 'example_data' folder. Spatial registration data must be downloaded separately from the link and placed in the 'example_data/spatial/' folder.
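Clustering results are commonly scored with the adjusted Rand index (ARI), which compares predicted clusters with ground-truth cell types while correcting for chance agreement. Below is a minimal sketch that computes ARI using only the Python standard library. The function name `adjusted_rand_index` is hypothetical, and the code is not taken from the 'metrics' folder, whose scripts may compute clustering scores differently. Following scikit-learn's `adjusted_rand_score` convention, it returns 1.0 in two degenerate cases: both labelings put all cells in one cluster, or both put every cell in its own cluster.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Hypothetical helper: ARI between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n_pairs = comb(len(labels_true), 2)
    if n_pairs == 0:
        # Fewer than two cells: no pairs to compare, treat as perfect agreement.
        return 1.0

    # Pairs of cells grouped together in both labelings (contingency cells n_ij).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together within each labeling (row and column marginals).
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_a * sum_b / n_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate partitions (both one cluster, or both all singletons).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

For example, comparing cell types `[0, 0, 1, 2]` with predicted clusters `[0, 0, 1, 1]` gives 4/7 ≈ 0.571. This is the same value scikit-learn's `adjusted_rand_score` reports for these labels.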

License

This project is covered under the Apache 2.0 License.