Concatenate the scores that a cell belong to each cell type on the cell type hierarchy to the original output file (Cell_ID, ImmClassifier_prediction).
Maintainer: Xuan Liu xuan.liu.1@uth.tmc.edu
ImmClassifier (Immune cell classifier), a knowledge-based and lineage-driven immune cell classification algorithm with fine annotation granularity yet high prediction accuracy. ImmClassifer seamlessly integrates the biology of immune cell differentiation, the strength of heterogeneous reference datasets and the state-of-art machine learning models. ImmClassifier cascades a machine learning module and a deep learning module.
We have provided a Docker container that includes ImmClassifier and all its dependencies. To get the container, you must have Docker installed and use this command:
docker pull xliu18/imm-classifier
Alternatively, you can also build locally and install from the GitHub source. To do this, you must have both git and Docker installed and then run the commands:
git clone https://github.com/xliu-uth/ImmClassifier.git
cd ImmClassifier
docker build . -t imm-classifier
All pre-requisites are in the Docker image. We recommend using this image instead of installing locally.
ImmClassifier runs a series of three individual commands using both R and Python. It requires two parameters:
- the input file
- the output prefix
ImmClassifier works on immune cells only. If the input gene expression profile contains non-immune cells, e.g. fibroblasts, cancer cells, stromal cells, please keep the immune cells from your input by using other cell classifiers that work on both immune and non-immune cells (e.g. SingleR, Garnett). For example, run SingleR for the whole gene expression profile, after obtaining the cell labels, please extract the gene expression profiles for the immune cells as the input for ImmClassifier.
This file is a matrix of gene expression values where the rows represent gene names with the HUGO identifiers and the columns represent individual cells.
For the gene expression values, we have tested log_2 CPM (counts per million), log_2 TPM (transcripts per million), and microarray RMA values with good results.
An example file is at: https://github.com/xliu-uth/ImmClassifier/blob/master/test/bulk.logrma.txt
For large input file, it's recommended to convert the input file to .rds format and increase the memory in Docker (eg. >8GB) to avoid memory crash.
To generate the .rds input, please import your input file into a data.frame with the Cell IDs as the rownames and the gene symbols as the column names. Use saveRDS command to export the data.frame into a .rds file.
This file is a matrix of cell type predictions for each cell.
The first column is the Cell ID and the second column is the predicted cell type. The other columns are the scores that a cell belongs to each cell type on the cell type hierarchy.
To run the command you must provide the path to your input file, the output prefix and mount the /tmp
directory to get the output files.
docker run --volume $PWD:/tmp -ti xliu18/imm-classifier --input `input_file` --output `output_prefix`
input_file
is the name of input file. It needs to be in the local
directory for docker to find it. If you are a docker expert, you may specify files in other directories, but you will need to bind the directory using the docker --volume command.
The output files will be written to your local directory with the prefix specified by the output_prefix
argument. ImmClassifier will generate three
files:
output_prefix
.dnn.input.txtoutput_prefix
.deeplearning.ontotree.stats.txtoutput_prefix
.output.txt
The output_prefix
.output.txt file contains the final predictions.
The other files contain intermediate results.
The overall approach is detailed in the following publication:
Liu X, Gosline SJC, Pflieger LT, Wallet P, Iyer A, Guinney J, Bild AH, Chang JT. Knowledge-based classification of fine-grained immune cell types in single-cell RNA-Seq data. Brief Bioinform. 2021 Sep 2;22(5):bbab039. doi: 10.1093/bib/bbab039. PMID: 33681983; PMCID: PMC8536868. https://pubmed.ncbi.nlm.nih.gov/33681983/
- Version Feburary 12, 2021, the cell hierarchy is updated
- Version March 10, 2020, the first official release
- Version September 9, 2020, the cell hierarchy is updated
- Update: Feburary 23, 2021 Use fread to speed up query file input.