PCASA

PCASA (Prioritization of Combinatorial Cancer-Associated Surface Antigens) is a tool for predicting the best combinations of surface antigen genes to target, based on classifying malignant and non-malignant cells in single-cell RNA-seq data.

Requirements

Required packages for RF (Random forest) in R.

R (>= 3.6.0): https://cloud.r-project.org/bin/linux/ubuntu/
ROCR (>= 1.0-11), caret (>= 6.0-88), e1071 (>= 1.7-8), ggplot2 (>= 3.3.5), gplots (>= 3.1.1), randomForest (>= 4.6-14), rpart (>= 4.1-15)

Required modules for CNN (Convolutional neural network) in Python.

Python (>= 3.6.9): https://www.python.org/downloads/source/
pip3 (>= 21.2.4), numpy (>= 1.19.5), matplotlib (>= 3.3.4), pandas (>= 1.1.5), scikit-learn (>= 0.24.1), keras (>= 2.6.0), tensorflow (>= 2.6.0), tf-keras-vis (>= 0.8.0)

Running the code

Overview

PCASA is a three-step program.

Step 1. Prioritization of single genes that classify tumor and normal cells well.
Step 2. Calculation of the expressing cell fraction (ECF) for each gene combination.
Step 3. Prioritization of gene combinations that classify tumor and normal cells well.

Input files

The first command for each step requires input data.

For step 1, a tab-delimited file containing a cell-by-gene sparse matrix of log-transformed counts, plus a column giving the binary class (Tumor, 1; Normal, 0).
For steps 2 & 3, a two-column tab-delimited file containing the cell code and cell type of each cell.
For steps 2 & 3, a tab-delimited file containing a gene-by-cell sparse matrix of log-transformed counts.
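As a rough illustration of the step 1 layout described above, the sketch below parses a tiny cell-by-gene table with Python's standard csv module. The column names (cell, GENE_A, GENE_B, class), the position of the class column, and the helper read_cell_by_gene are assumptions made for this example; check the real header of input-1__scrna_class.txt before relying on them.

```python
import csv
import io

# Hypothetical toy table: column names and the placement of the class
# column are assumptions, not taken from the real PCASA input file.
EXAMPLE_TSV = (
    "cell\tGENE_A\tGENE_B\tclass\n"
    "c1\t2.3\t0\t1\n"
    "c2\t0\t1.7\t0\n"
    "c3\t1.1\t0\t1\n"
)


def read_cell_by_gene(handle, class_column="class"):
    """Read a tab-delimited cell-by-gene table with a binary class column.

    Returns (genes, rows, labels), where rows maps cell -> {gene: value}
    and labels maps cell -> 1 (Tumor) or 0 (Normal).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]  # assumed: first column is the cell code
    genes = [
        name for name in reader.fieldnames
        if name not in (id_column, class_column)
    ]
    rows, labels = {}, {}
    for record in reader:
        cell = record[id_column]
        rows[cell] = {gene: float(record[gene]) for gene in genes}
        labels[cell] = int(record[class_column])
    return genes, rows, labels


genes, rows, labels = read_cell_by_gene(io.StringIO(EXAMPLE_TSV))
print(genes)
print(labels)
```

For a real file, pass an open file handle instead of the io.StringIO object, for example open("../data/input-1__scrna_class.txt", newline="").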

Step 1. RF - Cell classifier for single genes

cd code

Rscript step-1a__random_forest.R ../data/input-1__scrna_class.txt
# 'input-1__scrna_class.txt': Cell-by-Gene-Matrix-with-Class

python step-1b__random_forest.py

Step 2. Expression logic evaluator for ECF

python step-2__gate_coverage_calc.py \
../data/input-2a__scrna_annotation.txt \
../data/input-2b__scrna_gc-matrix.txt
# 'input-2a__scrna_annotation.txt' : Cell-Type-Annotation
# 'input-2b__scrna_gc-matrix.txt' : Gene-by-Cell-sparse-Matrix
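
To make the idea of an expressing cell fraction concrete, here is a minimal sketch of one plausible reading: the fraction of cells of a given type in which a gene combination is "on". It is not the logic of step-2__gate_coverage_calc.py. The expression threshold (log-transformed count > 0), the AND/OR gates, the toy data, and the function names are all assumptions made for illustration.

```python
def cells_of_type(annotation, cell_type):
    """Return the cell codes annotated with the given cell type."""
    return [cell for cell, kind in annotation.items() if kind == cell_type]


def expressing_cell_fraction(matrix, cells, genes, gate="AND", threshold=0.0):
    """Fraction of `cells` in which the gene combination is expressed.

    matrix: sparse gene-by-cell mapping, gene -> {cell: log count};
            missing entries are treated as zero.
    gate:   "AND" requires every gene to be expressed, "OR" at least one.
    """
    if gate not in ("AND", "OR"):
        raise ValueError("gate must be 'AND' or 'OR'")
    if not cells:
        return 0.0
    combine = all if gate == "AND" else any
    hits = sum(
        1 for cell in cells
        if combine(matrix[gene].get(cell, 0.0) > threshold for gene in genes)
    )
    return hits / len(cells)


# Hypothetical toy data (assumed names and values).
matrix = {
    "GENE_A": {"t1": 1.2, "t2": 0.8, "n1": 0.5},
    "GENE_B": {"t1": 2.0, "t3": 1.1},
}
annotation = {
    "t1": "Tumor", "t2": "Tumor", "t3": "Tumor", "t4": "Tumor",
    "n1": "Normal", "n2": "Normal",
}

tumor = cells_of_type(annotation, "Tumor")
normal = cells_of_type(annotation, "Normal")
pair = ["GENE_A", "GENE_B"]

tumor_and = expressing_cell_fraction(matrix, tumor, pair, gate="AND")
tumor_or = expressing_cell_fraction(matrix, tumor, pair, gate="OR")
normal_and = expressing_cell_fraction(matrix, normal, pair, gate="AND")
normal_or = expressing_cell_fraction(matrix, normal, pair, gate="OR")
print(tumor_and, tumor_or, normal_and, normal_or)
```

Under these assumptions, a useful combination would show a high fraction in tumor cells and a low fraction in normal cells for the same gate.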

Step 3. CNN - Cell classifier for gene combinations

python step-3a__cnn_gradcam.py \
../data/input-2a__scrna_annotation.txt \
../data/input-2b__scrna_gc-matrix.txt
# 'input-2a__scrna_annotation.txt' : Cell-Type-Annotation
# 'input-2b__scrna_gc-matrix.txt' : Gene-by-Cell-sparse-Matrix

python step-3b__cnn_gradcam.py
python step-3c__cnn_gradcam.py
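
The commands listed for the three steps can also be collected into a small driver script. The sketch below is only a convenience wrapper around the exact commands shown in this README; the run_pipeline helper, the dry-run default, and the assumption that the commands must run in this order from the 'code' directory are mine, not part of PCASA.

```python
import subprocess

DATA = "../data/"

# Commands copied from the README, in the order they are listed.
COMMANDS = [
    ["Rscript", "step-1a__random_forest.R", DATA + "input-1__scrna_class.txt"],
    ["python", "step-1b__random_forest.py"],
    ["python", "step-2__gate_coverage_calc.py",
     DATA + "input-2a__scrna_annotation.txt",
     DATA + "input-2b__scrna_gc-matrix.txt"],
    ["python", "step-3a__cnn_gradcam.py",
     DATA + "input-2a__scrna_annotation.txt",
     DATA + "input-2b__scrna_gc-matrix.txt"],
    ["python", "step-3b__cnn_gradcam.py"],
    ["python", "step-3c__cnn_gradcam.py"],
]


def run_pipeline(commands, workdir="code", dry_run=True):
    """Print each command and, unless dry_run is set, execute it in workdir.

    Stops at the first failing command (check=True raises on a non-zero
    exit status). Returns the commands as display strings.
    """
    planned = []
    for command in commands:
        line = " ".join(command)
        planned.append(line)
        print(line)
        if not dry_run:
            subprocess.run(command, cwd=workdir, check=True)
    return planned


# Dry run: only lists the commands, nothing is executed.
planned = run_pipeline(COMMANDS, dry_run=True)
```

Calling run_pipeline(COMMANDS, dry_run=False) from the repository root would execute the commands in sequence, assuming Rscript and python are on the PATH.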

Pre-process

The directory named 'preprocess' contains the code for pre-processing:

Initial filtering, normalization, and dimensional reduction by Seurat
Reference-based cell-type annotation by SingleR
Aneuploid cell prediction by CopyKat
Volume-dependent sub-sampling by Geosketch
Batch correction by BBKNN
The Geosketch and BBKNN code is based on a custom module named 'scjp', written by Dr. Park.
