Ultrafast clustering of single-cell flow cytometry data using FlowGrid

Authors: Xiaoxin Ye and Joshua W. K. Ho
Contact: j.ho@victorchang.edu.au
Copyright © 2018, Victor Chang Cardiac Research Institute

Input data format

FlowGrid can be applied to data sets in many formats, but the sample code only accepts CSV files. In the CSV file, the first row contains the feature names and the columns are separated by ",". If you have a file of true labels, pass it with --l filename to compute the ARI (adjusted Rand index) of the FlowGrid result.
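As a minimal sketch of the layout described above, the snippet below parses a small CSV with a header row of feature names and comma-separated values. The feature names, the values, and the `load_csv` helper are made up for illustration and are not part of the sample code.

```python
import csv
import io

# Hypothetical example content: the first row holds the feature names,
# every later row holds the measurements of one cell.
EXAMPLE_CSV = """FSC-A,SSC-A,CD3,CD4
0.52,0.31,0.88,0.12
0.49,0.35,0.10,0.93
0.61,0.28,0.85,0.15
"""


def load_csv(handle):
    """Return (feature_names, rows) from a comma-separated file with a header row."""
    reader = csv.reader(handle)
    feature_names = next(reader)  # header row
    # Skip empty lines and convert every remaining value to float.
    rows = [[float(value) for value in row] for row in reader if row]
    return feature_names, rows


feature_names, rows = load_csv(io.StringIO(EXAMPLE_CSV))
print("The number of cells is:", len(rows))
print("The number of dimensions is:", len(feature_names))
```

To check a real file the same way, replace the `io.StringIO(...)` object with `open("your_file.csv", newline="")`.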

Install

Before using the package, install its dependencies (scikit-learn, NumPy and SciPy), either from the requirements file:

pip install -r requirements.txt --user

or directly:

pip install scikit-learn numpy scipy --user

Usage

The arguments of the sample code are summarized in the table below.

Argument	Usage	Required?
--f	the input file name	required
--n	number of bins	required
--eps	maximum distance between two bins	required
--t	threshold for high density bin	optional (default: 40)
--o	the output file name	optional (default: out.csv)
--l	the true label file name	optional
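The table above could be expressed with `argparse` roughly as follows. This is only a sketch: the flag names, required/optional status, and defaults come from the table, while the value types (`int` for --n and --t, `float` for --eps) and the `build_parser` helper are assumptions, and the actual parser in sample_code.py may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the argument table; not the sample code itself."""
    parser = argparse.ArgumentParser(description="Run FlowGrid on a CSV file.")
    parser.add_argument("--f", required=True, help="the input file name")
    # Types below are assumptions inferred from the sample command.
    parser.add_argument("--n", required=True, type=int, help="number of bins")
    parser.add_argument("--eps", required=True, type=float,
                        help="maximum distance between two bins")
    parser.add_argument("--t", type=int, default=40,
                        help="threshold for high density bin")
    parser.add_argument("--o", default="out.csv", help="the output file name")
    parser.add_argument("--l", default=None, help="the true label file name")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--f", "sample_data.csv", "--n", "4", "--eps", "1.1"]
)
print(args.f, args.n, args.eps, args.t, args.o, args.l)
```

Omitting --t, --o, and --l leaves them at the defaults listed in the table.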

Sample

After installing the dependencies, you can use the sample code to run FlowGrid on the sample data.

python sample_code.py --f sample_data.csv --n 4 --eps 1.1 --l sample_label.csv

The predicted labels are saved to out.csv, and the sample output is as follows.

The number of cells is: 23377
The number of dimensions is: 4
runing time: 0.027
ARI:0.9816
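For readers unfamiliar with the ARI (adjusted Rand index) reported above, the sketch below computes it from two label lists using only the standard library. It illustrates the standard definition and is not necessarily how the sample code computes its score; `sklearn.metrics.adjusted_rand_score` computes the same quantity. The `adjusted_rand_index` function name and the toy labels are made up for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand index between two labelings of the same items."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Count item pairs that share a cluster in both, in the true,
    # and in the predicted labeling respectively.
    joint = Counter(zip(true_labels, predicted_labels))
    both = sum(comb(count, 2) for count in joint.values())
    in_true = sum(comb(count, 2) for count in Counter(true_labels).values())
    in_pred = sum(comb(count, 2) for count in Counter(predicted_labels).values())

    expected = in_true * in_pred / total_pairs  # agreement expected by chance
    maximum = (in_true + in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (both - expected) / (maximum - expected)


# Cluster names do not matter, only the grouping does.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A value of 1 means the two labelings group the cells identically, and values near 0 mean the agreement is no better than chance.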
