Overview

This repository contains the code and data that is used to automatically identify the taxonomy of bacterial single-cells based on flow cytometry. If you find the research and/or data useful, please consider citing:

Rubbens, P., Props, R., Boon, N., Waegeman, W. (2017). Flow Cytometric Single-Cell Identification of Populations in Synthetic Bacterial Communities. PLOS ONE 12(1):e0169754.

Dependencies

InSilicoFlow depends on the following packages:

Python 3 (version 3.4.5)
numpy (version 1.11.1)
scikit-learn (version 0.18)
pandas (version 0.18.1)

insilico.py

Script used to perform the first part of the analysis explained in the paper. It mainly performs the following steps:

Choose two or m bacterial populations of interest.
Aggregate data coming from replicates.
Aggregate data coming from the populations of interest (creating the so-called in silico community).
Use 70% of the community to train a classifier (e.g., LDA or Random Forests).
Use the other 30% to evaluate the performance of a classifier (expressed in for example the accuracy, AUC, ...).

invitro.py

Script used to retrieve the composition of a synthetic bacterial community. It mainly performs the following steps:

Create in silico community, representing the synthetic community of interest.
Train classifier on full community.
Evaluate performance by retrieving community composition for various in silico/in vitro communities.

Data availability

Our data can be found in .csv-format on this repo. Additionally, our data is made available in .fcs-format on the flowRepository, using the identifiers FR-FCM-ZZSH (axenic cultures) and FR-FCM-ZZSG. It has been preprocessed following the robust digital gating strategy by Prest et al. (2013).

fcstocsv.py