scRNA-Seq data binarisation and synthetic generation from Boolean dynamics.
pip install scboolseq
conda install -c conda-forge -c colomoto scboolseq
scBoolSeq is included in the ColoMoTo Docker distribution.
Here, a minimal example is presented, using the same dataset as the CLI usage guide. For further information, please refer to the documentation.
import pandas as pd
from scboolseq import scBoolSeq
# read in the normalized expression data
nestorowa = pd.read_csv("data_Nestorowa.tsv.gz", index_col=0, sep="\t")
nestorowa.iloc[1:5, 1:5]
# HSPC_031 HSPC_037 LT-HSC_001 HSPC_001
# Kdm3a 6.877725 0.000000 0.000000 0.000000
# Coro2b 0.000000 6.913384 8.178374 9.475577
# 8430408G22Rik 0.000000 0.000000 0.000000 0.000000
# Clec9a 0.000000 0.000000 0.000000 0.000000
#
# NOTE: here, genes are rows and observations (cells) are columns
scbool_nest = scBoolSeq()
##
## Binarization
##
# scBoolSeq expects genes to be columns, thus we transpose the DataFrame.
scbool_nest.fit(nestorowa.T) # compute binarization criteria
binarized = scbool_nest.binarize(nestorowa.T)
binarized.iloc[1:5, 1:5]
# Kdm3a Coro2b 8430408G22Rik Phf6
# HSPC_031 1.0 NaN NaN 0.0
# HSPC_037 0.0 1.0 NaN 0.0
# LT-HSC_001 0.0 1.0 NaN 1.0
# HSPC_001 0.0 1.0 NaN 1.0
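The binarized result is a regular pandas DataFrame (observations as rows, genes as columns), so it can be summarised with ordinary pandas operations. The sketch below rebuilds the four-by-four excerpt printed above by hand, so that it runs without the dataset. It counts, for each gene, how many observations were set to 0, set to 1, or left as NaN, and then drops genes that are NaN everywhere in the excerpt. It only inspects the output and does not reproduce scBoolSeq's binarization criteria.

```python
import numpy as np
import pandas as pd

# Excerpt of the binarized output shown above (observations x genes),
# typed in by hand so the example is self-contained.
binarized = pd.DataFrame(
    {
        "Kdm3a": [1.0, 0.0, 0.0, 0.0],
        "Coro2b": [np.nan, 1.0, 1.0, 1.0],
        "8430408G22Rik": [np.nan] * 4,
        "Phf6": [0.0, 0.0, 1.0, 1.0],
    },
    index=["HSPC_031", "HSPC_037", "LT-HSC_001", "HSPC_001"],
)

# Per-gene counts of observations binarized to 0, to 1, or left as NaN.
summary = pd.DataFrame(
    {
        "zeros": (binarized == 0).sum(),
        "ones": (binarized == 1).sum(),
        "nan": binarized.isna().sum(),
    }
)
print(summary)

# Keep only genes with at least one determined (0 or 1) value in this excerpt.
determined = binarized.dropna(axis=1, how="all")
print(list(determined.columns))  # ['Kdm3a', 'Coro2b', 'Phf6']
```

On the full `binarized` table from the example, the same two steps give a quick overview of how many genes received a usable Boolean profile.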
##
## Synthetic RNA-Seq generation from Boolean states
##
# Load a Boolean trace obtained by simulating a Boolean model
boolean_trace = pd.read_csv("boolean_dynamics.csv", index_col=0)
boolean_trace
# Kdm3a Coro2b 8430408G22Rik Phf6
# init 1.0 0.0 1.0 0.0
# transient_1 0.0 1.0 1.0 0.0
# transient_2 0.0 1.0 0.0 1.0
# stable_state 0.0 1.0 1.0 1.0
synthetic_scrna_pseudocounts = scbool_nest.sample_counts(boolean_trace)
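In the example, `sample_counts` is called on the fitted `scbool_nest` object, so the synthetic values are presumably shaped by the Nestorowa data passed to `fit`. The toy sketch below only illustrates the general idea of generating expression-like data from Boolean states, and it is not scBoolSeq's model. It assumes made-up per-gene ON/OFF means (the hypothetical `on_mean` and `off_mean`), adds Gaussian noise, clips the result at zero, and draws a few synthetic cells per state of the trace shown above. The helper `toy_sample` is hypothetical and not part of the scBoolSeq API.

```python
import numpy as np
import pandas as pd

# The Boolean trace shown above, built in memory (states x genes).
boolean_trace = pd.DataFrame(
    [[1, 0, 1, 0], [0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1]],
    index=["init", "transient_1", "transient_2", "stable_state"],
    columns=["Kdm3a", "Coro2b", "8430408G22Rik", "Phf6"],
    dtype=float,
)

# HYPOTHETICAL per-gene parameters: mean expression when a gene is OFF / ON.
# These numbers are invented for illustration only.
off_mean = pd.Series(0.5, index=boolean_trace.columns)
on_mean = pd.Series(7.0, index=boolean_trace.columns)
noise_sd = 1.0


def toy_sample(trace: pd.DataFrame, n_per_state: int, seed: int = 0) -> pd.DataFrame:
    """Draw n_per_state synthetic cells for each Boolean state (toy model)."""
    rng = np.random.default_rng(seed)
    # Repeat each state so that every row becomes one synthetic cell.
    states = trace.loc[trace.index.repeat(n_per_state)]
    # Pick the ON or OFF mean for each entry, then add Gaussian noise.
    means = states.mul(on_mean, axis=1) + (1 - states).mul(off_mean, axis=1)
    values = means.to_numpy() + rng.normal(0.0, noise_sd, size=means.shape)
    # Expression values are non-negative, so clip at zero.
    return pd.DataFrame(
        np.clip(values, 0.0, None), index=states.index, columns=trace.columns
    )


synthetic = toy_sample(boolean_trace, n_per_state=3)
print(synthetic.shape)  # (12, 4)
```

Each row of `synthetic` keeps the label of the Boolean state it was drawn from, which makes it easy to group the synthetic cells by state afterwards.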