Uncovering underlying physical principles and driving forces of cell differentiation & reprogramming using single-cell transcriptomics
The scRNA-seq raw data can be downloaded from GEO or ENA with an accession number.
Then, Cellranger or kb-python can be used to process the raw data to get the count data, the Scanpy or Seurat package help to preprocess the count data, such as Filtering, normalization, dimensionality reduction, clustering, etc.
For the conventional scRNA-seq data, which contains information on both spliced (mature mRNA, exon reads) and unspliced (pre-mRNA, intron reads) transcripts. One can utilize Samtools, Velocyto or kb-python to quantify spliced counts and unspliced counts for each gene. Then, RNA velocity could be estimated by various toolkits following the protocol under the default parameters, for example, Velocyto, Scvelo, CellDancer, Dynamo, etc. It is necessary to adopt the appropriate toolkit according to different biological backgrounds.
However, the conventional RNA velocity relies on the mispriming of intron reads for current single-cell platforms, and thus the intron measures were biased and inaccurate, and it was scaled by the splicing rate and lacked real physical meanings (i.e., molecules/hour). The metabolic labeling-based method which measures both the historical or old and the new and nascent RNA of cells in a controllable way will be a better measurement for RNA velocity and transcriptomic dynamics. Dynast toolkit can be used for analyzing labeling datasets and quantifying new RNA counts and old RNA counts. Then Dynamo package can estimate the absolute RNA velocity following the protocol under the default parameters.
One can reconstruct the cell fate vector field from RNA velocity with dyn.vf.VectorField
by using Dynamo. Then, the differential geometry (divergence, curl, acceleration, curvature, jacobian, etc) could be analyzed based on the reconstructed vector field. One can also learn an analytical function of vector field from sparse single-cell samples on the entire space robustly by vector_field_function
function in Dynamo.
One could simulate stochastic dynamics by solving the Langevin equation based on the analytical function to get the steady-state probability distribution and quantify the non-equilibrium landscape and flux.
For example, one can run mouse_landscape_multi.py
in ./landscape-flux/mouse_retina
to generate the steady-state probability distribution of mouse retina development dynamics. The step can output grid data (Xgrid.csv
, Ygrid.csv
), probability distribution data (p_tra.csv
), and stochastic force distribution data (mean_Fx.csv
, mean_Fy.csv
). Then, plot_landscape_path.m
in ./landscape-flux
can be run to plot the landscape and least action path.
On the other hand, if one model the system with stochastic differential equations, one can run C++ following toggle.c
in ./landscape-flux/toggle
to generate the steady-state probability distribution data landscape.txt
, then run toggle_landscape_path.m
to plot the global dynamics landscape and paths.
One can use Dynamo to calculate the least action path time dyn.pd.least_action
between different cell-type transitions (loop_flux.m
to decompose the loop fluxes, which connect different cell states. Of course, one could adopt other appropriate toolkits to calculate the transition matrix according to different biological backgrounds, for example, Scvelo, Cellrank, etc.
One can reproduce the results by running .ipynb
files in ./notebook
, please note that the name of the notebook file may be inconsistent with the officially published article. Of course, users can also load user-defined data for analysis following the tutorials.
Zhu L, Yang S, Zhang K, Wang H, Fang X, Wang J. Uncovering underlying physical principles and driving forces of cell differentiation and reprogramming from single-cell transcriptomics. Proc Natl Acad Sci U S A. 2024 Aug 20;121(34):e2401540121. doi: 10.1073/pnas.2401540121.