tapir is a Python 3 package for the analysis of gene expression data.
It includes a number of functions for statistical analysis, differential expression analysis
and gene set enrichment analysis.
WARNING: This library is still in active development and we hope to add more options in the future. Please feel free to leave feedback and suggestions, or to contribute to this repository.
This library includes:
- TMM normalization with edgeR
- differential expression analysis with edgeR
- gene set enrichment analysis with gseapy
- survival analysis with lifelines
- immune deconvolution with MCPcounter
- dimensionality reduction with UMAP
- plotting functions for distribution comparisons, heatmaps and gene set networks.
Detailed documentation, API references and tutorials can be found at this link.
Besides basic scientific and plotting libraries, the current version requires:
- gseapy
- lifelines
- rpy2
- seaborn
- scikit-learn
- statsmodels
- umap-learn
**R, edgeR and MCPcounter need to be installed independently.**
tapir releases can be installed with pip, the standard Python package manager:
pip install tapir-rna
To install the latest (unreleased) version, clone this repository and install it from source:
git clone https://github.com/fcomitani/tapir
cd tapir
python setup.py install
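Because R, edgeR and MCPcounter are installed outside pip, a missing dependency may only surface at run time. A small standalone check like the one below (a sketch, not part of tapir; the helper names are hypothetical) can confirm the environment after installation. It assumes the R packages are visible to the Rscript found on PATH, which is not necessarily the same R installation that rpy2 links against. Note also that some pip names differ from their import names: scikit-learn imports as sklearn and umap-learn as umap.

```python
import importlib.util
import shutil
import subprocess

# Standalone sketch, not part of tapir; helper names are hypothetical.
# pip package name -> module name used by `import` (they differ for some packages)
PYTHON_DEPS = {
    "gseapy": "gseapy",
    "lifelines": "lifelines",
    "rpy2": "rpy2",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
    "umap-learn": "umap",
}
R_DEPS = ["edgeR", "MCPcounter"]


def missing_python_deps():
    """Return the pip names of required packages that cannot be found."""
    return [pkg for pkg, mod in PYTHON_DEPS.items()
            if importlib.util.find_spec(mod) is None]


def r_package_available(name):
    """Check an R package via Rscript; returns False if Rscript is not on PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return False
    result = subprocess.run(
        [rscript, "-e", f'cat(requireNamespace("{name}", quietly = TRUE))'],
        capture_output=True, text=True,
    )
    # requireNamespace() prints TRUE or FALSE through cat()
    return result.stdout.strip().endswith("TRUE")


if __name__ == "__main__":
    print("Missing Python packages:", missing_python_deps() or "none")
    for pkg in R_DEPS:
        status = "found" if r_package_available(pkg) else "not found"
        print(f"R package {pkg}: {status}")
```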
Given an input dataset in a pandas-like format (samples × genes), the build_dgelist and diff_exp functions normalize the samples with TMM and fit edgeR's quasi-likelihood (glmQL) model to assess the significance of differential expression.
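from tapir.edger import build_dgelist, diff_exp
# input_table: pandas-like table of raw counts, samples X genes
dgelist, tmmlog = build_dgelist(input_table)
# groups: group assignment of each sample for the comparison
de = diff_exp(dgelist, groups, filter=True)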
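tapir delegates TMM normalization to edgeR. To illustrate what that step computes, here is a minimal pure-Python sketch of TMM (trimmed mean of M-values) scaling factors, following the procedure described in the edgeR documentation: per-gene log-ratios (M) and average log expression (A) against a reference sample, trimming of 30% of M and 5% of A at each end, and a precision-weighted mean of the remaining M values. It is not tapir's or edgeR's code and is no substitute for them; the function names tmm_factor and tmm_norm_factors are hypothetical, the reference-sample rule is the upper-quartile rule from older edgeR releases (recent releases may choose the reference differently), and edge cases such as empty libraries are not handled.

```python
import numpy as np
import pandas as pd

# Hypothetical helper names; a sketch of the TMM method, not tapir's or edgeR's code.


def tmm_factor(obs, ref, log_ratio_trim=0.3, sum_trim=0.05):
    """TMM scaling factor of sample `obs` relative to sample `ref` (raw counts)."""
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()

    # Only genes detected in both samples have finite M and A values.
    detected = (obs > 0) & (ref > 0)
    o, r = obs[detected], ref[detected]
    p_obs, p_ref = o / n_obs, r / n_ref
    m = np.log2(p_obs / p_ref)          # M: log2 fold change
    a = 0.5 * np.log2(p_obs * p_ref)    # A: average log2 expression
    # Approximate variance of M; its inverse is used as a precision weight.
    var = (n_obs - o) / (n_obs * o) + (n_ref - r) / (n_ref * r)

    # Drop the most extreme genes by rank: log_ratio_trim of M and sum_trim
    # of A at each end (tied values share their average rank).
    n = m.size
    lo_m = np.floor(n * log_ratio_trim) + 1
    lo_a = np.floor(n * sum_trim) + 1
    rank_m = pd.Series(m).rank().to_numpy()
    rank_a = pd.Series(a).rank().to_numpy()
    keep = ((rank_m >= lo_m) & (rank_m <= n + 1 - lo_m)
            & (rank_a >= lo_a) & (rank_a <= n + 1 - lo_a))

    # Precision-weighted mean of the surviving M values, back on linear scale.
    w = 1.0 / var[keep]
    return float(2 ** (np.sum(w * m[keep]) / np.sum(w)))


def tmm_norm_factors(counts: pd.DataFrame) -> pd.Series:
    """TMM factors for a raw-count table with samples as rows, genes as columns."""
    lib_sizes = counts.sum(axis=1)
    # Reference: sample whose upper-quartile proportion is closest to the mean.
    uq = counts.div(lib_sizes, axis=0).quantile(0.75, axis=1)
    ref = counts.loc[(uq - uq.mean()).abs().idxmin()]
    factors = counts.apply(lambda row: tmm_factor(row, ref), axis=1)
    # Rescale so that the factors multiply to one.
    return factors / np.exp(np.log(factors).mean())
```

Multiplying each library size by its factor gives the effective library size used downstream. For example, if two samples differ only by a single hugely over-expressed gene, TMM trims that gene away and the two effective library sizes come out equal, whereas plain total-count scaling would shrink every other gene in the affected sample.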
- federico.comitani at sickkids.ca
- josh.nash at sickkids.ca
This library is still a work in progress and we are striving to improve it by adding more flexibility and increasing the memory and time efficiency of the code. If you would like to be part of this effort, please fork the master branch and work from there.
Contributions are always welcome.