
casskit: cancer association studies toolkit

Primary language: Python. License: MIT.

Project generated with PyScaffold

casskit

Toolkit for cancer association studies.

Summary of related packages

Fetching and parsing TCGA

| Name | Language | Active? | Description |
|---|---|---|---|
| TCGAbiolinks | R | Yes | An R/Bioconductor package for integrative analysis with TCGA data |
| TCGAutils | R | Yes | MultiAssayExperiment for TCGA |
| LinkedOmics | GUI | | |

(Biological) Data structures

| Name | Language | Active? | Description |
|---|---|---|---|
| ExperimentHub | R | Yes | A repository of curated biological data |
| dalmatian | Python | Yes | A collection of high-level functions for interacting with Firecloud via Pandas dataframes |
| EUGENe | Python | Yes | Computational framework for machine learning based modeling of regulatory sequences |
| eugene.dataload.dataloaders.SeqData | Python | Yes | SeqData object used to containerize and store data for EUGENe workflows |
| Hail | Python | Yes | Cloud-native genomic dataframes and batch computing |
| hail.matrixtable.MatrixTable | Python | Yes | A MatrixTable is a distributed two-dimensional extension of a Table |
| HTSeq 2.0 | Python | Yes | HTSeq is a Python package for analysis of high-throughput sequencing data |
| Janggu | Python | Yes | Janggu is a Python package that facilitates deep learning in the context of genomics |
| janggu.data | Python | Yes | Genomics datasets that maintain coverage or sequence information along with the associated genomic intervals. Externally, the datasets behave similarly to a NumPy array |
| kartothek | Python | Yes | A consistent table management library in Python for managing tabular data in object stores |
| MultiAssayExperiment | R | Yes | A Bioconductor package for the representation of multi-assay experiments |
| OpenOmics | Python | Yes | A bioinformatics API and web-app to integrate multi-omics datasets and interface with public databases |
| pyGeno | Python | Yes | Precision medicine and proteogenomics |
| scikit-allel | Python | No | Succeeded by sgkit |
| scikit-bio | Python | Yes | Mainly microbial genomics |
| scikit-genome | Python | Yes | Add-on to CNVkit |
| scverse | R, Python | Yes | AnnData, muon with PyTorch for single-cell RNA-seq |
| sgkit | Python | Yes | xarray for VCFs with some statgen functionality (e.g. GWAS) |

Modeling

| Name | Language | Active? | Description |
|---|---|---|---|
| kipoi/kipoiseq | Python | Yes | Standard set of data-loaders for training and making predictions for DNA sequence-based models |
| kipoi/models | Python | Yes | Model zoo for genomics |
| Hugging Face | Python | Yes | Transformers library built for natural language processing applications, and a platform that allows users to share machine learning models and datasets |
| SKOPS | Python | Yes | A Python library for sharing scikit-learn based models and putting them in production. At the moment, it includes tools to easily integrate models on the Hugging Face Hub |
| EUGENe | Python | Yes | Computational framework for machine learning based modeling of regulatory sequences |
| Janggu | Python | Yes | Janggu is a Python package that facilitates deep learning in the context of genomics |
| BioSimulators | Python, SBML | Yes | Central registry of simulation engines and services for recommending specific tools. See also Benchmark-Models-PEtab |
| NCI Genetic Simulation Resources (GSR) | Python, R, C++ | Yes | Database of genetic simulation software tools |

Multi-omic data integration

| Name | Language | Active? | Description |
|---|---|---|---|
| GLUE (Graph-Linked Unified Embedding) | Python | Yes | Graph-linked unified embedding for single-cell multi-omics data integration |
| MOFA | R, Python | Yes | Multi-omic factor analysis |
| OmicsEV | R | Yes | OmicsEV: A tool for large scale omics data tables evaluation |
| SCENIC | R, Python | Yes | SCENIC Suite is a set of tools to study and decipher gene regulation. Its core is based on SCENIC (Single-Cell rEgulatory Network Inference and Clustering) which enables you to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data (using SCENIC) or the combination of single-cell RNA-seq and single-cell ATAC-seq data (using SCENIC+). |
| The Network Zoo | R, Python, MATLAB, C | Yes | a network biology package for the inference and analysis of gene regulatory networks |

Annotations

| Name | Language | Active? | Description |
|---|---|---|---|
| pypath / OmniPath | Python, R | Yes | A Python module for molecular signaling prior knowledge processing |
| pyensembl | Python | Yes | annotation |
| eDGAR | Python | Yes | a database of Disease-Gene Associations |
| NDEx-The Network Data Exchange | Web, API | Yes | The NDEx Project provides an open-source framework where scientists and organizations can store, share, manipulate, and publish biological network knowledge. |

Other

| Name | Language | Active? | Description |
|---|---|---|---|
| PyBDA | Python | Yes | A Python package for the analysis of biological data |
| PyBEL | Python | Yes | A Python module for biological expression language |
| pycellbase | Python | Yes | Python client for the CellBase REST API |
| pygenometracks | Python | Yes | |
| skorch | Python | Yes | A scikit-learn compatible neural network library that wraps PyTorch. |
| TorchData | Python | Yes | A PyTorch repo for data loading and utilities |

Development roadmap

See :ref:`roadmap`.