scDIOR: Single cell data IO softwaRe

License: GNU General Public License v3.0 (GPL-3.0)

scDIOR

scDIOR: Single cell RNA-seq Data IO softwaRe

Overview

scDIOR was developed to transfer single-cell data between R and Python platforms using Hierarchical Data Format Version 5 (HDF5). Its data IO ecosystem consists of two modules, dior (R) and diopy (Python), which connect three R packages (Seurat, SingleCellExperiment, Monocle) with one Python package (Scanpy).

scDIOR handles a variety of data types across programming languages and platforms in an ultrafast way, including single-cell RNA-seq and spatially resolved transcriptomics data, with only a few lines of code in an IDE or on the command line.

[Figure: overview of scDIOR]



Installing scDIOR

Users can install and run scDIOR in either of two ways:

  1. Create an environment with conda create and install scDIOR in it.
  2. Use the Docker images available at jiekailab/scdior-image.

1. Conda environment

Create the environment with conda create, then install dior and diopy in it.

conda create -n conda_env python=3.8 R=4.0
  1. R installation:
# for R
install.packages('devtools')
devtools::install_github('JiekaiLab/dior')
# or devtools::install_github('JiekaiLab/dior@HEAD')
  2. Python installation:
# for python
pip install diopy
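
After installation, it can be useful to confirm that the Python side of the environment is in place. The following is a minimal sketch using only the standard library; the helper name `package_status` is hypothetical, and checking scanpy and anndata alongside diopy is an assumption based on the packages listed for the Docker image, not a documented requirement.

```python
import importlib.util
from importlib import metadata


def package_status(name):
    """Return the installed version of `name`, or None if it cannot be imported."""
    if importlib.util.find_spec(name) is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. standard-library modules).
        return "unknown"


if __name__ == "__main__":
    # The package list is an assumption; adjust it to your environment.
    for pkg in ("diopy", "scanpy", "anndata"):
        version = package_status(pkg)
        print(f"{pkg}: {version if version else 'not installed'}")
```

A package reported as "not installed" would need to be installed in the active conda environment before running the demos.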

2. Docker image

It is recommended to run scDIOR in a Docker image, which keeps the operating environment stable. The scDIOR image is available at jiekailab/scdior-image.

Brief description

  1. We first built a basic Jupyter image based on jupyter/base-notebook (Jupyter managing Python and R) and fixuid (which fixes user/group mapping issues in containers). This basic image is available as jiekailab/scdior-image:base-jupyter-notebook1.0.
  2. On top of this customized basic image, we built the scDIOR image from a Dockerfile. The content of the Dockerfile is available at this link.

The current latest image contains the following main analysis platforms and software:

R software            Version   Python software   Version
R                     4.0.5     Python            3.8.8
Seurat                4.0.2     Scanpy            1.8.1
SingleCellExperiment  1.12.0    scvelo            0.2.3
monocle3              1.0.0     anndata           0.7.6
dior                  0.1.5     diopy             0.5.2

Version control

At present, scDIOR is compatible with Seurat (v3~v4) and Scanpy (1.4~1.8). We configured Docker images with multiple versions (https://hub.docker.com/repository/docker/jiekailab/scdior-image) to confirm that scDIOR works across these versions of Scanpy and Seurat.

Demo link

Platform   Software   Version      Data IO
R          Seurat     v3~v4        ☑️
Python     Scanpy     v1.4~v1.8    ☑️
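
The compatibility ranges above can be checked programmatically before running a conversion. This is a minimal sketch, not part of dior or diopy: the `SUPPORTED` table and the `is_supported` helper are hypothetical names, and the ranges are copied from the table above.

```python
import re

# Inclusive version ranges taken from the compatibility table.
# Seurat is compared on the major version only, Scanpy on major.minor.
SUPPORTED = {
    "seurat": ((3,), (4,)),
    "scanpy": ((1, 4), (1, 8)),
}


def version_tuple(text):
    """Extract the numeric components of a version string, e.g. 'v4.0.2' -> (4, 0, 2)."""
    return tuple(int(n) for n in re.findall(r"\d+", text))


def is_supported(package, version):
    """Return True if `version` of `package` falls inside the tested range."""
    low, high = SUPPORTED[package.lower()]
    # Compare only as many components as the bounds specify.
    key = version_tuple(version)[: len(low)]
    return low <= key <= high


if __name__ == "__main__":
    print(is_supported("scanpy", "1.8.1"))
    print(is_supported("seurat", "5.0.0"))
```

A version outside these ranges is not necessarily incompatible; it only means it is not among the versions listed as tested here.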


scDIOR demo

Here, we list several demos that show scDIOR in typical workflows.


1. Single-cell data from R to Python

Users can perform trajectory analysis with Monocle3 in R, then use scDIOR to transfer the single-cell data, such as the spliced and unspliced expression profiles and the cell layout, to Scanpy in Python. The expression profiles can then be used to run dynamical RNA velocity analysis, and the results can be projected onto the Monocle3 layout.

Code

# in R
dior::write_h5(data, file='scdata.h5', object.type = 'singlecellexperiment')
# in Python
import diopy
adata = diopy.input.read_h5(file = 'scdata.h5')
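
Because the exchange file is HDF5, a quick sanity check on the Python side is to confirm that the file written from R starts with the HDF5 format signature before trying to read it. The sketch below uses only the standard library; `looks_like_hdf5` is a hypothetical helper, not a diopy function, and it inspects only offset 0 (HDF5 also permits the signature at later offsets when a user block is present).

```python
# The 8-byte signature that opens an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` begins with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


if __name__ == "__main__":
    import sys

    # Usage: python check_h5.py scdata.h5
    for filename in sys.argv[1:]:
        status = "HDF5" if looks_like_hdf5(filename) else "not HDF5"
        print(f"{filename}: {status}")
```

A file that fails this check was most likely truncated during transfer or written in a different format.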

Demo link

1.trajectory_inference


2. Single-cell data from Python to R

Users can apply the preprocessing and normalization methods provided by Scanpy, then use the batch correction methods provided by Seurat.

Code

# in python
import diopy
diopy.output.write_h5(data_py, file = 'scdata.h5')
# in R
adata = dior::read_h5(file='scdata.h5', target.object = 'seurat')

Demo link

batch_correct


3. Data IO for spatial omics data

scDIOR supports spatial omics data IO between R and Python platforms.

Code

# in R
dior::write_h5(data, file='scdata.h5', object.type = 'singlecellexperiment')
# in Python
import diopy
adata = diopy.input.read_h5(file = 'scdata.h5')

Demo link

sptail_summary


4. Extended function

  1. Load an ‘.rds’ file directly in Python:

    Code

    # in python
    import diopy
    adata = diopy.input.read_rds(file = './adata_R.rds',
                                 object_type='seurat',
                                 assay_name='RNA')
  2. Load an ‘.h5ad’ file directly in R:

    Code

    # in R
    adata_seurat = dior::read_h5ad(file = './adata_Python.h5ad',
                                   target.object = 'seurat',
                                   assay_name = 'RNA')
  3. Command line

    Description

    scDIOR can convert data from the command line by calling scdior.

    usage: scdior [-h] -i INPUT -o OUTPUT -t TARGET -a ASSAY_NAME
    

    -i,--input The existing input file, such as rds (R) or h5ad (Python).

    -o,--output The output file to create, for example when converting from rds to h5ad or from h5ad to rds.

    -t,--target The target object for R, such as seurat or singlecellexperiment.

    -a,--assay_name The primary data type, such as scRNA data or spatial data.

    Code

    $ scdior -i ./adata_test.h5ad -o ./adata_test.rds -t seurat -a RNA
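
To make the option semantics concrete, the usage line above can be mirrored with Python's argparse. This is an illustrative sketch only and not the actual scdior implementation; `build_parser` and `conversion_direction` are hypothetical names, and treating all four options as required is inferred from the usage line.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that accepts the same options as the usage line above."""
    parser = argparse.ArgumentParser(
        prog="scdior",
        description="Convert single-cell data files between R and Python formats.",
    )
    parser.add_argument("-i", "--input", required=True,
                        help="existing input file, e.g. rds (R) or h5ad (Python)")
    parser.add_argument("-o", "--output", required=True,
                        help="output file to create, e.g. rds or h5ad")
    parser.add_argument("-t", "--target", required=True,
                        help="target object for R, e.g. seurat or singlecellexperiment")
    parser.add_argument("-a", "--assay_name", required=True,
                        help="primary data type, e.g. RNA")
    return parser


def conversion_direction(args):
    """Infer (source, destination) formats from the file extensions."""
    src = Path(args.input).suffix.lstrip(".").lower()
    dst = Path(args.output).suffix.lstrip(".").lower()
    return src, dst


if __name__ == "__main__":
    # Same arguments as the command-line example above.
    demo = ["-i", "./adata_test.h5ad", "-o", "./adata_test.rds",
            "-t", "seurat", "-a", "RNA"]
    parsed = build_parser().parse_args(demo)
    print(conversion_direction(parsed), parsed.target, parsed.assay_name)
```

Inferring the direction from the file extensions matches how the -i and -o options are described, but the real tool may determine it differently.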

Demo link

extend_function



Reference websites

  1. Our article: https://doi.org/10.1186/s12859-021-04528-3
  2. jupyter docker stacks:
    1. https://github.com/jupyter/docker-stacks
    2. https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html
  3. fixuid: https://github.com/boxboat/fixuid
  4. Seurat: https://satijalab.org/seurat/index.html
  5. monocle3: https://cole-trapnell-lab.github.io/monocle3/
  6. Scanpy: https://scanpy.readthedocs.io/en/stable/index.html
  7. scVelo: https://scvelo.readthedocs.io/