Mouse Spinal Cord Atlas

This repository contains the R code used to analyse the single-cell RNA-seq dataset shown in:

Delile, J., Rayon, T., Melchionda, M., Edwards, A., Briscoe, J., & Sagner, A. (2019). Single cell transcriptomics reveals spatial and temporal dynamics of gene expression in the developing mouse spinal cord. Development, dev-173807. https://doi.org/10.1242/dev.173807

Data availability
Figure shortcuts
Analysis preliminaries
1. Load and hygienize dataset
2. Knowledge-based identification of all cell populations
3. Population size dynamics
4. Combinatorial DE tests
5. Neuronal populations clustering
6. Neurogenesis dynamics
7. Export annotations

Data availability

Those interested in processing the dataset independently should consider downloading:

the raw UMI count matrix, as output by the 10X Genomics cellranger pipeline.
the annotated cell metadata, indicating each cell’s sample time, replicate id and type as determined by the following pipeline. “Type_step1” and “Type_step2” stand for the outcome of the 2-step partitioning algorithm (Section 2). “Neuron_subtypes” indicates the result of the per-neuronal-type subclustering (Section 5). “Pseudotime” contains the neurogenesis ordering coordinates (Section 6).

Figure shortcuts

Figure 1: B, C, D-E
Figure 2: A-D, E
Figure 3: A, B
Figure 4: A
Figure 5: A
Figure 6: A
Figure 7: A-B, C, D
Figure S1: A-C, D, E-F, G
Figure S2: A
Figure S3: A
Figure S4: A-H
Figure S6: A, B-D

Analysis Preliminaries

To reproduce the analysis, the files contained in R_files, input_files and dataset must be downloaded from this repository (most conveniently using git clone). The UMI count matrix is zipped and must be uncompressed into the dataset folder.

unzip("./dataset/UMI_count.tsv.zip", exdir="./dataset/")

The Antler package as it was at the time of publication is required and can be installed with devtools.

devtools::install_github("juliendelile/Antler", ref = "Development2019")

library(Antler)

Most functions not provided by Antler are stored in MouseSpinalCordAtlas_tools.R.

source('./R_files/MouseSpinalCordAtlas_tools.R')

The output path can be changed to any existing directory path.

output_path = './output/'
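Because the output path is expected to exist already, it may be convenient to create it beforehand. The following is a minimal base R sketch and is not part of the original pipeline:

```r
# Create the output directory if it does not exist yet (base R only).
output_path <- './output/'
if (!dir.exists(output_path)) {
  dir.create(output_path, recursive = TRUE)
}
```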

1. Load and hygienize dataset

m = Antler$new(plot_folder=output_path, num_cores=4)

m$loadDataset(folderpath="./dataset/", phenoData_filename="phenoData.csv", assayData_filename="UMI_count.tsv")

Annotate gene names

m$setCurrentGeneNames(geneID_mapping_file=system.file("extdata", "Annotations/biomart_ensemblid_genename_mmusculus.csv", package="Antler"))

Display counts pre-QC

m$plotReadcountStats(data_status="Raw", by="timepoint", category="replicate_id", basename="preQC")

Remove cells having more than 6% of mitochondrial UMI counts

m$removeGenesFromRatio(
                candidate_genes=grep('^mt-', m$getGeneNames(), value=T),
                threshold = 0.06
                )
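To illustrate what this filter amounts to, here is a small base R sketch on a toy matrix. It is not Antler's implementation: the gene names, the counts and the choice of keeping cells at or below the 6% threshold are assumptions made for the example.

```r
# Toy UMI count matrix (genes x cells); names and counts are made up.
counts <- matrix(
  c(90, 50, 10,   # geneA
     5, 40, 85,   # geneB
     3,  6,  2,   # mt-X (hypothetical mitochondrial gene)
     2,  4,  3),  # mt-Y (hypothetical mitochondrial gene)
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "mt-X", "mt-Y"),
                  c("cell1", "cell2", "cell3"))
)

# Genes whose name starts with "mt-" are treated as mitochondrial.
mito_genes <- grep("^mt-", rownames(counts), value = TRUE)

# Fraction of each cell's UMI counts coming from mitochondrial genes.
mito_fraction <- colSums(counts[mito_genes, , drop = FALSE]) / colSums(counts)

# Keep the cells whose mitochondrial fraction does not exceed 6%.
filtered <- counts[, mito_fraction <= 0.06, drop = FALSE]
```

In this toy example cell2 has 10% of mitochondrial counts and is discarded, while cell1 and cell3 (5% each) are kept.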

Remove outlier genes and cells

m$removeOutliers( lowread_thres = -Inf,
                  highread_thres = Inf,
                  genesmin = 500,
                  cellmin = 3,
                  data_status = 'Raw')
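The sketch below shows one plausible reading of the two thresholds, assuming that genesmin is the minimum number of detected genes per cell and cellmin the minimum number of cells in which a gene is detected. This interpretation, the toy matrix and the smaller threshold values are assumptions, and the code is not Antler's implementation.

```r
# Toy UMI count matrix (genes x cells); values are made up.
counts <- matrix(
  c(3, 0, 1, 0,
    2, 1, 4, 0,
    0, 0, 5, 0,
    1, 2, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

# Smaller thresholds than in the pipeline, to suit the toy matrix.
genesmin <- 2  # assumed meaning: minimum number of detected genes per cell
cellmin  <- 3  # assumed meaning: minimum number of cells expressing a gene

genes_per_cell <- colSums(counts > 0)
cells_per_gene <- rowSums(counts > 0)

# Both filters are evaluated on the unfiltered matrix, then applied together.
filtered <- counts[cells_per_gene >= cellmin,
                   genes_per_cell >= genesmin,
                   drop = FALSE]
```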

Display counts post-QC

m$plotReadcountStats(data_status="Raw", by="timepoint", category="replicate_id", basename="postQC")

2. Knowledge-based identification of all cell populations

Cell identities are determined by assigning each cell to the closest target population state, each state being defined by a list of known marker genes.

cell_partition = doCellPartition(known_template_file="./input_files/partitioning_table.csv", readcounts=m$getReadcounts(data_status='Raw'))

pop_colors = getPopulationColors(known_template_file="./input_files/partitioning_table.csv")

for(md in c("Type_step1", "Type_step2", "Type_step2_unique", "DV")){pData(m$expressionSet)[[md]] <- cell_partition[[md]]}
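The idea of assigning a cell to its closest marker-defined state can be sketched in base R as follows. This is only an illustration and not the doCellPartition implementation: the gene and population names are hypothetical, and both the binarization threshold and the use of the Hamming distance are assumptions.

```r
# Toy UMI counts (genes x cells); names and counts are made up.
counts <- matrix(
  c(5, 0, 0,   # geneA
    1, 4, 0,   # geneB
    3, 0, 0,   # geneC
    0, 0, 6),  # geneD
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("cell1", "cell2", "cell3"))
)

# Binary marker templates (genes x populations): 1 = marker expected to be on.
templates <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    1, 0, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(rownames(counts), c("popA", "popB", "popC"))
)

# A gene is considered "on" in a cell when it has at least 2 UMIs (assumed threshold).
binarized <- (counts >= 2) * 1

# Hamming distance between every cell and every template
# (result: cells in rows, populations in columns).
distances <- sapply(colnames(templates), function(p) {
  colSums(binarized != templates[, p])
})

# Each cell receives the label of its closest template.
assigned <- colnames(templates)[apply(distances, 1, which.min)]
names(assigned) <- colnames(counts)
```

Here cell1 is assigned to popA even though it carries one UMI of geneB, because a single count falls below the binarization threshold.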

Step 1 Map

cellcluster_sizes_cumsum_step1 = setNames(
          c(0, cumsum(table(pData(m$expressionSet)$Type_step1))),
          c(levels(pData(m$expressionSet)$Type_step1), ""))

m$readcounts_norm=m$readcounts_raw
m$readcounts_norm[m$readcounts_norm >= 2] <- 1 # threshold used in doCellPartition
pop_def_mask_step1 = markersToMask(cell_partition$step1_markers)
m$dR$genemodules = as.list(rownames(pop_def_mask_step1))

m$plotGeneModules(
                  basename='FIG1_Map_Step1',
                  displayed.gms = c('dR.genemodules'),
                  displayed.geneset=NA,
                  use.dendrogram=NA,
                  display.clusters=NULL,
                  file_settings=list(list(type='pdf', width=10, height=3)),
                  data_status=c('Normalized'),
                  gene_transformations='none',
                  display.legend=TRUE,
                  cell.ordering=order(pData(m$expressionSet)$Type_step1), # works iff Type_step1 are factors
                  extra_colors=cbind(
                    NA,
                    "UMI counts"=colorRampPalette(c("white", "black"))(n = 101)[as.integer(1+100*(colSums(m$readcounts_raw) / max(colSums(m$readcounts_raw))))]
                    ),
                  extra_legend=list("text"=c("", levels(pData(m$expressionSet)$Type_step1)), "colors"=c('white', getClusterColors()[seq(length(unique(pData(m$expressionSet)$Type_step1)))])),
                  genemodules.palette=rep("white", length(m$dR$genemodules)),
                  rect_overlay=apply(which(pop_def_mask_step1==1, arr.ind=T), 1, function(x){
                                       list(
                                          xleft=cellcluster_sizes_cumsum_step1[[x[2]]],
                                          xright=cellcluster_sizes_cumsum_step1[[x[2]+1]],
                                          ytop=length(unlist(m$dR$genemodules)) - x[1] + 0.5,
                                          ybottom=length(unlist(m$dR$genemodules)) - x[1]+1 + 0.5
                                          )
                                  }),
                  pretty.params=list("size_factor"=3, "ngenes_per_lines" = 8, "side.height.fraction"=.3),
                  curr_plot_folder=output_path
                )
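The rectangles drawn over the heatmap rely on the cumulative population sizes computed above: once cells are ordered by type, the k-th population spans the columns between the k-th and (k+1)-th cumulative sums. A small base R sketch with made-up labels:

```r
# Made-up population labels for six cells.
types <- factor(c("popB", "popA", "popA", "popC", "popB", "popA"),
                levels = c("popA", "popB", "popC"))

# Same construction as cellcluster_sizes_cumsum_step1, on the toy labels.
sizes_cumsum <- setNames(c(0, cumsum(table(types))), c(levels(types), ""))

# Left and right column boundaries of each population after ordering cells by type.
k <- seq_along(levels(types))
bounds <- data.frame(
  population = levels(types),
  xleft  = unname(sizes_cumsum[k]),
  xright = unname(sizes_cumsum[k + 1])
)
```

With three popA cells, two popB cells and one popC cell, the populations occupy the column intervals 0 to 3, 3 to 5 and 5 to 6 respectively.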
