aertslab/SCope

Matching clusters to cell-types

Closed this issue · 3 comments

Hello,

Excellent resource! I'm using this data in an inter-species comparison project.
Here are some issues I've encountered along the way.

Row names are missing from the clustering data.frames of some .loom files.

I downloaded the Aerts_Fly_AdultBrain_Filtered_57k.loom file from the SCope website, and tried to extract the clusters, but encountered the following error:

loom_file <- "raw_data/fly/Aerts_Fly_AdultBrain_Filtered_57k.loom"
loom <- SCopeLoomR::open_loom(file.path = loom_file,
                              mode = "r+")
clusters <- SCopeLoomR::get_clusterings_with_name(loom)
----
Error in get_global_meta_data(loom = loom)$clusterings[[i]] : 
  subscript out of bounds

I believe this is because the clusters in this .loom file are missing their row names (cell IDs). get_clusterings() retrieves the clusters data.frame fine, just without row names. This is a problem because the cluster data.frame rows are in a different order than the cell metadata rows extracted with SCopeLoomR::get_cell_annotation(). I was able to confirm this with the Aerts_Fly_AdultBrain_Unfiltered_157k.loom file, which does include row names in its clustering data.frame.

So the two tables have to be merged on the cell IDs rather than simply combined with cbind(), which scrambles all the cell-type annotations. Could you add the cell IDs as row names to allow proper matching?
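To illustrate why row order matters, here is a minimal base-R sketch on hypothetical toy data. The cell IDs and cluster labels are made up, and the IDs are assumed to sit in a `cell_id` column rather than in the row names. cbind() pairs rows by position, whereas merge() pairs them by key:

```r
# Hypothetical toy data: the cluster table lists the same three cells as the
# cell metadata, but in a different row order.
cell_meta <- data.frame(
  cell_id = c("AAAC-1", "AAAG-1", "AAAT-1"),
  age     = c(1, 3, 9),
  stringsAsFactors = FALSE
)
clusters <- data.frame(
  cell_id = c("AAAT-1", "AAAC-1", "AAAG-1"),
  cluster = c("C2", "C0", "C1"),
  stringsAsFactors = FALSE
)

# cbind() pairs rows purely by position, so AAAC-1 ends up with the
# label that actually belongs to AAAT-1.
naive <- cbind(cell_meta, cluster = clusters$cluster)

# merge() joins on the shared key, so every cell keeps its own label
# regardless of row order (the result is sorted by cell_id).
matched <- merge(cell_meta, clusters, by = "cell_id")
```

If the IDs are stored as row names on both tables, `merge(x, y, by = "row.names")` performs the same key-based join.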

Annotating clusters with cell-types

For the aforementioned Aerts_Fly_AdultBrain_Unfiltered_* .loom files (and perhaps others), would it be possible to include the cell types identified in the original publication directly in the metadata? Or at least to provide machine-readable files associated with each .loom file that one could use for this cell-type annotation?

I was able to get some of the cluster cell types from the supplementary materials of Davie et al. (2018), but I'm still unsure whether I'm matching them with the correct combination of version (Unfiltered_157k vs. Filtered_57k), dimensionality reduction method (Seurat t-SNE vs. SCENIC), and clustering resolution.

Many thanks in advance!
Brian Schilder
Imperial College London

Hi @bschilder ,

Thanks for reaching out and showing interest in SCope and the fly brain ageing single-cell dataset.

I'll look into the missing row names. Since this issue is related to SCopeLoomR, I'm moving it to its GitHub repository: aertslab/SCopeLoomR#28.

Regarding your second point about the cell-type information, a cell-based file containing this metadata is available on GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE107451 > GSE107451_DGRP-551_w1118_WholeBrain_57k_Metadata.tsv.gz (see the annotation column).
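A minimal sketch of attaching those labels to cells, assuming the TSV has one row per cell, a cell-ID column, and the annotation column. The helper `annotate_cells()` and the default ID column name `cell_id` are hypothetical, so check the real file's header first. To stay self-contained, the demo writes a small toy gzipped TSV instead of downloading the GEO file:

```r
# Hypothetical helper: return the published label for each requested cell.
# `id_col` must be set to the real name of the cell-ID column in the file.
annotate_cells <- function(meta_path, cell_ids, id_col = "cell_id") {
  # read.delim() opens .gz files transparently via file().
  meta <- read.delim(meta_path, stringsAsFactors = FALSE, check.names = FALSE)
  # match() gives each requested cell's row in the metadata (NA if absent),
  # so the result follows the order of `cell_ids`, not of the file.
  idx <- match(cell_ids, meta[[id_col]])
  meta[["annotation"]][idx]
}

# Toy stand-in for the GEO metadata file (made-up IDs and labels).
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, "w")
writeLines(c("cell_id\tannotation", "AAAC-1\tTypeA", "AAAG-1\tTypeB"), con)
close(con)

labels <- annotate_cells(tmp, c("AAAG-1", "AAAC-1", "TTTT-1"))
```

Looking labels up by ID with match() rather than relying on row order avoids the scrambling problem described earlier in the thread; cells missing from the metadata come back as NA.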

Hi @bschilder,
Version v0.10.2 of SCopeLoomR should fix the issues you were having.

Hi @dweemx, I've just realized that, all this time later, I never replied to this. These annotations worked perfectly. Thank you so much for providing them!