Question regarding Figshare data and AWS data
hojaeklee opened this issue · 8 comments
Hello, I was wondering if someone could explain the differences between the Figshare data (which I downloaded via data_download.sh) and the AWS data.
Looking at annotations_facs.csv and TM_facs_metadata.csv, the total number of cells and the categories in cell_ontology_class seem to differ.
Could someone kindly point me to how these were annotated? Thank you very much. :)
Hi @hojaeklee, the raw AWS data includes ALL cells, including those that were later filtered out:
```console
$ wc -l TM_facs_metadata.csv
53761 TM_facs_metadata.csv
$ wc -l annotations_facs.csv
44950 annotations_facs.csv
```
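Note that `wc -l` also counts each file's header row, so (assuming one header line per file) the two files hold 53,760 and 44,949 cells, a gap of 8,811. A minimal base-R sketch for checking how much of that gap consists of unannotated cells, assuming missing annotations are read in as `NA` in the `cell_ontology_class` column; the helper name `count_annotation_status` is hypothetical:

```r
# Hypothetical helper: tally annotated vs. unannotated cells.
# Assumes missing annotations are NA in cell_ontology_class
# (read.csv turns the literal string "NA" into NA; empty strings stay "").
count_annotation_status <- function(meta) {
  missing_class <- is.na(meta$cell_ontology_class)
  c(total = nrow(meta),
    annotated = sum(!missing_class),
    unannotated = sum(missing_class))
}

# Toy input for illustration only (not real Tabula Muris data).
toy <- data.frame(cell_ontology_class = c("B cell", NA, "T cell", NA),
                  stringsAsFactors = FALSE)
print(count_annotation_status(toy))

# On the real file (path assumed to be the current directory):
# meta <- read.csv("TM_facs_metadata.csv", stringsAsFactors = FALSE)
# count_annotation_status(meta)
```

If the only difference between the files were unannotated cells, the `unannotated` count on the real metadata would be about 8,811.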
So many cells in TM_facs_metadata.csv have NA in the cell_ontology_class column because they did not pass the filters. The cells in annotations_facs.csv are only those with at least 500 genes and 50k reads, filtered per tissue as done here; all tissue annotations were then combined here. Hope that helps!
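The per-tissue filter described above amounts to a threshold on two per-cell QC metrics. This is a minimal sketch, not the project's actual pipeline code; the column names `n_genes` and `n_reads` and the helper `passes_qc` are hypothetical, so check the real metadata for the corresponding columns:

```r
# Hypothetical QC helper mirroring the thresholds quoted above:
# keep cells with at least 500 detected genes AND at least 50,000 reads.
# Column names n_genes and n_reads are assumptions, not the real schema.
passes_qc <- function(meta, min_genes = 500, min_reads = 50000) {
  # Cells with a missing metric are treated as failing the filter.
  !is.na(meta$n_genes) & !is.na(meta$n_reads) &
    meta$n_genes >= min_genes & meta$n_reads >= min_reads
}

# Toy metadata for illustration only (not real Tabula Muris data).
meta <- data.frame(
  cell    = c("A1", "A2", "A3", "A4"),
  n_genes = c(1200, 300, 800, 500),
  n_reads = c(120000, 90000, 20000, 50000),
  stringsAsFactors = FALSE
)
kept <- meta[passes_qc(meta), "cell"]
print(kept)
```

Both thresholds are inclusive ("at least"), so a cell sitting exactly at 500 genes and 50,000 reads is kept.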
Thank you for your patience and sorry for the delay!
Hi @olgabot
We would like to access the plate-seq FASTQs for the cells that did not pass the filters. Where should we look? SRA?
Here are the locations we explored for the 3-month data on AWS:
- FACS metadata folder: taking kidney as an example, there are 865 cells, but no FASTQ locations are given for them.
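```r
library(data.table)
library(dplyr)
# Count 3-month kidney cells in the raw FACS cell metadata
data <- fread("./Downloads/tabula-muris-senis-facs-official-raw-obj__cell-metadata.csv")
data %>% subset(tissue == "Kidney") %>% subset(age == "3m") %>% summarize(n())
#> n()
#> 865
```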
This number also matches the Kidney-counts.csv file in the FACS.zip archive on Figshare.
- Plate-seq folder: the FASTQ annotation file (fastqs_annotated.csv, in the fastq folder) lists only 519 cells, and FASTQs exist for these.
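```r
library(data.table)
library(dplyr)
# Count kidney cells listed in the Plate_seq/3_month FASTQ annotation file
data <- fread("./Downloads/fastqs_annotated.csv")
data %>% subset(tissue == "Kidney") %>% summarize(n())
#> n()
#> 519
```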
- Data objects folder: the tabula-muris-senis-facs-official-raw-obj.h5ad file has 502 cells, which passed a 500-gene filter.
We would like the FASTQ files for the remaining 346 (865 - 519) cells, and likewise for all tissues and time points. Many thanks!
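Finding which cells lack FASTQs is a set difference between the cell IDs in the metadata and those in fastqs_annotated.csv. A minimal sketch, assuming both tables share an ID column (called `cell` here, a hypothetical name) with IDs in the same format; the helper `missing_fastq_cells` is also hypothetical:

```r
# Hypothetical helper: cell IDs present in the metadata but absent from
# the FASTQ annotation. Assumes both tables use the same ID column and format.
missing_fastq_cells <- function(meta, fastq_annot, id_col = "cell") {
  setdiff(meta[[id_col]], fastq_annot[[id_col]])
}

# Toy tables for illustration only (not real Tabula Muris IDs).
meta   <- data.frame(cell = c("A1.P1", "A2.P1", "A3.P1"), stringsAsFactors = FALSE)
fastqs <- data.frame(cell = "A1.P1", stringsAsFactors = FALSE)
no_fastq <- missing_fastq_cells(meta, fastqs)
print(no_fastq)
```

With the kidney numbers above, this would return 346 IDs if the 519 annotated cells are a subset of the 865 in the metadata.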
@ayshwaryas all the FASTQs are available from the Tabula Muris Senis S3 bucket: https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/.
The 3m (Tabula Muris) files are also available from the Tabula Muris S3 bucket: https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris/
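The links above open the S3 web console, which requires an AWS login. Command-line tools instead take `s3://bucket/prefix` URIs; a minimal sketch for converting one form to the other (the helper `console_to_s3_uri` is hypothetical):

```r
# Hypothetical helper: convert an S3 *console* URL (browser view) into an
# s3:// URI that command-line tools such as the AWS CLI accept.
console_to_s3_uri <- function(url) {
  path <- sub("^https://s3\\.console\\.aws\\.amazon\\.com/s3/buckets/", "", url)
  path <- sub("\\?.*$", "", path)  # drop any query string, e.g. ?region=...
  paste0("s3://", path)
}

uri <- console_to_s3_uri(
  "https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/Plate_seq/3_month/?region=us-west-2&tab=overview"
)
print(uri)
```

The resulting URI could then be listed with, for example, `aws s3 ls s3://czb-tabula-muris-senis/Plate_seq/3_month/ --no-sign-request`, assuming the bucket allows anonymous read access.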
Thanks @aopisco
I did look there, and my note is based on the fastqs_annotated.csv file in the Tabula Muris Senis S3 bucket
(https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/Plate_seq/3_month/?region=us-west-2&tab=overview)
Based on the numbers I posted, the annotation file seems to be either out of date or limited to filtered cells. Could you please help disambiguate? Is there another annotation file? Thanks!
Do you provide a metadata file like the one requested by ayshwaryas?
I need a droplet metadata file for this file:
tabula-muris-senis-bbknn-processed-official-annotations.h5ad
that is, metadata for all 356,213 cells. Can someone provide it?
@donshiva88 that object includes the metadata
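Since the per-cell metadata is stored inside the object (the AnnData `obs` table), it can be exported to CSV directly. A minimal sketch, assuming the Bioconductor packages zellkonverter and SummarizedExperiment are installed; the helper `export_h5ad_obs` and the output file name are hypothetical:

```r
# Hypothetical helper: export the per-cell metadata (AnnData obs table)
# of an .h5ad file to CSV. Assumes zellkonverter and SummarizedExperiment
# are installed. readH5AD loads the whole object, so a large file may need
# substantial memory.
export_h5ad_obs <- function(h5ad_path, out_csv) {
  sce <- zellkonverter::readH5AD(h5ad_path)
  obs <- as.data.frame(SummarizedExperiment::colData(sce))
  write.csv(obs, out_csv)  # row names (cell IDs) become the first column
  invisible(obs)
}

# Usage (input file name as given in this thread; output name is hypothetical):
# export_h5ad_obs("tabula-muris-senis-bbknn-processed-official-annotations.h5ad",
#                 "droplet_obs.csv")
```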
@ayshwaryas we only have an annotation file for the good-quality cells.