Question regarding Figshare data and AWS data

Question

hojaeklee opened this issue 6 years ago · 8 comments

Hello, I was wondering if someone could explain the differences between the FigShare data (which I downloaded via data_download.sh) and the AWS data?

From looking at annotation_FACS.csv and TM_facs_metadata.csv, there seems to be a difference in the total number of cells and the categories for cell_ontology_classes.

Perhaps someone could kindly point me to how these were annotated? Thank you very much. :)

Answer 1 · 2019-08-26T05:44:31.000Z

@olgabot can you help?

Answer 2 · 2019-09-17T23:51:01.000Z

Hi @hojaeklee, the raw AWS data includes ALL cells, including those that did not pass the quality filters.

$ wc -l TM_facs_metadata.csv
53761 TM_facs_metadata.csv
$ wc -l annotations_facs.csv
44950 annotations_facs.csv

As a result, many cells in TM_facs_metadata.csv have NA in the cell_ontology_class column because they did not pass the filters. annotations_FACS.csv contains only the cells with at least 500 genes and 50k reads, filtered per tissue as in here, and all tissue annotations were then combined here. Hope that helps!

Thank you for your patience and sorry for the delay!
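
To make the relationship between the two files concrete: the line counts above differ by 53761 - 44950 = 8811, which, assuming each file has exactly one header line, would be the number of cells without an annotation. A minimal sketch of how one might check this directly is below. It uses only the Python standard library; the `cell_ontology_class` column name comes from this thread, while the `cell` column and the sample rows are made up for illustration.

```python
import csv
import io

def count_annotated(csv_file, column="cell_ontology_class"):
    """Return (total_rows, rows_with_annotation) for a metadata CSV."""
    # Values treated as "no annotation"; an assumption about how NA is encoded.
    missing = {"", "NA", "NaN", "nan"}
    total = annotated = 0
    for row in csv.DictReader(csv_file):
        total += 1
        value = (row.get(column) or "").strip()
        if value not in missing:
            annotated += 1
    return total, annotated

# Tiny made-up table standing in for TM_facs_metadata.csv.
sample = io.StringIO(
    "cell,tissue,cell_ontology_class\n"
    "A1,Kidney,kidney cell\n"
    "A2,Kidney,NA\n"
    "A3,Liver,hepatocyte\n"
    "A4,Liver,\n"
)
total, annotated = count_annotated(sample)
print(total, annotated, total - annotated)
```

On the real file, the same function could be called as `count_annotated(open("TM_facs_metadata.csv", newline=""))`; whether the annotated count then matches annotations_facs.csv exactly is not something this sketch guarantees.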

Answer 3 · 2020-02-23T16:35:03.000Z

Hi @olgabot

We would like to access the plate data fastqs for the cells that did not pass filters. Where should we look? SRA?

Here are the locations we explored for the 3-months data on AWS:

FACS metadata folder: Taking kidney as an example, there are 865 cells but there is no location for the fastqs.

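library(data.table)  # provides fread()
library(dplyr)       # provides %>%, summarize() and n()
data <- fread("./Downloads/tabula-muris-senis-facs-official-raw-obj__cell-metadata.csv")
data %>% subset(tissue == "Kidney") %>% subset(age == "3m") %>% summarize(n())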
  n()
865

This number also fits with the Kidney-counts.csv file in the FACS.zip folder on figshare.

Plate-seq folder: In the fastqs annotation file (in the fastq folder), there are only 519 cells. There are fastqs for these.

data=fread("./Downloads/fastqs_annotated.csv")
data%>%subset(tissue=="Kidney")%>%summarize(n())
  n()
519

Data objects folder: the tabula-muris-senis-facs-official-raw-obj.h5ad file has 502 cells. These cells passed a filter of 500 genes.

We would like the fastq files for the remaining 865 - 519 = 346 cells, and likewise for all tissues and time points. Many thanks!
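
For anyone reproducing this comparison, a minimal sketch of how the cells without fastq entries could be listed is below. It is a plain set difference between the two tables. The `tissue` and `age` columns appear in the R snippets above; the `cell` ID column and the sample rows are hypothetical, so the column names would need adjusting to the real files.

```python
import csv
import io

def cell_ids(csv_file, id_column, **filters):
    """Collect the IDs of rows whose columns equal all given filter values."""
    ids = set()
    for row in csv.DictReader(csv_file):
        if all(row.get(col) == want for col, want in filters.items()):
            ids.add(row[id_column])
    return ids

# Made-up stand-ins for the cell-metadata CSV and fastqs_annotated.csv.
metadata = io.StringIO(
    "cell,tissue,age\n"
    "c1,Kidney,3m\n"
    "c2,Kidney,3m\n"
    "c3,Kidney,18m\n"
    "c4,Liver,3m\n"
)
fastqs = io.StringIO(
    "cell,tissue\n"
    "c1,Kidney\n"
)

all_cells = cell_ids(metadata, "cell", tissue="Kidney", age="3m")
with_fastq = cell_ids(fastqs, "cell", tissue="Kidney")

# Cells present in the metadata but absent from the fastq annotation file.
missing = sorted(all_cells - with_fastq)
print(missing)
```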

Answer 4 · 2020-02-24T05:49:56.000Z

@ayshwaryas all the fastqs are available from the Tabula Muris Senis S3 bucket: https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/.

The 3m (Tabula Muris) files are also available from the Tabula Muris S3 bucket: https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris/

Answer 5 · 2020-02-24T17:22:42.000Z

Thanks @aopisco

I did look there and my note is based on the fastqs_annotated.csv in the Tabula Muris Senis S3 bucket
(https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/Plate_seq/3_month/?region=us-west-2&tab=overview)

Based on the numbers I posted, it seems the annotation file is either not up to date or only contains the filtered cells. Could you please help disambiguate? Is there another annotation file? Thanks!

Answer 6 · 2021-05-05T14:04:14.000Z

Do you provide a metadata file like the one requested by ayshwaryas?
I need a droplet metadata file for this file:
tabula-muris-senis-bbknn-processed-official-annotations.h5ad
That is, metadata for all 356,213 cells. Can someone provide it?

Answer 7 · 2021-05-08T01:20:02.000Z

@donshiva88 that object includes the metadata
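
A minimal sketch of how the metadata might be pulled out of the object, assuming the third-party `anndata` package is installed: in an .h5ad file the per-cell metadata table is stored as `.obs`, which can be written to CSV. The helper names `export_obs_metadata` and `default_csv_name` are made up for illustration.

```python
def default_csv_name(h5ad_path):
    """Derive an output CSV name from an .h5ad path (naming is arbitrary)."""
    stem = h5ad_path[:-len(".h5ad")] if h5ad_path.endswith(".h5ad") else h5ad_path
    return stem + "_metadata.csv"

def export_obs_metadata(h5ad_path, csv_path=None):
    """Write the per-cell metadata (AnnData .obs) to CSV; return the cell count."""
    import anndata  # third-party; imported here so the helper above works without it

    # backed="r" avoids loading the full expression matrix into memory.
    adata = anndata.read_h5ad(h5ad_path, backed="r")
    adata.obs.to_csv(csv_path or default_csv_name(h5ad_path))
    return adata.n_obs

print(default_csv_name("tabula-muris-senis-bbknn-processed-official-annotations.h5ad"))
```

Calling `export_obs_metadata("tabula-muris-senis-bbknn-processed-official-annotations.h5ad")` on the downloaded file would then write one row per cell; which columns appear depends on what the object actually contains.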

Answer 8 · 2021-05-08T01:20:47.000Z

@ayshwaryas we only have an annotation file for the good-quality cells