ewels/sra-explorer

Not all samples found

olgabot opened this issue · 4 comments

For PRJNA143627, only 106/149 samples are retrieved. Here is a notebook comparing the SRA explorer vs the NCBI run table output: https://github.com/czbiohub/kh-analysis/blob/olgabot/brawand2011-extract-coding/notebooks/410_compare_sra_explorer_to_ncbi_run_selector.ipynb

Here are the files: (had to add .txt so that GitHub wouldn't complain)
brawand2011_metadata.csv.txt
sample_metadata_handmade.csv.txt

It doesn't seem like only the "First n" entries were found. It seems consistent, and I just reproduced it:

Screen Shot 2019-11-05 at 10 23 13 AM

ewels commented

> For PRJNA143627, only 106/149 samples are retrieved.

Yup, results are limited to ~100 samples. If you tell it to start at record 100, it finds another 42:

image

If you set the max records to 500, it finds all 149:

image

TSV of full metadata: sra_explorer_metadata.tsv.txt
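
For anyone reading along, the paging behaviour described above can be sketched roughly like this. It is a toy model under stated assumptions, not SRA Explorer's actual code: `fetch_page`, `fetch_all`, and the made-up record list are hypothetical names used only for illustration.

```python
def fetch_page(records, start=0, max_records=100):
    """Return one page of results: at most max_records items, beginning at start."""
    return records[start:start + max_records]


def fetch_all(records, page_size=100):
    """Request page after page until an empty page comes back."""
    found = []
    start = 0
    while True:
        page = fetch_page(records, start=start, max_records=page_size)
        if not page:
            break
        found.extend(page)
        start += page_size
    return found


# Toy stand-in for a project's search results (hypothetical IDs, not real accessions).
all_records = [f"record_{i}" for i in range(250)]

first_page = fetch_page(all_records)               # default cap: the first 100 only
second_page = fetch_page(all_records, start=100)   # the next 100
everything = fetch_all(all_records)                # keeps paging until exhausted

print(len(first_page), len(second_page), len(everything))
```

So a search that stops at the default cap is not necessarily complete: either raise the maximum or move the start offset until nothing more comes back.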


> It doesn't seem like only the "First n" entries were found.

You mean it found 107 instead of 100? That will be because one or two entries will have multiple SRR entities. They don't always match up 1:1.
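
A small sketch of why the number of rows can exceed the record limit, assuming the cap is applied to search records and each record can carry more than one run. The data and the `summarize` helper are made up for illustration and are not taken from the tool itself.

```python
def summarize(records, max_records=100):
    """Apply the cap to records, then count the runs attached to the kept records."""
    page = records[:max_records]
    runs = [run for record in page for run in record["runs"]]
    return len(page), len(runs)


# Toy data (hypothetical names): the second record has two runs.
toy_records = [
    {"id": "exp_1", "runs": ["run_a"]},
    {"id": "exp_2", "runs": ["run_b", "run_c"]},
    {"id": "exp_3", "runs": ["run_d"]},
    {"id": "exp_4", "runs": ["run_e"]},
]

n_records, n_runs = summarize(toy_records, max_records=3)
print(n_records, n_runs)  # 3 records kept, 4 runs listed
```

With a cap of 3 records the listing shows 4 runs, which is the same effect as seeing slightly more than 100 rows with a cap of 100.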

Does this solve your problem, or am I missing something here?

Ohh okay. I was confused because 107 > 100, so it looked to me like it had found more than the limit of results, and I thought it was done! Maybe something to clear up in the documentation.

ewels commented

PRs welcome! I'm not totally clear on what was confusing, so I would really appreciate any suggestions for what could be improved.