Possible to query datasets by genome notes?
Closed this issue · 2 comments
michaelbarton commented
Is your feature request related to a problem? Please describe.
Unable to query NCBI Datasets by the genome notes field. E.g. I would like a way to find all the genomes with the genome note derived from single cell data.
Describe the solution you'd like
A way to query by this field, which appears to be an enum?
ericcox1 commented
Hi @michaelbarton,
Thanks for opening this issue.
I would like a way to find all the genomes with the genome note derived from single cell data.
Here's what I would recommend:
- Generate a table that includes two columns, assembly accession and genome notes
- Use grep to find rows containing "derived from single cell"
For example:
datasets summary genome taxon cyanobacteriota --as-json-lines | dataformat tsv genome --fields accession,assminfo-notes | grep -m5 "derived from single cell"
GCA_002017915.1 derived from single cell
GCA_002017955.1 derived from single cell
GCA_002018045.1 derived from single cell
GCA_003030805.1 contaminated,derived from single cell,fragmented assembly,genus undefined
GCA_028658425.1 derived from single cell
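If an exact match on a single note value is preferred over a substring match, the grep step could be swapped for a small awk filter. This is only a sketch: filter_by_note is a hypothetical helper name (not part of the datasets or dataformat tools), and it assumes the table is tab-separated with the notes in the second column as a comma-separated list, as the output above suggests.

```shell
# Hypothetical helper (not part of NCBI Datasets): reads a two-column TSV on
# stdin and prints rows whose second column lists the given note exactly.
# Assumes notes are comma-separated with no padding around the commas.
filter_by_note() {
  awk -F'\t' -v note="$1" '
    {
      n = split($2, notes, ",")          # break the notes column into values
      for (i = 1; i <= n; i++) {
        if (notes[i] == note) { print; break }
      }
    }'
}

# Two rows copied from the output above, plus one made-up non-matching row.
printf '%s\t%s\n' \
  "GCA_002017915.1" "derived from single cell" \
  "GCA_003030805.1" "contaminated,derived from single cell,fragmented assembly,genus undefined" \
  "GCA_000000000.0" "fragmented assembly" |
  filter_by_note "derived from single cell"
```

With the real data, the same function would take the place of grep at the end of the pipeline: datasets summary genome taxon cyanobacteriota --as-json-lines | dataformat tsv genome --fields accession,assminfo-notes | filter_by_note "derived from single cell"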
Please let me know if you have any questions.
Best,
Eric
Eric Cox, PhD [Contractor] (he/him/his)
NCBI Datasets
NIH/NLM/NCBI
eric.cox@nih.gov
michaelbarton commented
Thanks, awesome. Thanks for being so responsive Eric. I will give that a try.