ncbi/datasets

Possible to query datasets by genome notes?

Closed this issue · 2 comments

Is your feature request related to a problem? Please describe.

I am unable to query NCBI Datasets by the genome notes field. For example, I would like a way to find all the genomes with the genome note derived from single cell data.

Describe the solution you'd like

A way to query by this field, which appears to be an enum?

Hi @michaelbarton,

Thanks for opening this issue.

> I would like a way to find all the genomes with the genome note derived from single cell data.

Here's what I would recommend:

  1. Generate a table with two columns: assembly accession and genome notes.
  2. Use grep to find rows containing "derived from single cell".

For example:

datasets summary genome taxon cyanobacteriota --as-json-lines | dataformat tsv genome --fields accession,assminfo-notes | grep -m5 "derived from single cell"
GCA_002017915.1	derived from single cell
GCA_002017955.1	derived from single cell
GCA_002018045.1	derived from single cell
GCA_003030805.1	contaminated,derived from single cell,fragmented assembly,genus undefined
GCA_028658425.1	derived from single cell
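
Note that grep matches the phrase anywhere on the line. If an exact match on individual notes is preferable, an awk filter along these lines might work. This is only a sketch: filter_by_note is a hypothetical helper name (not part of the datasets CLI), and it assumes the two-column TSV layout shown above, with comma-separated notes in the second column.

```shell
# Sketch: keep only rows whose notes column contains an exact note.
# filter_by_note is a hypothetical helper, not a datasets/dataformat command.
# Input (stdin): TSV with the accession in column 1 and
# comma-separated genome notes in column 2.
filter_by_note() {
  awk -F'\t' -v note="$1" '
    {
      # Split the notes column into individual notes.
      n = split($2, notes, ",")
      for (i = 1; i <= n; i++) {
        # Trim surrounding whitespace before comparing.
        gsub(/^[ \t]+|[ \t]+$/, "", notes[i])
        if (notes[i] == note) { print; break }
      }
    }'
}

# Usage, assuming datasets and dataformat are installed:
# datasets summary genome taxon cyanobacteriota --as-json-lines \
#   | dataformat tsv genome --fields accession,assminfo-notes \
#   | filter_by_note "derived from single cell"
```

Because the comparison is exact, the header row is dropped, and a row such as GCA_003030805.1 is still kept since one of its notes matches.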

Please let me know if you have any questions.

Best,
Eric

Eric Cox, PhD [Contractor] (he/him/his)
NCBI Datasets
NIH/NLM/NCBI
eric.cox@nih.gov

Thanks, awesome. Thanks for being so responsive, Eric. I will give that a try.