Curate remaining entries to the Bioregistry
cthoyt opened this issue · 7 comments
After lots of careful curation, there are only four resources listed in this repository that I can't quite figure out:
datasource_name | system_code | website_url | linkout_pattern | example_identifier | entity_identified | single_species | identifier_type | uri | regex | official_name | wikidata_property | bioregistry |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Gramene Arabidopsis | EnAt | http://www.gramene.org/ | http://www.gramene.org/Arabidopsis_thaliana/Gene/Summary?g=$id | ATMG01360-TAIR-G | gene | Arabidopsis thaliana | 1 | EnAt | AT[\dCM]G\d{5}-TAIR-G | Gramene Arabidopsis | nan | nan |
Gramene Maize | EnZm | http://www.ensembl.org | http://www.maizesequence.org/Zea_mays/Gene/Summary?g=$id | GRMZM2G174107 | gene | nan | 1 | EnZm | nan | Gramene Maize | nan | nan |
Gramene Rice | EnOj | http://www.gramene.org/ | http://www.gramene.org/Oryza_sativa/Gene/Summary?db=core;g=$id | osa-MIR171a | gene | nan | 1 | EnOj | nan | Gramene Rice | nan | nan |
Rice Ensembl Gene | Os | http://www.gramene.org/Oryza_sativa | http://www.gramene.org/Oryza_sativa/geneview?gene=$id | LOC_Os04g54800 | gene | Oryza sativa | 1 | Os | nan | Rice Ensembl Gene | nan | nan |
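
Only the Gramene Arabidopsis row has a regex. As a quick sanity check, here is a minimal sketch that tests that pattern against the example identifier from the same row and against the shorter form without the `-TAIR-G` suffix. The helper name `is_valid` is made up for illustration, and full-string matching is an assumption about how the pattern is meant to be applied.

```python
import re

# Pattern copied from the "Gramene Arabidopsis" (EnAt) row of the table.
ENAT_PATTERN = re.compile(r"AT[\dCM]G\d{5}-TAIR-G")


def is_valid(identifier: str) -> bool:
    """Return True if the whole identifier matches the EnAt pattern.

    Hypothetical helper; full-string matching is an assumption.
    """
    return ENAT_PATTERN.fullmatch(identifier) is not None


if __name__ == "__main__":
    # Example identifier from the table, with the -TAIR-G suffix.
    print(is_valid("ATMG01360-TAIR-G"))
    # The same locus without the suffix does not satisfy the current pattern.
    print(is_valid("ATMG01360"))
```

If the identifier were shortened to the bare locus, the regex in the table would need to change along with it.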
Example URLs:
- http://www.gramene.org/Arabidopsis_thaliana/Gene/Summary?g=ATMG01360-TAIR-G (works, but should just be `ATMG01360`)
- http://www.maizesequence.org/Zea_mays/Gene/Summary?g=GRMZM2G174107 (redirects to https://ensembl.gramene.org/Zea_mays/Gene/Summary?g=GRMZM2G174107)
- http://www.gramene.org/Oryza_sativa/Gene/Summary?db=core;g=osa-MIR171a (dead)
- http://www.gramene.org/Oryza_sativa/geneview?gene=LOC_Os04g54800 (dead)
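
For reference, the example URLs come from substituting each `example_identifier` for the `$id` placeholder in the matching `linkout_pattern`. A minimal sketch of that substitution follows, with the patterns and identifiers copied from the table. The names `RESOURCES`, `expand_linkout`, and `example_urls` are made up for illustration, and nothing here checks whether a URL still resolves.

```python
# system_code -> (linkout_pattern, example_identifier), copied from the table.
RESOURCES = {
    "EnAt": (
        "http://www.gramene.org/Arabidopsis_thaliana/Gene/Summary?g=$id",
        "ATMG01360-TAIR-G",
    ),
    "EnZm": (
        "http://www.maizesequence.org/Zea_mays/Gene/Summary?g=$id",
        "GRMZM2G174107",
    ),
    "EnOj": (
        "http://www.gramene.org/Oryza_sativa/Gene/Summary?db=core;g=$id",
        "osa-MIR171a",
    ),
    "Os": (
        "http://www.gramene.org/Oryza_sativa/geneview?gene=$id",
        "LOC_Os04g54800",
    ),
}


def expand_linkout(pattern: str, identifier: str) -> str:
    """Replace the $id placeholder with the identifier (hypothetical helper)."""
    return pattern.replace("$id", identifier)


def example_urls() -> dict:
    """Build one example URL per system code from RESOURCES."""
    return {
        code: expand_linkout(pattern, identifier)
        for code, (pattern, identifier) in RESOURCES.items()
    }


if __name__ == "__main__":
    for code, url in example_urls().items():
        print(code, url)
```

Swapping in a different example identifier for the two dead rice links only requires changing the second element of the corresponding tuple.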
So the question for the first two is: what should we call these in the Bioregistry? Should they really get their own prefixes, or is there a more general Gramene resolver for all of these IDs?
For the last two, can these be fixed? They may just need a new example identifier that follows the same pattern.
And @tabbassidaloii : could you check if the databases above are part of our new BridgeDb mapping files?
We have mapping files for Arabidopsis thaliana (At), Zea mays (Zm), Oryza sativa japonica (Oj), and Oryza sativa indica (Oi).
@tabbassidaloii, but do we have mappings on those to Gramene?
FWIW I think these pathways were created by the Gramene team at the time.
@egonw No, we don't. Not sure if BioMart provides it.