Bio-NER for Dug on HEAL
For discussion:
Most HEAL data is not extensively curated. Many datasets resemble those in NIDA-Share, which provide narrative prose documents like the ones in the "Study Documents" box here.
Can we index these documents lexically (regular Elastic) and semantically (biomedical NER)?
We would like to do this in order to then align identifiers discovered via NER with HEAL CDE linkages.
The protocol PDF at the link above, for example, is brimming with biomedical terms we could use for indexing.
While there are plugins for indexing PDFs in Elastic, we need access to the extracted text to map ontology terms, so I'm guessing the plugins won't do what we need. Is that correct, or do they provide hooks for us to use the text in arbitrary ways?
(We're also not ready to do this with a lot of text until RENCI has an in-house bio-NER.)
If the plugins don't provide hooks, what are our best options for a design to index documents like this both lexically and semantically (i.e., via NER)?
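To make the question concrete, here is a rough sketch of one possible design, not a settled one: extract the text ourselves, run it through an annotator, and index the raw text and the discovered identifiers side by side in a single document. Everything named here is an assumption for illustration: `toy_annotate` is a dictionary lookup standing in for a real bio-NER call, the `EXAMPLE:` identifiers are placeholders rather than real ontology terms, and the field names (`text`, `entity_ids`, `entities`) are hypothetical. PDF text extraction is assumed to have happened upstream.

```python
import json
import re
from typing import Callable, Dict, List

# Hypothetical stand-in lexicon. A real bio-NER service would return ontology
# identifiers; the "EXAMPLE:" CURIEs below are placeholders, not real terms.
TOY_LEXICON: Dict[str, str] = {
    "opioid use disorder": "EXAMPLE:0001",
    "buprenorphine": "EXAMPLE:0002",
}

# Any function that maps text to a list of {"term": ..., "id": ...} hits.
Annotator = Callable[[str], List[Dict[str, str]]]


def toy_annotate(text: str) -> List[Dict[str, str]]:
    """Case-insensitive dictionary lookup standing in for a bio-NER call."""
    hits = []
    for term, curie in TOY_LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE):
            hits.append({"term": term, "id": curie})
    return hits


def build_index_doc(doc_id: str, text: str, annotate: Annotator) -> Dict[str, object]:
    """Combine raw text (lexical search) with NER identifiers (semantic search)."""
    entities = annotate(text)
    return {
        "doc_id": doc_id,
        "text": text,  # intended for a full-text field
        "entity_ids": sorted({e["id"] for e in entities}),  # intended for exact-match lookup
        "entities": entities,  # keeps the matched surface term next to each identifier
    }


if __name__ == "__main__":
    sample = "Participants with opioid use disorder received buprenorphine."
    # The resulting dict is what would be sent to the index as the document body.
    print(json.dumps(build_index_doc("protocol-1", sample, toy_annotate), indent=2))
```

Keeping the annotator as a plain function argument means the placeholder can be swapped for the in-house bio-NER once it exists without touching the indexing side. The `entity_ids` list is also the natural place to join against HEAL CDE linkages.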
View historical comments on this issue https://github.com/helxplatform/development/issues/804