Documentation: Dictionary.xml and DictionaryDescription.md of: eoActivityAgents
EmanuelFaria opened this issue · 0 comments
I've assembled a list of 8850 "activity agents" (I don't know what else to call them) that will need to be normalized against either Wikidata or perhaps ChEBI.
I created this list by doing a GREP query on the almost 250,000 articles I pulled down with getpapers last year. The query captured up to four words before the term "agent" or "agents". Then, with a LOT of cleaning, I trimmed the leading words and got this list down from about 50,000 to its present state (there were a lot of duplicates).
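For anyone who wants to try something similar, here is a rough sketch of that kind of extraction. It is written in Python rather than grep, and the pattern is my assumption about what "up to four words before agent/agents" could look like, not the exact query used for this list. The names `AGENT_PATTERN` and `extract_agent_phrases` are made up for illustration.

```python
import re

# Assumed pattern: up to four whitespace-separated tokens (letters, digits,
# underscores, hyphens), followed by "agent" or "agents" as a whole word.
AGENT_PATTERN = re.compile(r"\b((?:[\w-]+\s+){0,4})(agents?)\b", re.IGNORECASE)


def extract_agent_phrases(text):
    """Return the set of lowercased '<leading words> agent(s)' phrases in text."""
    phrases = set()
    for match in AGENT_PATTERN.finditer(text):
        leading_words = match.group(1).split()
        if not leading_words:
            # A bare "agent" with nothing in front of it is not useful here.
            continue
        phrase = " ".join(leading_words + [match.group(2)])
        phrases.add(phrase.lower())
    return phrases


sample = (
    "It was used as a potent anti-oxidative agent in mice. "
    "Anti-oxidizing agents were tested."
)
print(sorted(extract_agent_phrases(sample)))
```

Collecting into a set takes care of exact duplicates; the leading filler words ("as a potent ...") still have to be trimmed in a separate cleaning pass.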
All the articles I pulled had to do with terms describing two main themes: Plant Extracts (or essential oils, etc.) AND Activities (medicinal, pharmacological, phyto-medicinal, etc.) NOT (petrol, shale, "oil", ... nothing "animal feed-related"), etc.
I ran the cleanest getpapers queries I could. Overall, there are very few terms that are out of the ballpark. Some of them have to do with what I consider "formulation" terms (excipients, adhesives, abrasives, etc.), but for the most part, these would be useful for any biomedical project, including Covid.
I did most of the work months ago, as a way to see what the literature had in it and to flex my growing GREP skills. But I pulled it out a couple of days ago and decided to do a bunch of find-and-replace passes to strip out junk/stop words, and it came out really nice. I wish I could have kept the discarded words separately (turns out scientists use a lot of puffery in their descriptions, just like marketers do!), but I couldn't think of a way of doing that that would have been practical or efficient.
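One possible way to keep the discarded words, sketched in Python rather than as find-and-replace passes: split each phrase into words, route every junk word into a counter, and rejoin the rest. The `JUNK_WORDS` set below is a hypothetical stand-in, not the list of words actually removed, and the function names are invented for this example.

```python
from collections import Counter

# Hypothetical junk/stop words; the real list used for cleaning is not shown here.
JUNK_WORDS = {"a", "an", "the", "as", "novel", "potent", "promising", "effective"}


def strip_junk(phrase, junk=JUNK_WORDS):
    """Split a phrase into (cleaned phrase, list of discarded words)."""
    kept, discarded = [], []
    for word in phrase.split():
        if word.lower() in junk:
            discarded.append(word.lower())
        else:
            kept.append(word)
    return " ".join(kept), discarded


def clean_terms(phrases, junk=JUNK_WORDS):
    """Clean every phrase, deduplicate the results, and tally what was removed."""
    cleaned = set()
    discarded_counts = Counter()
    for phrase in phrases:
        term, discarded = strip_junk(phrase, junk)
        if term:
            cleaned.add(term)
        discarded_counts.update(discarded)
    return cleaned, discarded_counts


terms, puffery = clean_terms([
    "as a potent anti-oxidative agent",
    "a novel anti-oxidative agent",
])
print(sorted(terms))
print(puffery.most_common())
```

The counter doubles as a frequency table of the puffery, which could be written out to its own file next to the dictionary.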
Anyhow, it's still useful even without further disambiguation or descriptions, but adding those would definitely make it more useful, especially if we could split them up into different dictionaries, for example, having to do with different pathways. But that's a different kettle of fish.
EDIT: Also, I ran some random tests by pulling out multi-word terms I'd never heard of and putting them, in quotes, in EUPMC searches, and all of them had a decent number of hits.
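If someone wants to repeat that spot check in bulk, a minimal sketch follows. It assumes the Europe PMC REST search endpoint and its `hitCount` response field. The checks described above were done by hand in the search box, so this is only an approximation of that process, and the helper names are invented.

```python
import json
import urllib.parse
import urllib.request

EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(term):
    """Build a search URL for an exact-phrase (quoted) query."""
    params = {"query": f'"{term}"', "format": "json", "pageSize": 1}
    return EUROPE_PMC_SEARCH + "?" + urllib.parse.urlencode(params)


def hit_count(term):
    """Fetch the number of hits for a quoted term (needs network access)."""
    with urllib.request.urlopen(build_search_url(term), timeout=30) as response:
        return json.load(response)["hitCount"]


print(build_search_url("anti-oxygenic agent"))
```

Calling `hit_count("anti-oxygenic agent")` would then return the number of matching records; terms that come back with zero hits would be candidates for removal.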
EDIT 2: Plus, I never would otherwise have found so many different ways (synonym terms) to find things I'm actually interested in. For example:
- anti-oxidant agent
- anti-oxidants agent
- anti-oxidation agent
- anti-oxidative agent
- anti-oxidative protecting agent
- anti-oxidative stress agent
- anti-oxidizing agent
- anti-oxygenic agent
Who knew?