gandersen101/spaczz

Fuzzy Match of Term Combinations

ronyarmon opened this issue · 3 comments

My task is to query medical texts for institute names using a rule like the one below:
[{'ENT_TYPE': 'institute_name'}, {'TEXT': 'Hospital'}]
The rule extracts the hospital name only when it is followed by the word 'Hospital', so it matches "Mount Sinai Hospital" but not "Mount Sinai" on its own.
spaczz works great for single terms or phrases, but I did not see an option to build multi-word rules like the one above.
Can I use spaczz to catch typos in this entity, for example "Mount Sinai Mospital"?
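
To make the desired behavior concrete, here is a minimal sketch that uses only Python's standard-library `difflib`. It is not spaczz's or spaCy's API. The function name `find_fuzzy_institutes`, the whitespace tokenization, the list of known names, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_fuzzy_institutes(text, names, anchor="Hospital", min_ratio=0.8):
    """Return spans where a known name is followed by a fuzzy match of `anchor`.

    Hypothetical helper: `names` stands in for whatever produces the
    'institute_name' entities, and `min_ratio` is an arbitrary threshold.
    """
    # Naive tokenization: split on whitespace and trim common punctuation.
    tokens = [t.strip(".,;:!?") for t in text.split()]
    tokens = [t for t in tokens if t]
    matches = []
    for name in names:
        n = len(name.split())
        # Stop early enough that one token always follows the name window.
        for i in range(len(tokens) - n):
            window = " ".join(tokens[i:i + n])
            following = tokens[i + n]
            if (similarity(window, name) >= min_ratio
                    and similarity(following, anchor) >= min_ratio):
                matches.append(f"{window} {following}")
    return matches


print(find_fuzzy_institutes(
    "He was admitted to Mount Sinai Mospital yesterday.", ["Mount Sinai"]
))
# → ['Mount Sinai Mospital']
```

The name alone ("Mount Sinai" with no following 'Hospital'-like token) returns no match, which mirrors the rule above.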

Hi @ronyarmon, thank you for your interest in spaczz! I believe you are asking for matching control in spaczz similar to spaCy's Matcher (token-level patterns that support multiple tokens and token attributes). If so, I am actively working on this feature, as requested in issue #24.

Since I have already started documenting my progress on that issue, I am going to mark this one as a duplicate. If you feel that this request is significantly different, or that I am misunderstanding you, please let me know. If you agree with my assessment, I will close this issue in a few days and use issue #24 to track progress on this feature.

Hope that helps!

Thanks for your reply. My question is indeed a duplicate. I'll follow issue #24, but I will also try to explore the Cython source.

Closing as duplicate. Please see issue #24 for updates/developments.