MatSci domain word shapes
eddotman opened this issue · 1 comments
eddotman commented
So looking at the word shape implementation here: https://github.com/iesl/dilated-cnn-ner/blob/master/src/preprocess.py#L64-L90
I think it makes sense to incorporate numbers: https://github.com/olivettigroup/synthesis-database/blob/master/synthesisdatabase/classifiers/token_classifier.py#L144-L159
I haven't rigorously compared these (and related) methods, but intuitively, encoding digits in the word shape should help a lot with chemical formulas, since a token like "TiO2" would otherwise lose the information that it contains a number.
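To make the idea concrete, here is a minimal sketch of a digit-aware word shape. It is not the code from either linked file; the function name `word_shape`, the `collapse` flag, and the choice of `A`/`a`/`0` as shape symbols are all assumptions made for illustration.

```python
def word_shape(token, collapse=True):
    """Map a token to a coarse shape string (hypothetical helper).

    Digits become '0', uppercase letters 'A', other letters 'a';
    any other character (punctuation, symbols) is kept as-is.
    With collapse=True, runs of the same shape symbol are merged.
    """
    shape = []
    for ch in token:
        if ch.isdigit():
            mapped = "0"
        elif ch.isalpha():
            mapped = "A" if ch.isupper() else "a"
        else:
            mapped = ch
        # Skip the symbol if it repeats the previous one and we are collapsing.
        if collapse and shape and shape[-1] == mapped:
            continue
        shape.append(mapped)
    return "".join(shape)


print(word_shape("TiO2"))                     # AaA0
print(word_shape("Fe2O3"))                    # Aa0A0
print(word_shape("LiFePO4", collapse=False))  # AaAaAA0
print(word_shape("heated"))                   # a
```

With digits mapped to their own symbol, formulas like "Fe2O3" get a shape that is clearly distinct from ordinary capitalized words, which is the effect I'd expect to help the tagger.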
strubell commented
Totally agree. Will do.