iesl/dilated-cnn-ner

MatSci domain word shapes

eddotman opened this issue · 1 comment

So looking at the word shape implementation here: https://github.com/iesl/dilated-cnn-ner/blob/master/src/preprocess.py#L64-L90

I think it makes sense to incorporate numbers: https://github.com/olivettigroup/synthesis-database/blob/master/synthesisdatabase/classifiers/token_classifier.py#L144-L159

I haven't rigorously compared these (and related) methods, but intuitively, encoding digits in the word shape should help a lot with chemical formulas, where tokens like TiO2 and LiFePO4 are distinguished from ordinary words mainly by their digits.
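To make the idea concrete, here is a minimal sketch of a digit-aware word shape. It is not the code from either linked file: the function name `word_shape`, the `use_digits` flag, and the shape symbols `A`, `a`, `d`, and `x` are all assumptions chosen for illustration.

```python
def word_shape(token: str, use_digits: bool = True) -> str:
    """Map each character of a token to a coarse shape symbol.

    Hypothetical scheme (not taken from either repository):
      uppercase letter -> "A"
      lowercase letter -> "a"
      digit            -> "d" if use_digits, else the generic symbol "x"
      anything else    -> kept as-is (punctuation, brackets, etc.)
    """
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("A")
        elif ch.islower():
            shape.append("a")
        elif ch.isdigit():
            # With use_digits=False, digits carry no dedicated symbol.
            shape.append("d" if use_digits else "x")
        else:
            shape.append(ch)
    return "".join(shape)


if __name__ == "__main__":
    for tok in ["TiO2", "LiFePO4", "500", "annealed"]:
        print(tok, word_shape(tok), word_shape(tok, use_digits=False))
```

With digits enabled, `TiO2` becomes `AaAd` and `LiFePO4` becomes `AaAaAAd`, so the trailing stoichiometric digit shows up as its own feature instead of being folded into a generic symbol.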

Totally agree. Will do.