Franck-Dernoncourt/NeuroNER

Validating I2B2 2014 performance

karthikmurugadoss opened this issue · 0 comments

Thank you for the amazing tool!

We are trying to validate the performance of NeuroNER on the 2014 I2B2 test dataset. However, we are unable to reproduce the 97.7 F1-score reported in Table 1 of the paper (https://www.aclweb.org/anthology/D17-2017.pdf). We are currently getting an F1-score close to 94% (precision around 97%, recall around 90%).

This is after following the instructions in the GitHub repository exactly (using the pre-trained model) and computing precision, recall, and F1-score on a binary PHI vs. non-PHI basis, excluding non-HIPAA identifiers (we had to remove the DATE identifiers manually).
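To make the evaluation concrete, here is a minimal sketch of the kind of binary PHI vs. non-PHI scoring described above. It is not NeuroNER's evaluation code: the helper names (`is_phi`, `binary_phi_scores`), the BIO label format, token-level matching, and treating excluded categories as non-PHI on both the gold and predicted side are all assumptions made for illustration.

```python
from typing import FrozenSet, Sequence, Tuple


def is_phi(label: str, excluded: FrozenSet[str]) -> bool:
    """Return True if a BIO label counts as PHI in the binary scheme."""
    if label == "O":
        return False
    category = label.split("-", 1)[-1]  # "B-DATE" -> "DATE"
    return category not in excluded


def binary_phi_scores(
    gold: Sequence[str],
    pred: Sequence[str],
    excluded: FrozenSet[str] = frozenset(),
) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for PHI vs. non-PHI."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_phi, p_phi = is_phi(g, excluded), is_phi(p, excluded)
        if g_phi and p_phi:
            tp += 1
        elif p_phi:
            fp += 1
        elif g_phi:
            fn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example with made-up labels; DATE is excluded as in our setup.
gold = ["B-PATIENT", "I-PATIENT", "O", "B-DATE", "O", "B-HOSPITAL"]
pred = ["B-PATIENT", "O", "O", "B-DATE", "B-DOCTOR", "B-HOSPITAL"]
precision, recall, f1 = binary_phi_scores(gold, pred, frozenset({"DATE"}))
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

If the reported number was computed differently (for example at the entity level, or with a different set of excluded categories), that alone could account for part of the gap.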

Are we missing something?