Adverse effect of processors v9 on eidos
I'm not sure where (or if) these were recorded before. I'll try to get to the bottom of them here.
[info] *** 226 TESTS FAILED ***
[error] Failed tests:
[error] org.clulab.wm.eidos.text.english.raps.TestRaps
[error] org.clulab.wm.eidos.text.english.eval6.TestDoc5
[error] org.clulab.wm.eidos.serialization.jsonld.TestJLDSerializer
[error] org.clulab.wm.eidos.text.english.raps.TestRaps1
[error] org.clulab.wm.eidos.text.english.eval6.TestDoc8
[error] org.clulab.wm.eidos.text.englishGrounding.TestGrounding
[error] org.clulab.wm.eidos.text.english.cag.TestCagP1
[error] org.clulab.wm.eidos.text.english.cag.TestExtraText
[error] org.clulab.wm.eidos.serialization.TestDocSerialization
[error] org.clulab.wm.eidos.text.english.cag.TestCagP0
[error] org.clulab.wm.eidos.text.englishGrounding.TestSpecificGroundings
[error] org.clulab.wm.eidos.utils.TestLauncher
[error] org.clulab.wm.eidos.text.english.eval6.TestDoc2
[error] org.clulab.wm.eidos.text.english.cag.TestCagP4
[error] org.clulab.wm.eidos.system.TestCrLf
[error] org.clulab.wm.eidos.serialization.jsonld.TestJLDDeserializer
[error] org.clulab.wm.eidos.text.english.eval6.TestDoc3
[error] org.clulab.wm.eidos.text.english.eval6.TestDoc6
[error] org.clulab.wm.eidos.text.english.cag.TestCagP3
[error] org.clulab.wm.eidos.text.english.eval6.TestDoc1
[error] org.clulab.wm.eidos.text.english.eval6.TestDoc4
[error] org.clulab.wm.eidos.text.englishGrounding.TestGrounderStability
[error] org.clulab.wm.eidos.system.TestEidosMention
[error] org.clulab.wm.eidos.text.english.cag.TestCagP2
[error] org.clulab.wm.eidos.document.TestSentenceClassifier
[error] org.clulab.wm.eidos.text.english.eval6.TestDoc7
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 2235 s (37:15), completed Feb 13, 2024 10:09:21 AM
sbt:eidos>
Thank you @kwalcock !!
@MihaiSurdeanu, TestJLDSerializer is failing because one date does not get turned into an attachment. This seems to be because eidos expects an entity in the sentence to be labeled DATE, as it was with the old version of processors, but the new version labels it B-DATE. Does this ring any bells?
Ah, I see. This happens because we use the BIO notation for named and numeric entities, whereas CoreNLP does not.
This is a small change that does not matter, so I think we should adjust the unit tests!
It doesn't matter much, but the particular tests would be difficult to change. Instead, for now I've converted B-DATE and I-DATE to DATE, and the errors in two unit tests went away.
The next problem is that eidos is seeing empty strings for norms where earlier it had seen O. I'm patching that up as well. Is it an expected change?
No, that's another instance of me forgetting what I did before :)
I'm now thinking perhaps it's simpler to patch things up directly in processors, to:
- Remove B- and I- from labels;
- Add "O" for empty norms.
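
For reference, the two adjustments listed above could look roughly like the following. This is only a minimal sketch, not the actual processors code: the object `LabelPatch` and its methods are hypothetical names, and it assumes entity labels and norms are available as plain arrays of strings, one per token.

```scala
// Hypothetical helpers, not part of processors or eidos.
object LabelPatch {
  // Strip a leading BIO prefix so that "B-DATE" and "I-DATE" both become "DATE".
  // Labels without such a prefix (e.g. "O" or "DATE") are returned unchanged.
  def stripBio(label: String): String =
    if (label.startsWith("B-") || label.startsWith("I-")) label.substring(2)
    else label

  // Replace an empty norm with "O"; any non-empty norm is kept as it is.
  def fillNorm(norm: String): String =
    if (norm.isEmpty) "O" else norm

  // Apply the patches to per-token arrays (assumed representation).
  def patchEntities(entities: Array[String]): Array[String] = entities.map(stripBio)
  def patchNorms(norms: Array[String]): Array[String] = norms.map(fillNorm)
}
```

Doing this in one place in processors would mean eidos no longer needs its own conversion of B-DATE and I-DATE, though any downstream code that relies on the BIO prefixes to find entity boundaries would lose that information.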