stanfordnlp/CoreNLP

Money values are not identified correctly

AgoloRandaMoustafa opened this issue · 1 comment

s = s.replaceAll("[A-Za-z]", "");

Sentence: I have 4 cents or 4$ now.
When normalizing the money values in the sentence above, the entity normalizer produces 44, which is incorrect: the two amounts (4 cents and 4$) appear to be merged into a single number.
The line s = s.replaceAll("[A-Za-z]", ""); seems to be causing this issue.
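To illustrate how a letter-stripping step like the one quoted above could yield 44, here is a minimal standalone sketch. It is not the CoreNLP normalizer itself: the class name `StripLettersDemo`, the helper `keepDigits`, and the input span "4 cents or 4$" are assumptions made for illustration. Only the `replaceAll("[A-Za-z]", "")` call comes from this issue.

```java
public class StripLettersDemo {
    // The call quoted in this issue: removes every ASCII letter from the string.
    static String stripLetters(String s) {
        return s.replaceAll("[A-Za-z]", "");
    }

    // Hypothetical follow-up step (an assumption, not taken from CoreNLP):
    // keep only digits and the decimal point.
    static String keepDigits(String s) {
        return s.replaceAll("[^0-9.]", "");
    }

    public static void main(String[] args) {
        // Assumed input: both amounts end up in one span handed to the normalizer.
        String span = "4 cents or 4$";

        String noLetters = stripLetters(span);
        System.out.println("[" + noLetters + "]");   // prints "[4   4$]"

        // With the words gone, nothing separates the two amounts any more.
        System.out.println(keepDigits(noLetters));   // prints "44"
    }
}
```

If this is what happens, the words "cents" and "or" are the only things keeping the two numbers apart, so removing letters before parsing would merge them into one value.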

java edu.stanford.nlp.pipeline.StanfordCoreNLP
NLP> I have 4 cents or 4$ now.

Sentence #1 (9 tokens):
I have 4 cents or 4$ now.

Tokens:
[Text=I CharacterOffsetBegin=0 CharacterOffsetEnd=1 PartOfSpeech=PRP Lemma=I NamedEntityTag=O]
[Text=have CharacterOffsetBegin=2 CharacterOffsetEnd=6 PartOfSpeech=VBP Lemma=have NamedEntityTag=O]
[Text=4 CharacterOffsetBegin=7 CharacterOffsetEnd=8 PartOfSpeech=CD Lemma=4 NamedEntityTag=MONEY NormalizedNamedEntityTag=$0.04]
[Text=cents CharacterOffsetBegin=9 CharacterOffsetEnd=14 PartOfSpeech=NNS Lemma=cent NamedEntityTag=MONEY NormalizedNamedEntityTag=$0.04]
[Text=or CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=CC Lemma=or NamedEntityTag=O]
[Text=4 CharacterOffsetBegin=18 CharacterOffsetEnd=19 PartOfSpeech=CD Lemma=4 NamedEntityTag=MONEY NormalizedNamedEntityTag=$4.0]
[Text=$ CharacterOffsetBegin=19 CharacterOffsetEnd=20 PartOfSpeech=$ Lemma=$ NamedEntityTag=MONEY NormalizedNamedEntityTag=$4.0]
[Text=now CharacterOffsetBegin=21 CharacterOffsetEnd=24 PartOfSpeech=RB Lemma=now NamedEntityTag=MONEY NormalizedNamedEntityTag=$4.0]
[Text=. CharacterOffsetBegin=24 CharacterOffsetEnd=25 PartOfSpeech=. Lemma=. NamedEntityTag=O]

Dependency Parse (enhanced plus plus dependencies):
root(ROOT-0, have-2)
nsubj(have-2, I-1)
nummod(cents-4, 4-3)
obj(have-2, cents-4)
cc($-7, or-5)
nummod($-7, 4-6)
obj(have-2, $-7)
conj:or(cents-4, $-7)
advmod(have-2, now-8)
punct(have-2, .-9)

Extracted the following NER entity mentions:
4 cents MONEY   MONEY:0.9931253379658741
4$ now  MONEY   MONEY:-1.0

What error are you seeing?