Money values are not identified correctly
AgoloRandaMoustafa opened this issue
AgoloRandaMoustafa commented
Sentence: I have 4 cents or 4$ now.
When normalizing the money values in the sentence above, the entity normalizer produces 44, which is incorrect.
The line s = s.replaceAll("[A-Za-z]", ""); appears to be causing this issue.
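To illustrate the suspected mechanism, here is a minimal sketch in plain Java. Only the replaceAll call is taken from the line quoted above. The class name LetterStripDemo, the helper keepDigits, and the idea that the whole span "4 cents or 4$" reaches the normalizer as one string are assumptions made for illustration, not the actual CoreNLP code path.

```java
class LetterStripDemo {
    // The line quoted in this report: delete every ASCII letter.
    static String stripLetters(String s) {
        return s.replaceAll("[A-Za-z]", "");
    }

    // Hypothetical follow-up step (an assumption, not CoreNLP code):
    // drop everything that is not a digit before parsing a number.
    static String keepDigits(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        // Assumed input: both amounts handed over as a single span.
        String span = "4 cents or 4$";
        String noLetters = stripLetters(span);
        // Brackets make the leftover whitespace visible.
        System.out.println("[" + noLetters + "]");
        // The two separate 4s are now adjacent digits.
        System.out.println(keepDigits(noLetters));
    }
}
```

Under those assumptions, removing the letters leaves only the two digits, whitespace, and the dollar sign, so any later step that discards non-digits would merge them into the single value 44.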
AngledLuffa commented
java edu.stanford.nlp.pipeline.StanfordCoreNLP
NLP> I have 4 cents or 4$ now.
Sentence #1 (9 tokens):
I have 4 cents or 4$ now.
Tokens:
[Text=I CharacterOffsetBegin=0 CharacterOffsetEnd=1 PartOfSpeech=PRP Lemma=I NamedEntityTag=O]
[Text=have CharacterOffsetBegin=2 CharacterOffsetEnd=6 PartOfSpeech=VBP Lemma=have NamedEntityTag=O]
[Text=4 CharacterOffsetBegin=7 CharacterOffsetEnd=8 PartOfSpeech=CD Lemma=4 NamedEntityTag=MONEY NormalizedNamedEntityTag=$0.04]
[Text=cents CharacterOffsetBegin=9 CharacterOffsetEnd=14 PartOfSpeech=NNS Lemma=cent NamedEntityTag=MONEY NormalizedNamedEntityTag=$0.04]
[Text=or CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=CC Lemma=or NamedEntityTag=O]
[Text=4 CharacterOffsetBegin=18 CharacterOffsetEnd=19 PartOfSpeech=CD Lemma=4 NamedEntityTag=MONEY NormalizedNamedEntityTag=$4.0]
[Text=$ CharacterOffsetBegin=19 CharacterOffsetEnd=20 PartOfSpeech=$ Lemma=$ NamedEntityTag=MONEY NormalizedNamedEntityTag=$4.0]
[Text=now CharacterOffsetBegin=21 CharacterOffsetEnd=24 PartOfSpeech=RB Lemma=now NamedEntityTag=MONEY NormalizedNamedEntityTag=$4.0]
[Text=. CharacterOffsetBegin=24 CharacterOffsetEnd=25 PartOfSpeech=. Lemma=. NamedEntityTag=O]
Dependency Parse (enhanced plus plus dependencies):
root(ROOT-0, have-2)
nsubj(have-2, I-1)
nummod(cents-4, 4-3)
obj(have-2, cents-4)
cc($-7, or-5)
nummod($-7, 4-6)
obj(have-2, $-7)
conj:or(cents-4, $-7)
advmod(have-2, now-8)
punct(have-2, .-9)
Extracted the following NER entity mentions:
4 cents MONEY MONEY:0.9931253379658741
4$ now MONEY MONEY:-1.0
What error are you seeing?