allo-media/text2num

[PT-BR] Some numbers are not being recognized

Closed this issue · 2 comments

Hi there! First of all, great work!!! This lib helps a lot <3

As happens with Spanish, in "pt" there are some numerals that the alpha2digit function does not recognize. Here is an example:

from text_to_num import alpha2digit

_text = "dezenove"
alpha2digit(_text, "pt")
# expected: "19"
# returned: "dezenove"

To reproduce, just create a fresh environment and install text2num==2.4.0.

So far I have found these numbers:

  • "dezenove" (19)
  • "dezessete" (17)
  • "dezesseis" (16)
  • "um" (1) (but in this case the number needs to be inside a phrase, like "eu tenho um bis" -> "I have one bis").

In the case of "um", I saw the existing issue about the "ones" problem, but I don't think it happens in Portuguese...
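The cases listed above can be collected into a small check, sketched below. It assumes the package is imported as text_to_num and that alpha2digit(text, "pt") is the call under test, as in the example earlier in this issue. REPORTED_CASES and find_failures are hypothetical names made up for this sketch, and the expected output for the "um" phrase is an assumption about what the reporter wanted.

```python
from typing import Callable, Dict, List, Tuple

# Inputs reported in this issue, mapped to the output the reporter expected.
# The expected string for the "um" phrase is an assumption, not a documented result.
REPORTED_CASES: Dict[str, str] = {
    "dezenove": "19",
    "dezessete": "17",
    "dezesseis": "16",
    "eu tenho um bis": "eu tenho 1 bis",
}


def find_failures(
    convert: Callable[[str], str], cases: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every case where convert() disagrees."""
    failures = []
    for text, expected in cases.items():
        actual = convert(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


if __name__ == "__main__":
    try:
        from text_to_num import alpha2digit
    except ImportError:
        print("text2num is not installed in this environment")
    else:
        # Wrap alpha2digit so that it always runs with the Portuguese parser.
        failures = find_failures(lambda t: alpha2digit(t, "pt"), REPORTED_CASES)
        for text, expected, actual in failures:
            print(f"{text!r}: expected {expected!r}, got {actual!r}")
```

Passing the converter in as a function keeps the check independent of the installed version, so the same script can be run before and after an upgrade.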

Some screenshots to illustrate: [image] [image]

@RafaelMRazeira, support for 19, 17 and 16 was added in #73. These changes have not been released on PyPI yet.

Until then, you can install from upstream:

$ pip install -U --force-reinstall git+https://github.com/allo-media/text2num.git

@rtxm I would also love to have the newer improvements from upstream in a release.

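Regarding parsing "um" (1), I'd argue that Portuguese suffers from the same ambiguity as English/French (#42). Take this sentence as an example: "tome como um exemplo essa sentença" ("take this sentence as an example"), where "um" is the indefinite article rather than the number 1.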
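To make the ambiguity concrete, here is a deliberately naive sketch that replaces every standalone "um" with "1". It is not how text2num works; naive_um_to_digit is a hypothetical helper that only shows why a blanket rule is risky.

```python
import re


def naive_um_to_digit(sentence: str) -> str:
    """Replace every standalone 'um' with '1', with no grammatical analysis."""
    return re.sub(r"\bum\b", "1", sentence)


examples = [
    "eu tenho um bis",                     # the reporter's phrase, where a digit was wanted
    "tome como um exemplo essa sentença",  # here "um" is an indefinite article
]

for sentence in examples:
    print(f"{sentence!r} -> {naive_um_to_digit(sentence)!r}")
```

The second sentence comes out as "tome como 1 exemplo essa sentença", which is probably not what a reader wants. The same rule that fixes the first phrase damages the second.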

rtxm commented

2.5.0 Released!
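To check whether an environment already has this release, a sketch along these lines may help. It assumes the distribution name on PyPI is text2num, as in the install command at the top of this issue. parse_release and is_at_least are hypothetical helpers written for this sketch. That 2.5.0 contains the changes from #73 is implied by this thread rather than stated.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_release(v: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '2.5.0' into (2, 5, 0), keeping only the leading digits of each part.

    Pre-release suffixes are ignored, so '2.5.0rc1' is treated like '2.5.0'.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad or truncate so that '2.5' and '2.5.0' compare as equal.
    parts = (parts + [0] * width)[:width]
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Return True if the installed release is the minimum release or newer."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        installed = version("text2num")
    except PackageNotFoundError:
        print("text2num is not installed")
    else:
        status = "ok" if is_at_least(installed, "2.5.0") else "older than 2.5.0"
        print(f"text2num {installed}: {status}")
```

For anything beyond simple x.y.z versions, the packaging library's version parser is the more robust choice. This sketch sticks to the standard library.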