German Punkt: more time units in `##number## word` pattern
adbar commented
Hi, in cases like "3. Minute", Punkt wrongly treats the period in "3." as a sentence boundary.
I see that `packages/tokenizers/punkt_tab/german/collocations.tab` already lists such `##number## word` pairs for some words (notably months), but the list is incomplete. It would be useful to add the following time expressions:
##number## sekunde
##number## minute
##number## stunde
##number## tag
##number## woche
##number## monat
##number## jahr
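To illustrate why these entries would help, here is a simplified Python model of Punkt's collocation check. It is not NLTK's code. It assumes that `collocations.tab` stores one tab-separated pair per line and that numeric tokens are normalized to `##number##`, approximating what NLTK's `punkt.py` does. The helper names `token_type`, `parse_collocations` and `breaks_after` are hypothetical. The real tokenizer also applies abbreviation and orthographic heuristics that this sketch ignores.

```python
from __future__ import annotations

import re

# Approximation (assumption) of Punkt's token normalization: numeric tokens
# collapse to "##number##", everything else is lowercased.
NUMERIC = re.compile(r"^-?[\.,]?\d[\d,\.-]*\.?$")

# The entries proposed in this issue, in an assumed "left<TAB>right" format.
PROPOSED = "\n".join(
    f"##number##\t{word}"
    for word in ["sekunde", "minute", "stunde", "tag", "woche", "monat", "jahr"]
)


def token_type(token: str) -> str:
    """Return a simplified Punkt-style type for a token, minus a final period."""
    typ = NUMERIC.sub("##number##", token.lower())
    if len(typ) > 1 and typ.endswith("."):
        typ = typ[:-1]
    return typ


def parse_collocations(tab_text: str) -> set[tuple[str, str]]:
    """Parse collocations.tab-style text (assumed tab-separated pairs)."""
    pairs = set()
    for line in tab_text.splitlines():
        line = line.strip()
        if not line:
            continue
        left, right = line.split("\t")
        pairs.add((left, right))
    return pairs


def breaks_after(token: str, next_token: str, collocations: set[tuple[str, str]]) -> bool:
    """Toy rule: a period-final token ends the sentence unless
    (its type, next token's type) is a known collocation."""
    if not token.endswith("."):
        return False
    return (token_type(token), token_type(next_token)) not in collocations


if __name__ == "__main__":
    # Hypothetical one-line stand-in for the existing German file.
    existing = parse_collocations("##number##\tjanuar")
    print(breaks_after("3.", "Minute", existing))  # True: splits after "3."
    extended = existing | parse_collocations(PROPOSED)
    print(breaks_after("3.", "Minute", extended))  # False: "3. Minute" stays together
```

In this model, "3. Januar" already stays together because the month pair exists, while "3. Minute" only stays together once `("##number##", "minute")` is added.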
I'm not sure how to proceed: can I open a PR that edits the file directly, or are other steps involved?