nltk/nltk_data

German Punkt: add more time units to the `##number## word` collocations


Hi, in cases like "3. Minute" (an ordinal number followed by a noun, i.e. "third minute"), Punkt wrongly ends the sentence after "3.".

I see that `packages/tokenizers/punkt_tab/german/collocations.tab` already contains a useful list of such pairs (notably with month names), but it is incomplete. It would help to add the following time expressions:

##number## sekunde
##number## minute
##number## stunde
##number## tag
##number## woche
##number## monat
##number## jahr
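To check the effect of these entries without editing the installed data, one option is to build a `PunktParameters` object by hand. Punkt keeps collocations as a set of `(type, next_type)` pairs, where types are lowercased and number tokens such as "3." become `##number##`. This is a minimal sketch, assuming a recent NLTK: it uses an otherwise empty model (no abbreviations, sentence starters or orthographic context), so it isolates the collocation heuristic rather than reproducing the full German model. The sentence and the names `TIME_UNITS`, `PROPOSED` and `split` are made up for illustration.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Made-up test sentence: "3. Minute" means "third minute".
TEXT = "Das Tor fiel in der 3. Minute. Danach war Pause."

# Proposed additions from this issue, as (type, next_type) pairs.
# Punkt lowercases types and maps numeric tokens like "3." to "##number##".
TIME_UNITS = ["sekunde", "minute", "stunde", "tag", "woche", "monat", "jahr"]
PROPOSED = {("##number##", unit) for unit in TIME_UNITS}


def split(text, collocations):
    """Hypothetical helper: split text with an empty Punkt model plus collocations."""
    params = PunktParameters()
    params.collocations = set(collocations)
    # Passing a PunktParameters object instead of training text makes the
    # tokenizer use these parameters as-is, without any training step.
    tokenizer = PunktSentenceTokenizer(params)
    return tokenizer.tokenize(text)


without = split(TEXT, set())
with_units = split(TEXT, PROPOSED)

print(without)     # ['Das Tor fiel in der 3.', 'Minute.', 'Danach war Pause.']
print(with_units)  # ['Das Tor fiel in der 3. Minute.', 'Danach war Pause.']
```

With the empty model, "3." is treated as a sentence end because the orthographic heuristic cannot tell whether "Minute" starts a new sentence; a matching collocation is checked before that heuristic and marks "3." as a non-boundary.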

I'm not sure how to proceed: can I open a PR that edits the file directly, or are other steps involved?