Checking newlines in regex expressions
evolutionoftheuniverse opened this issue · 2 comments
evolutionoftheuniverse commented
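While working on #13 I noticed that in extractors.py some of the newlines added inside regex patterns for code readability are problematic: inside a raw triple-quoted string, the line break becomes part of the pattern. Better to check and test those.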
evolutionoftheuniverse commented
The lines I mean are:
AMERICAN_ENGLISH = re.compile(r'''(January|February|March|April|May|June|July|
August|September|October|November|December|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|
Nov|Dec|Januar|Jänner|Februar|Feber|März|April|Mai|Juni|Juli|August|September|
Oktober|November|Dezember|Ocak|Şubat|Mart|Nisan|Mayıs|Haziran|Temmuz|Ağustos|
Eylül|Ekim|Kasım|Aralık|Oca|Şub|Mar|Nis|May|Haz|Tem|Ağu|Eyl|
Eki|Kas|Ara) ([0-9]{1,2})(st|nd|rd|th)?,? ([0-9]{4})''')
BRITISH_ENGLISH = re.compile(r'''([0-9]{1,2})(st|nd|rd|th)? (of )?(January|
February|March|April|May|June|July|August|September|October|November|December|
Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec|Januar|Jänner|Februar|Feber|
März|April|Mai|Juni|Juli|August|September|Oktober|November|
Dezember|Ocak|Şubat|Mart|Nisan|Mayıs|Haziran|Temmuz|Ağustos|
Eylül|Ekim|Kasım|Aralık|Oca|Şub|Mar|Nis|May|Haz|Tem|Ağu|Eyl|
Eki|Kas|Ara),? ([0-9]{4})''')
GENERAL_TEXTSEARCH = re.compile(r'''January|February|March|April|May|June|July|
August|September|October|November|December|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|
Nov|Dec|Januar|Jänner|Februar|Feber|März|April|Mai|Juni|Juli|August|September|
Oktober|November|Dezember|Ocak|Şubat|Mart|Nisan|Mayıs|Haziran|Temmuz|Ağustos|
Eylül|Ekim|Kasım|Aralık|Oca|Şub|Mar|Nis|May|Haz|Tem|Ağu|Eyl|Eki|Kas|Ara''')
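To make the problem concrete, here is a minimal sketch using a shortened stand-in for the patterns above (the name `BROKEN` and the reduced month list are only for illustration). Since the patterns are compiled without `re.VERBOSE`, the line break after `September|` belongs to the next alternative, which then only matches `Oktober` when a newline comes right before it:

```python
import re

# Shortened stand-in for the patterns quoted above (not the full month list).
# The line break after "September|" is kept literally in the pattern.
BROKEN = re.compile(r'''(Juli|August|September|
Oktober|November|Dezember) ([0-9]{1,2})(st|nd|rd|th)?,? ([0-9]{4})''')

# The alternative that starts a new source line is "\nOktober", not "Oktober".
no_match = BROKEN.search('Oktober 3, 2019')
with_newline = BROKEN.search('\nOktober 3, 2019')
unaffected = BROKEN.search('November 3, 2019')

print(no_match)               # None: plain "Oktober" is not an alternative
print(with_newline.group(1))  # the captured month includes the newline
print(unaffected.group(1))    # alternatives in the middle of a line still work
```

In the full patterns, only the alternatives that begin a continuation line should be affected, and only if the same name does not appear again elsewhere in the list.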
adbar commented
Code refactored and checked; everything is OK.
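For readers who run into the same problem, one possible way to avoid it is sketched below. This is an assumption about how such a refactoring could look, not necessarily what was committed: `MONTH_NAMES`, `MONTHS` and `has_stray_newline` are hypothetical names, and the month list is shortened. The alternation is built with `'|'.join(...)`, so line breaks in the source never end up in the pattern, and a small helper checks for leftover newlines.

```python
import re

# Hypothetical, shortened month list; the real patterns cover many more names.
MONTH_NAMES = (
    'September', 'October', 'November', 'December',
    'Sep', 'Oct', 'Nov', 'Dec',
)
# Joining the names keeps source line breaks out of the pattern.
MONTHS = '|'.join(MONTH_NAMES)

AMERICAN_ENGLISH = re.compile(
    r'(' + MONTHS + r') ([0-9]{1,2})(st|nd|rd|th)?,? ([0-9]{4})'
)
BRITISH_ENGLISH = re.compile(
    r'([0-9]{1,2})(st|nd|rd|th)? (of )?(' + MONTHS + r'),? ([0-9]{4})'
)


def has_stray_newline(pattern):
    """Return True if a compiled pattern contains a literal newline."""
    return '\n' in pattern.pattern


for compiled in (AMERICAN_ENGLISH, BRITISH_ENGLISH):
    assert not has_stray_newline(compiled)

print(AMERICAN_ENGLISH.search('Nov 3rd, 2019').groups())
print(BRITISH_ENGLISH.search('3rd of November 2019').groups())
```

A check like `has_stray_newline` could also be placed in the test suite so that a newline reintroduced for readability is caught early.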