attardi/wikiextractor

ptwiki-latest error

iwmo opened this issue · 2 comments

iwmo commented

While trying to extract ptwiki-latest-pages-articles.xml.bz2, I'm getting the following error:
python -m wikiextractor.WikiExtractor ptwiki-latest-pages-articles.xml.bz2
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/oriebirj/Desktop/lufz/wikiextractor/WikiExtractor.py", line 66, in <module>
from .extract import Extractor, ignoreTag, define_template, acceptedNamespaces
File "/home/oriebirj/Desktop/lufz/wikiextractor/extract.py", line 382, in <module>
ExtLinkBracketedRegex = re.compile(
^^^^^^^^^^^
File "/usr/lib/python3.11/re/__init__.py", line 227, in compile
return _compile(pattern, flags)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/__init__.py", line 294, in _compile
p = _compiler.compile(pattern, flags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/_compiler.py", line 743, in compile
p = _parser.parse(p, flags)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/_parser.py", line 980, in parse
p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/_parser.py", line 455, in _parse_sub
itemsappend(_parse(source, state, verbose, nested + 1,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/_parser.py", line 863, in _parse
p = _parse_sub(source, state, sub_verbose, nested + 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/_parser.py", line 455, in _parse_sub
itemsappend(_parse(source, state, verbose, nested + 1,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/_parser.py", line 863, in _parse
p = _parse_sub(source, state, sub_verbose, nested + 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/_parser.py", line 455, in _parse_sub
itemsappend(_parse(source, state, verbose, nested + 1,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/re/_parser.py", line 841, in _parse
raise source.error('global flags not at the start '
re.error: global flags not at the start of the expression at position 4

I'm not sure why this happens.
Any idea what causes it?

Thanks

I ran into this today on Python 3.11.2, and applying #182 locally seems to fix it.
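
For anyone wondering why it happens: as far as I can tell, since Python 3.11 the `re` module rejects an inline global flag such as `(?i)` unless it is the very first thing in the pattern. Older versions only emitted a DeprecationWarning. The traceback points at the `ExtLinkBracketedRegex` pattern, with the flag at position 4. Below is a minimal sketch that reproduces the error and shows one possible workaround. The protocol list and the simplified character classes are my own stand-ins, not the extractor's real pattern, and I have not checked whether #182 fixes it the same way.

```python
import re

# Hypothetical stand-in for the extractor's protocol list (assumption).
protocols = ["http://", "https://", "ftp://"]
alternation = "|".join(re.escape(p) for p in protocols)

# Inline global flag "(?i)" sits at position 4, after "\[((".
# Python 3.11+ raises re.error for this; older versions only warn.
bad_pattern = r"\[(((?i)" + alternation + r")[^\]\s]+)\s*([^\]]*?)\]"


def compiles(pattern, flags=0):
    """Return True if re.compile accepts the pattern, False on re.error."""
    try:
        re.compile(pattern, flags)
        return True
    except re.error:
        return False


# Workaround sketch: drop the inline flag and pass re.IGNORECASE instead.
# A global inline flag applied to the whole pattern anyway, so the
# matching behaviour should stay the same.
good_pattern = r"\[((" + alternation + r")[^\]\s]+)\s*([^\]]*?)\]"
ext_link = re.compile(good_pattern, re.IGNORECASE | re.S | re.U)

m = ext_link.search("see [HTTP://example.org Example site] for details")
print(m.group(1), "|", m.group(3))
```

On 3.11 and later, `compiles(bad_pattern)` returns False while the rewritten pattern compiles and still matches upper-case protocols. Moving `(?i)` to the very start of the pattern string should work as well, if you would rather keep the flag inline.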