mediawiki-utilities/python-mwchatter

Parse incoming text into wikicode only once

yuvipanda opened this issue · 4 comments

It looks like section.py parses the entire wikicode twice: once to get the list of sections (in def _generate_flat_list_of_sections(text):) and then again in _load_fields.
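A rough sketch of the "parse once, pass the result down" shape this is asking for. Everything here is hypothetical: parse is a cheap stand-in for the expensive parser call (it is not mwparserfromhell), and split_sections, load_fields and build_page are illustrative names, not the functions in section.py.

```python
from typing import Dict, List

parse_calls = 0  # counts how often the expensive step runs


def parse(text: str) -> List[str]:
    """Stand-in for an expensive parser call (hypothetical, not mwparserfromhell)."""
    global parse_calls
    parse_calls += 1
    return text.splitlines()


def split_sections(parsed: List[str]) -> List[List[str]]:
    """Group already-parsed lines into sections; never re-parses."""
    sections: List[List[str]] = [[]]
    for line in parsed:
        # Naive heading test, good enough for this illustration only.
        if line.startswith("=") and line.rstrip().endswith("="):
            sections.append([])
        sections[-1].append(line)
    return [s for s in sections if s]


def load_fields(section: List[str]) -> Dict[str, object]:
    """Read fields from a section that is already parsed."""
    has_heading = section[0].startswith("=")
    heading = section[0].strip("= ").strip() if has_heading else None
    body = section[1:] if has_heading else section
    return {"heading": heading, "comments": [l for l in body if l.strip()]}


def build_page(text: str) -> List[Dict[str, object]]:
    parsed = parse(text)  # the only parse call for the whole page
    return [load_fields(s) for s in split_sections(parsed)]


page = build_page("intro\n== Topic ==\nFirst comment\n: Reply\n")
```

The point is only that the helpers take the parsed object rather than the raw text, so the counter stays at one no matter how many sections the page has.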

It looks like #18 addresses everything raised here

The function to check indent levels also does a re-parse, which makes everything a lot slower. Can you re-open this so I can see whether I can fix that too?

Sure, feel free to try.

I am starting to wonder if it would be better to remove the dependency on mwparserfromhell. We are only using mwparserfromhell for sections and checking for outdent. Sections can probably be determined using regexes to find headers. So far as I can tell outdents don't come in that many shades either and could probably be picked up pretty well by a regex also.
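For illustration, a minimal sketch of what regex-based header detection might look like, using only the standard library. HEADING_RE and find_headings are hypothetical names, not part of mwchatter, and the pattern is an assumption about heading syntax rather than a full description of it.

```python
import re

# Matches lines such as "== Title ==": 1 to 6 equals signs, the same number on
# both sides. This is a simplification of real wikitext heading rules.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def find_headings(text):
    """Return (level, title) pairs for every line that looks like a heading."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(text)]


sample = (
    "== Topic ==\n"
    "A comment\n"
    "=== Subtopic ===\n"
    "<nowiki>\n"
    "== Not a heading ==\n"
    "</nowiki>\n"
)
headings = find_headings(sample)
```

On this sample the sketch also reports the line inside the nowiki block, which MediaWiki would not treat as a heading. Comments and templates can trip it up in similar ways, so a regex approach would need extra handling for each such case.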

Using regexes on mw wikitext parsing is the start of a long road to madness, so I would highly recommend not doing it :) Many have done this before and regretted their decision later...
On Jan 13, 2016 7:08 AM, "Kevin Schiroo" notifications@github.com wrote:

> Sure, feel free to try.
>
> I am starting to wonder if it would be better to remove the dependency on
> mwparserfromhell. We are only using mwparserfromhell for sections and
> checking for outdent. Sections can probably be determined using regexes to
> find headers. So far as I can tell outdents don't come in that many shades
> either and could probably be picked up pretty well by a regex also.

