TooManyHeadings error on otherwise legit-looking page
yuvipanda opened this issue · 12 comments
On https://en.wikipedia.org/wiki/Wikipedia:Teahouse/Questions/Archive_235#How_to_link_any_file_like_video_or_picture.2C_on_wikipedia.27s_article.3F I get a TooManyHeadings error.
Removing the section titled == How to link any file like video or picture, on wikipedia's article? ==
or either of the two sections that follow it fixes the issue.
It looks like the wikicode section being received is not properly split?
A simple mwparserfromhell script seems to work fine:
from mwparserfromhell import parse
doc = parse(open('text.wm'), skip_style_tags=True)
for sec in doc.get_sections(include_lead=True, flat=True):
    print(sec.filter_headings())
This is really strange! This script works for me:
from mwparserfromhell import parse
doc = parse(open('/path/to/file'), skip_style_tags=True)
for sec in doc.get_sections(include_lead=True, flat=True):
    if len(sec.filter_headings()) > 1:
        print("Bad!")
# never prints
However, this does not work:
from mwparserfromhell import parse
with open('/path/to/file') as f:
    text = f.read()
doc = parse(text, skip_style_tags=True)
for sec in doc.get_sections(include_lead=True, flat=True):
    if len(sec.filter_headings()) > 1:
        print("Bad!")
# prints once
So passing the file object rather than the text makes a difference?
Yeah, I can repro that. When the input is passed as a string, one of the sections has multiple headings ["== qHow to link any file like video or picture, on wikipedia's article? ==", '==going "live"=='],
and when it is passed as a file object it doesn't.
I think this is an issue to raise with mwparserfromhell; I would guess this is not their intended behavior.
It looks like the issue in mwparserfromhell that is causing this is projected to be fixed in 0.5. In the meantime, it will be possible to just cut sections on all headings, which should take care of the issue.
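For illustration, cutting on every heading could look roughly like the sketch below. This is not the project's actual fix and it does not use mwparserfromhell: `split_on_all_headings` and `HEADING_RE` are hypothetical names, and the regular expression is a simplified stand-in for MediaWiki's heading rules (it ignores cases such as headings inside comments or `<nowiki>` blocks).

```python
import re

# Assumption: a heading is a whole line of N '=' signs, a title, and the same
# N '=' signs. Real MediaWiki heading parsing has more edge cases than this.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_on_all_headings(wikitext):
    """Split wikitext at every heading line, regardless of heading level.

    Returns a list of (heading_line, body) tuples. The first tuple is the
    lead section, whose heading_line is None.
    """
    sections = []
    current_heading = None  # None marks the lead section
    start = 0
    for match in HEADING_RE.finditer(wikitext):
        # Close the section that ends where this heading begins.
        sections.append((current_heading, wikitext[start:match.start()]))
        current_heading = match.group(0)
        start = match.end()
    # Whatever follows the last heading belongs to the final section.
    sections.append((current_heading, wikitext[start:]))
    return sections


if __name__ == "__main__":
    sample = 'intro\n== A ==\nbody a\n==going "live"==\nbody b\n'
    for heading, body in split_on_all_headings(sample):
        print(repr(heading), repr(body))
```

Because every heading starts a new section, no section produced this way can carry more than one heading, which is the condition that TooManyHeadings complains about.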
I've hand-fixed that archive page for now.
Python ignoramus here; is this also the reason I'm getting a TooManyHeadings problem on https://en.wikipedia.org/wiki/Talk:Rhaetian_Railway ?
Yeah, there is a pretty good chance that this is the issue. I will look more into this specific case tomorrow. It is just the current revision that causes the issue, correct?
I mean, I haven't tried past revisions, so...
@Ironholds, I am not able to reproduce your issue with the current version of that talk page. Could you post a file containing the input that throws the error?
Okay, found an example; https://www.dropbox.com/s/kq3sjxzyu7j3h4i/article_talk_15585692.wikitext?dl=0