spencermountain/wtf_wikipedia

smarter rendering of html Infoboxes

j-rausch opened this issue · 3 comments

Running the latest wtf_wikipedia with the --html option generates HTML files that are missing the infoboxes.
For reference, see https://runkit.com/spencermountain/5b912901c133fe0012ebfc8f

hey @j-rausch thanks, i've added this back in 5.3.0,
it is pretty weird though, and I'd like to get your input on it.
wikipedia does A LOT of post-hoc rendering-stuff for their infoboxes, and I think you'll see that there's some work left to do.
i'll leave this open

Thanks! Yeah, wow, it does seem pretty convoluted indeed..
Looking at https://en.wikipedia.org/wiki/Abraham_Lincoln, and how the infobox headers are generated from the wikitext source (https://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&action=edit)

The header Abraham Lincoln is defined nowhere in the infobox markup:

{{Infobox officeholder
| image = Abraham Lincoln O-77 matte collodion print.jpg
| alt = An iconic photograph of a bearded Abraham Lincoln showing his head and shoulders.
...

I suppose it comes from the default parameter for name in https://en.wikipedia.org/wiki/Template:Infobox_officeholder:

Default
    The pagename

The same goes for the header The Lincoln Cabinet of the second infobox:

{{Infobox U.S. Cabinet
| Name = Lincoln
| President = Abraham Lincoln
...

where https://en.wikipedia.org/wiki/Template:Infobox_U.S._Cabinet defines the header as The {{{Name}}} Cabinet.
Is there some resource available to easily resolve all these templates, or would this need to be done more or less manually, @spencermountain?
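
To make the "manual" route concrete, here is a rough sketch of a per-template lookup for the two headers above. Everything in it is hypothetical: headerRules and infoboxHeader are names made up for illustration, not part of the wtf_wikipedia API, and the two rules are hand-copied from the template pages linked above rather than resolved automatically.

```javascript
// Hypothetical header rules, hand-copied from the two templates discussed above.
// None of this is wtf_wikipedia API; it only illustrates the manual approach.
const headerRules = {
  // Template:Infobox officeholder: `name` defaults to the page name
  'infobox officeholder': (params, pageTitle) => params.name || pageTitle,
  // Template:Infobox U.S. Cabinet: header is "The {{{Name}}} Cabinet"
  'infobox u.s. cabinet': (params, pageTitle) =>
    params.Name ? `The ${params.Name} Cabinet` : pageTitle,
};

function infoboxHeader(templateName, params, pageTitle) {
  const rule = headerRules[templateName.trim().toLowerCase()];
  // Assumption: templates without a rule fall back to a name parameter, then the page title
  if (!rule) {
    return params.name || params.Name || pageTitle;
  }
  return rule(params, pageTitle);
}

// Usage with the two infoboxes from the Abraham Lincoln page
console.log(infoboxHeader('Infobox officeholder', { image: 'Abraham Lincoln O-77 matte collodion print.jpg' }, 'Abraham Lincoln'));
console.log(infoboxHeader('Infobox U.S. Cabinet', { Name: 'Lincoln', President: 'Abraham Lincoln' }, 'Abraham Lincoln'));
```

The obvious downside is that every infobox template would need its own hand-written rule, which is why an existing resource for resolving them would be preferable.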

yeah, i'm gonna take a slap at this today, but I could use some help.

wow, I didn't know it pulled information from the page. We are getting the page name now pretty-reliably from either the api, the dump-xml, or the first-sentence-bolding. I think we can incorporate it into the infobox reliably now.
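
Roughly, the fallback across those three sources could look like the sketch below. The function names, the input shape, and the order of preference are all assumptions made for illustration, not the library's actual internals, and the bold-text regex is deliberately naive.

```javascript
// Hypothetical sketch: pick a page title from whichever source is available.
// Names and ordering are assumptions, not wtf_wikipedia internals.
function firstBoldText(wikitext) {
  // Naive match for the first '''bolded''' span; ignores nested or bold-italic markup
  const match = /'''(.+?)'''/.exec(wikitext || '');
  return match ? match[1].trim() : null;
}

function resolvePageTitle({ apiTitle, xmlTitle, wikitext } = {}) {
  // Prefer explicit titles (API, then dump XML), then the first-sentence bolding
  return apiTitle || xmlTitle || firstBoldText(wikitext) || null;
}

// Usage: no API or XML title, so the bolded name is used
console.log(resolvePageTitle({ wikitext: "'''Abraham Lincoln''' was an American statesman." }));
```

The resolved title could then serve as the default header for infoboxes that do not set a name themselves.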

yeah, lemme stew on this a little bit, then I'll set something up, and you can take it for a spin. It would be great if we could produce somewhat-good infoboxes as html