gambolputty/wikitable2csv

CSV output contains CSS code lines from style tag

ptrstn opened this issue · 1 comment

Hello,

I used your website and ran into some rather unexpected behavior.
I parsed the table at https://de.wikipedia.org/wiki/Liste_traditioneller_Radikale, and for the most part the result was a great CSV table.

Only rows 64 and 147 contained unwanted CSS, namely .mw-parser-output .Hans{font-size:110%} and .mw-parser-output .Hant{font-size:110%} respectively:

Nr.,Zeichen (Varianten),Pīnyīn,Bedeutung und Anmerkungen,Häufig-keit,Kurz-zeichen,Beispiele
1,一,yī,eins,42,,七三不世
2,丨,gǔn,Vertikalstrich,21,,中
3,丶,zhǔ,Tropfstrich,10,,丸主
[...]
64,"手 (.mw-parser-output .Hans{font-size:110%}才,扌 links)",shǒu,"Hand, in der Hand halten",1.203,,手打持掛挙
[...]
147,.mw-parser-output .Hant{font-size:110%}見,jiàn,sehen,161,见[2],規親覺觀
[...]

When I inspected the source of the wiki page, I saw that this text is indeed embedded in the HTML table itself, though only in these two rows:

<td>
   <link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r184932623">
   <span lang="zh-Hani" class="Hani"></span> (
   <style data-mw-deduplicate="TemplateStyles:r184932629">.mw-parser-output .Hans{font-size:110%}</style>
   <span lang="zh-Hans" class="Hans"></span>,
   <link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r184932623">
   <span lang="zh-Hani" class="Hani"></span> <small>links</small>)
</td>
<td>
   <style data-mw-deduplicate="TemplateStyles:r184932626">.mw-parser-output .Hant{font-size:110%}</style>
   <span lang="zh-Hant" class="Hant"></span>
</td>

Could the CSS inside any <style></style> tag, or the style tag itself, be removed when generating the CSV table?
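To illustrate what I mean, here is a minimal sketch in Python (not the app's actual code, which I have not looked at) that extracts a cell's text while skipping everything inside <style> and <script> elements. It uses only the standard library's html.parser; the names CellTextExtractor and cell_text are made up for this example, and the character inside the span is taken from row 147 of the CSV output above.

```python
from html.parser import HTMLParser


class CellTextExtractor(HTMLParser):
    """Collects the text of an HTML fragment, ignoring <style>/<script> bodies."""

    SKIPPED_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <style> or <script>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the collected text and collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def cell_text(html_fragment):
    """Hypothetical helper: return the text of one table cell without CSS."""
    parser = CellTextExtractor()
    parser.feed(html_fragment)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    cell = (
        '<td><style data-mw-deduplicate="TemplateStyles:r184932626">'
        ".mw-parser-output .Hant{font-size:110%}</style>"
        '<span lang="zh-Hant" class="Hant">見</span></td>'
    )
    print(cell_text(cell))
```

In a browser the same idea would presumably amount to removing the style elements from a copy of the cell before reading its text content, but that is an assumption about how the site works.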

Thanks!

I rewrote the app and this issue should now be resolved.