CSV contains citation link text
ptrstn opened this issue · 1 comments
Hello,
when the wiki table contains a citation (e.g. [2]), the generated CSV will include the citation marker as plain text. This is probably not desired.
Example: https://de.wikipedia.org/wiki/Liste_traditioneller_Radikale#Tabelle_der_Radikale
Output:
Nr.,Zeichen (Varianten),Pīnyīn,Bedeutung und Anmerkungen,Häufig-keit,Kurz-zeichen,Beispiele
147,.mw-parser-output .Hant{font-size:110%}見,jiàn,sehen,161,见[2],規親覺觀
148,角,jiǎo,"Horn, Ecke",158,,觚解觕觥觸
149,言 (訁 links),yán,"sprechen, Wort",861,讠[2]links,誁詋詔評詗詥試詧
(The [2] is the undesired text, because it is useless by itself.)
The HTML responsible for this is:
<td>
<link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r184932629">
<span lang="zh-Hans" class="Hans">见</span>
<sup id="cite_ref-s_2-1" class="reference">
<a href="#cite_note-s-2">[2]</a>
</sup>
</td>
Can the citation links (hyperlinks with square brackets) be removed when generating the csv?
So basically all the <a> tags that are surrounded by a <sup> tag with class="reference".
Hey, I rewrote the app. There's now an option to exclude elements from parsing by class name, and it's set to “reference” by default so that those citation links are left out (the list can be extended by adding more class names, separated by commas). This fixes the issue.
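For readers who want to see the idea in isolation, here is a minimal sketch of class-based exclusion using only Python's standard library. It is not the app's actual code: the names `CellTextExtractor` and `cell_text` and the `exclude` parameter are hypothetical, and the comma-separated default of "reference" simply mirrors the option described above.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class CellTextExtractor(HTMLParser):
    """Collects text, skipping any element whose class is excluded (hypothetical helper)."""

    def __init__(self, excluded_classes):
        super().__init__()
        self.excluded = set(excluded_classes)
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            # Already inside an excluded element: track nesting only.
            self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.excluded.intersection(classes):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def cell_text(html, exclude="reference"):
    """Return the visible text of a table cell, minus excluded classes.

    `exclude` is a comma-separated list of class names (an assumption
    modelled on the option described in the reply above).
    """
    excluded = [name.strip() for name in exclude.split(",") if name.strip()]
    parser = CellTextExtractor(excluded)
    parser.feed(html)
    parser.close()
    # Collapse whitespace left over from the markup's line breaks.
    return " ".join("".join(parser.parts).split())
```

Applied to the `<td>` from the report, `cell_text(cell_html)` keeps the character inside the `<span>` and drops the `<sup class="reference">` subtree, while `cell_text(cell_html, exclude="")` leaves the `[2]` marker in place. Matching on the class of the enclosing element, rather than on the bracketed link text, avoids stripping legitimate cell content that happens to contain square brackets.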