Clean up code sample generated by MediaWiki SyntaxHighlight GeSHi extension to get raw code
renoirb opened this issue · 5 comments
Code samples in the HTML output when using <syntaxHighlight>. Given this wiki content:
<syntaxHighlight>
<div class="container">
<div class="box bottom">This box is at the bottom with z-index set to auto.</div>
<div class="box middle">This box is in the middle with z-index set to auto.</div>
<div class="box top">This box is at the top with z-index set to auto.</div>
</div>
</syntaxHighlight>
It becomes this in the generated HTML:
<div>
<p><span class="language">HTML</span>
</p>
<pre>
<div dir="ltr" class="mw-geshi mw-code mw-content-ltr"><div class="html5 source-html5"><pre class="de1"><span class="sc2"><<span class="kw2">div</span> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">"container"</span>></span>
<span class="sc2"><<span class="kw2">div</span> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">"box top"</span>></span>This box is at the top with z-index set to 30.<span class="sc2"><<span class="sy0">/</span><span class="kw2">div</span>></span>
<span class="sc2"><<span class="kw2">div</span> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">"box middle-level-one"</span>></span>This box is in the middle level 1 with z-index set to 20.<span class="sc2"><<span class="sy0">/</span><span class="kw2">div</span>></span>
<span class="sc2"><<span class="kw2">div</span> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">"box middle-level-two"</span>></span>This box is in at middle level 2 with z-index set to 20.<span class="sc2"><<span class="sy0">/</span><span class="kw2">div</span>></span>
<span class="sc2"><<span class="kw2">div</span> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">"box bottom"</span>></span>This box is at the bottom with z-index set to 10.<span class="sc2"><<span class="sy0">/</span><span class="kw2">div</span>></span>
<span class="sc2"><<span class="sy0">/</span><span class="kw2">div</span>></span></pre></div></div>
This makes it hard to work with code samples in a static site.
The desired output for the static site generator, so that an off-the-shelf syntax highlighter can be used, is:
<pre class="language-html5" data-lang="html5">
<div class="container">
<div class="box bottom">This box is at the bottom with z-index set to auto.</div>
<div class="box middle">This box is in the middle with z-index set to auto.</div>
<div class="box top">This box is at the top with z-index set to auto.</div>
</div>
</pre>
Solution path: patch the MediaWiki SyntaxHighlight_GeSHi extension as follows:
From c602156d811f714631670a6a45a66e3848716571 Mon Sep 17 00:00:00 2001
From: Renoir Boulanger <renoir@w3.org>
Date: Fri, 7 Aug 2015 21:04:46 -0400
Subject: [PATCH] Superseed GeSHi to return same as what the rest does
---
mediawiki/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mediawiki/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php b/mediawiki/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php
index ddaea80..a3589d9 100644
--- a/mediawiki/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php
+++ b/mediawiki/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php
@@ -57,6 +57,9 @@ class SyntaxHighlight_GeSHi {
}
}
$lang = strtolower( $lang );
+
+ return sprintf("\n<pre class=\"language-%s\" data-lang=\"%s\">\n%s\n</pre>\n", $lang, $lang, $text);
+
if( !preg_match( '/^[a-z_0-9-]*$/', $lang ) ) {
$error = self::formatLanguageError( $text );
return $error;
--
Search the exported content for pages whose HTML still contains mw-geshi, and list them in a data/missed.yml file for another mediawiki:run 3 pass:
grep -rli mw-geshi out/content > data/missed-geshi.yml
In vim, I then sort and format them:
:sort
:%s/\.md$//g
:%s/\/index$//g
:%s/^/ - /g
Then prepend this line at the beginning of the file:
missed:
The end result looks like this:
missed:
- apis/media_source_extensions/MediaSource/addSourceBuffer
- apis/media_source_extensions/MediaSource/appendBuffer
- apis/vibration
- WPD/Annotations
- WPD/Browser_Testing/QuirksMode
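The manual vim steps above can also be scripted. This is a minimal POSIX shell sketch, not part of the conversion tooling: the function name build_missed_list is hypothetical, and it assumes exported pages are stored as <path>.md or <path>/index.md under out/content/, as the vim substitutions suggest. It runs grep from inside the content directory so the out/content/ prefix does not end up in the list.

```shell
#!/bin/sh
# Hypothetical helper: prints a "missed:" YAML list of pages whose exported
# content still contains GeSHi markup (the mw-geshi class).
build_missed_list() {
  content_dir=${1:-out/content}

  printf 'missed:\n'
  # Same grep flags as the manual step: -r recurse, -l file names only, -i ignore case.
  # Running from inside $content_dir yields paths like ./apis/vibration/index.md.
  # The sed expressions mirror the vim substitutions (plus removing the leading ./).
  (cd "$content_dir" && grep -rli mw-geshi .) |
    sed -e 's|^\./||' \
        -e 's/\.md$//' \
        -e 's|/index$||' |
    LC_ALL=C sort |
    sed -e 's/^/ - /'
}
```

Usage: `build_missed_list out/content > data/missed-geshi.yml`. LC_ALL=C makes the sort byte-wise and reproducible across machines, which means WPD/ entries sort before apis/ entries; drop it if you prefer locale-aware ordering.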
Notice that the grep command above crawls everything in out/content/, which by default holds what is committed in the webplatform/docs repository. If you cloned webplatform/docs-meta into out/content/Meta/ and webplatform/docs-wpd into out/content/WPD/, their pages will show up too.
REMEMBER that mediawiki:run writes into out/ regardless of what it already contains. If you also want to re-run it for docs-meta or docs-wpd content, first make sure you have the right export data, then pass it to the mediawiki:run command.
For example:
mv out out-main
mv out-main/WPD out
app/console mediawiki:run 3 --missed --xml-source=dumps/wpd.xml
If you need to make sure MediaWiki gives you the most recent code, you can send a purge by using mediawiki:refresh-pages. This command does not modify the contents of the out/ folder.
You can therefore send a refresh safely with:
app/console mediawiki:refresh-pages --xml-source=dumps/wpd.xml
The issue here isn't limited to syntaxHighlight and SyntaxHighlight_GeSHi: any code sample may break during import. Work has to be done during the conversion pass to ensure the code stays encoded as HTML entities until pandoc does the conversion.
Updated patch to SyntaxHighlight_GeSHi.class.php:
From e7c5677ca0d78601573990ee4c6fbcb734bbc645 Mon Sep 17 00:00:00 2001
From: Renoir Boulanger <hello@renoirboulanger.com>
Date: Fri, 4 Sep 2015 19:17:14 -0400
Subject: [PATCH] webplatform/mediawiki-conversion#19 patch
---
SyntaxHighlight_GeSHi.class.php | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/SyntaxHighlight_GeSHi.class.php b/SyntaxHighlight_GeSHi.class.php
index ddaea80..d6462f2 100644
--- a/SyntaxHighlight_GeSHi.class.php
+++ b/SyntaxHighlight_GeSHi.class.php
@@ -57,6 +57,12 @@ class SyntaxHighlight_GeSHi {
}
}
$lang = strtolower( $lang );
+
+ # webplatform/mediawiki-conversion#19
+ $lang = str_replace(['markup', 'html5'], 'html', $lang);
+ $lang = str_replace(['javascript', 'script'], 'js', $lang);
+ return sprintf("\n<pre class=\"language-%s\">\n%s\n\n</pre>\n", $lang, htmlentities($text));
+
if( !preg_match( '/^[a-z_0-9-]*$/', $lang ) ) {
$error = self::formatLanguageError( $text );
return $error;
--
2.4.2
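Note that PHP's str_replace() replaces substrings rather than whole language names, and applies the search terms in array order to the already-modified string. A rough shell equivalent makes the effect easy to check; the normalize_lang name is hypothetical, and tr stands in for strtolower().

```shell
# Hypothetical shell mirror of the language normalisation in the patch:
#   $lang = strtolower($lang);
#   $lang = str_replace(['markup', 'html5'], 'html', $lang);
#   $lang = str_replace(['javascript', 'script'], 'js', $lang);
# Each sed expression runs in order on the output of the previous one,
# like str_replace() does with an array of search terms.
normalize_lang() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    sed -e 's/markup/html/g' -e 's/html5/html/g' \
        -e 's/javascript/js/g' -e 's/script/js/g'
}
```

For example, `normalize_lang JavaScript` prints js, but `normalize_lang ecmascript` prints ecmajs, which is probably not a class name a highlighter expects. Matching whole names (e.g. with a lookup table) would avoid that.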
Another iteration of the patch, this time covering the following issues:
- Doesn't break code blocks containing non-ASCII content (e.g. Chinese text in comments); previously the whole code block would be removed.
- Doesn't escape text that is already escaped (no double escaping).
- Allows escaping through the MediaWiki parser tag, not just syntaxHighlight. This is very useful when a transcluded wiki template may contain code that isn't always escaped.
Patch
From 363a42c5d3314445e0f35713cc421f767d3f4a82 Mon Sep 17 00:00:00 2001
From: Renoir Boulanger <renoir@w3.org>
Date: Wed, 16 Sep 2015 13:04:59 -0400
Subject: [PATCH] Required change to solve webplatform/mediawiki-conversion#19
---
SyntaxHighlight_GeSHi.class.php | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/SyntaxHighlight_GeSHi.class.php b/SyntaxHighlight_GeSHi.class.php
index d179e77..d7ede30 100644
--- a/SyntaxHighlight_GeSHi.class.php
+++ b/SyntaxHighlight_GeSHi.class.php
@@ -59,6 +59,14 @@ class SyntaxHighlight_GeSHi {
}
}
$lang = strtolower( $lang );
+
+ # RBx webplatform/mediawiki-conversion#19
+ $lang = str_replace(['markup', 'html5'], 'html', $lang);
+ $lang = str_replace(['javascript', 'script'], 'js', $lang);
+ $escaped = htmlspecialchars($text, ENT_COMPAT|ENT_HTML401, ini_get("default_charset"), false);
+ return sprintf("\n<pre class=\"language-%s\">\n%s\n\n</pre>\n", $lang, $escaped);
+ # /RBx
+
if( !preg_match( '/^[a-z_0-9-]*$/', $lang ) ) {
$error = self::formatLanguageError( $text );
wfProfileOut( __METHOD__ );
--
1.9.1
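The fourth htmlspecialchars() argument (double_encode, set to false) is what prevents escaping twice: existing entities such as &lt; or &#39; are left alone, while bare &, <, > and double quotes are escaped (ENT_COMPAT leaves single quotes untouched). The sed sketch below approximates that behaviour for quickly checking exported text; the escape_once name is hypothetical, and unlike PHP it only checks that something looks like an entity, not that the named entity actually exists.

```shell
# Hypothetical approximation of
#   htmlspecialchars($text, ENT_COMPAT|ENT_HTML401, $charset, false)
# Reads stdin and writes the escaped text to stdout.
# 1. escape every & ;
# 2. undo step 1 where the & started an entity (&name; &#123; &#x7B;);
# 3. escape <, > and double quotes. Non-ASCII text passes through as-is.
escape_once() {
  sed -E \
    -e 's/&/\&amp;/g' \
    -e 's/&amp;(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);/\&\1;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g'
}
```

For example, `printf '%s\n' 'a < b && c &amp; d' | escape_once` prints `a &lt; b &amp;&amp; c &amp; d`: the bare ampersands are escaped, the existing &amp; is not.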
Example of syntax escaping from a transcluded template.
Template:Single_Example, a MediaWiki template for code samples:
<noinclude>
A block for a single example. Automatically wraps in syntax highlighting.
'''If you manually include an inline-example code block, do not use this template; use [[Template:Inline Example]] instead.''' The Examples section in many article types will automatically use this template.
<pre>
{{Single Example
|Code=
|LiveURL=
|Language=
|Description=
}}
</pre>
{{TODO | Use prism.js for syntax highlighting}}
</noinclude><includeonly>
{{{Description
|}}}
{{#ifeq: {{{Language|Markup}}} | Markup | {{#set:Language=html}} }}
<div class="example">
{{#tag:syntaxHighlight
|{{{Code|}}}
|lang={{{Language|}}}
}}
{{#if: {{{LiveURL|}}} | [{{{LiveURL|}}} View live example]
|}}
</div>
</includeonly>