jgm/skylighting

2.13 Regression of XML syntax highlighting (md -> html5)

kazalex opened this issue · 11 comments

Markdown input:

``` {.xml}
<?xml version="1.0" encoding="utf-8"?>
<methodCall>
    <methodName>system.listMethods</methodName>
    <params/>
</methodCall>
```

2.12 output (XML tags get span class `kw`, i.e. keyword):

```html
<div class="sourceCode" id="cb10"><pre class="sourceCode xml"><code class="sourceCode xml"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">&lt;?xml</span> version=&quot;1.0&quot; encoding=&quot;utf-8&quot;<span class="kw">?&gt;</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="kw">&lt;methodCall&gt;</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>    <span class="kw">&lt;methodName&gt;</span>system.listMethods<span class="kw">&lt;/methodName&gt;</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>    <span class="kw">&lt;params/&gt;</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="kw">&lt;/methodCall&gt;</span></span></code></pre></div>
</div>
```

2.13 output (XML tags get span class `er`, i.e. error):

```html
<div class="sourceCode" id="cb10"><pre class="sourceCode xml"><code class="sourceCode xml"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">&lt;?xml</span><span class="ot"> version=</span><span class="st">&quot;1.0&quot;</span><span class="ot"> encoding=</span><span class="st">&quot;utf-8&quot;</span><span class="fu">?&gt;</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>&lt;<span class="ot">methodCall</span><span class="er">&gt;</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>    <span class="er">&lt;methodName&gt;system.listMethods&lt;/methodName&gt;</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>    <span class="er">&lt;params/&gt;</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="er">&lt;/methodCall&gt;</span></span></code></pre></div>
</div>
```
jgm commented

Simple repro:

```xml
<a>
 <b/>
</a>
```

Everything from the closing `>` on the first line onward is an error token.

jgm commented

In fact, just `<a>` is enough to reproduce it.
Here is the tokenizer's trace output:

```
Trying rule Rule {rMatcher = IncludeRules ("XML","FindXML"), rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = DetectSpaces, rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = StringDetect "<!--", rAttribute = CommentTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","Comment")]}
Trying rule Rule {rMatcher = StringDetect "<![CDATA[", rAttribute = BaseNTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = True, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","CDATAStart")]}
Trying rule Rule {rMatcher = RegExpr (RE {reString = "<!(?=DOCTYPE\\s+)", reCaseSensitive = True}), rAttribute = DataTypeTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","DoctypeTagName")]}
Trying rule Rule {rMatcher = IncludeRules ("XML","FindProcessingInstruction"), rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = RegExpr (RE {reString = "<\\?(?=([\\w:_-]*))", reCaseSensitive = True}), rAttribute = FunctionTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","PI TagName")]}
Trying rule Rule {rMatcher = RegExpr (RE {reString = "<(?=((?![0-9])[\\w_:][\\w.:_-]*))", reCaseSensitive = True}), rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","ElementTagName")]}
RegExpr MATCHED Just (NormalTok,"<")
CONTEXT STACK ["ElementTagName","Start"]
IncludeRules MATCHED Just (NormalTok,"<")
Trying rule Rule {rMatcher = StringDetect "%1", rAttribute = KeywordTok, rIncludeAttribute = False, rDynamic = True, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Pop,Push ("XML","Element")]}
CONTEXT STACK ["Start"]
CONTEXT STACK ["Element","Start"]
Trying rule Rule {rMatcher = Detect2Chars '/' '>', rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Pop]}
Trying rule Rule {rMatcher = DetectChar '>', rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","El Content")]}
Trying rule Rule {rMatcher = RegExpr (RE {reString = "(?:^|\\s+)(?![0-9])[\\w_:][\\w.:_-]*", reCaseSensitive = True}), rAttribute = OtherTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","Attribute")]}
RegExpr MATCHED Just (OtherTok,"a")
CONTEXT STACK ["Attribute","Element","Start"]
Trying rule Rule {rMatcher = DetectChar '=', rAttribute = OtherTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Pop,Push ("XML","Value")]}
Trying rule Rule {rMatcher = RegExpr (RE {reString = "\\S", reCaseSensitive = True}), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
RegExpr MATCHED Just (ErrorTok,">")
<a>
```
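
The `CONTEXT STACK` lines in this trace follow mechanically from each rule's `rContextSwitch` list: `context="#pop!Element"` shows up as `[Pop,Push ("XML","Element")]`, and because the dynamic `%1` rule does not match, the `ElementTagName` fallthrough (`fallthroughContext="#pop!Element"`) applies those same switches without consuming any input. That leaves `a` for the `Element` context, where the attribute regex picks it up. As a minimal sketch (the `Switch` type and the `applySwitch`/`replay` helpers are hypothetical, not skylighting's own types), the four stacks printed above can be replayed like this:

```haskell
-- Hypothetical model of the context switches printed in the trace.
-- These names are illustrative; they are not skylighting's own types.
data Switch = Pop | Push String
  deriving (Show, Eq)

-- Innermost context first, as in the trace's CONTEXT STACK lines.
type Stack = [String]

-- Apply a single switch to the context stack.
applySwitch :: Stack -> Switch -> Stack
applySwitch (_ : rest) Pop = rest
applySwitch [] Pop = []
applySwitch stack (Push ctx) = ctx : stack

-- Apply a sequence of switch lists (one per matched rule or fallthrough),
-- returning the stack after every individual switch, i.e. one entry per
-- CONTEXT STACK line.
replay :: Stack -> [[Switch]] -> [Stack]
replay _ [] = []
replay stack (switches : more) =
  let steps = drop 1 (scanl applySwitch stack switches)
  in steps ++ replay (last (stack : steps)) more

-- The three context changes seen for the input "<a>":
--   "<" matched in FindXML         -> Push "ElementTagName"
--   "%1" fails, fallthrough        -> Pop, Push "Element"  (#pop!Element)
--   "a" matched by the attribute RE -> Push "Attribute"
traceForA :: [Stack]
traceForA =
  replay ["Start"] [[Push "ElementTagName"], [Pop, Push "Element"], [Push "Attribute"]]
```

`traceForA` evaluates to `[["ElementTagName","Start"],["Start"],["Element","Start"],["Attribute","Element","Start"]]`, the same four stacks that appear in the trace.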
jgm commented

Here are the latest changes to the xml.xml syntax definition, which I merged from upstream (KDE) before this release:

```diff
diff --git a/skylighting-core/xml/xml.xml b/skylighting-core/xml/xml.xml
index d2cc327..fbefcb6 100644
--- a/skylighting-core/xml/xml.xml
+++ b/skylighting-core/xml/xml.xml
@@ -6,7 +6,7 @@
 	<!ENTITY name    "(?![0-9])[\w_:][\w.:_-]*">
 	<!ENTITY entref  "&amp;(?:#[0-9]+|#[xX][0-9A-Fa-f]+|&name;);">
 ]>
-<language name="XML" version="11" kateversion="5.0" section="Markup" extensions="*.docbook;*.xml;*.rc;*.daml;*.rdf;*.rss;*.xspf;*.xsd;*.svg;*.ui;*.kcfg;*.qrc;*.wsdl;*.scxml;*.xbel;*.dae;*.sch;*.brd" mimetype="text/xml;text/book;text/daml;text/rdf;application/rss+xml;application/xspf+xml;image/svg+xml;application/x-designer;application/x-xbel;application/xml;application/scxml+xml" casesensitive="1" indenter="xml" author="Wilbert Berendsen (wilbert@kde.nl)" license="LGPL">
+<language name="XML" version="12" kateversion="5.0" section="Markup" extensions="*.docbook;*.xml;*.rc;*.daml;*.rdf;*.rss;*.xspf;*.xsd;*.svg;*.ui;*.kcfg;*.qrc;*.wsdl;*.scxml;*.xbel;*.dae;*.sch;*.brd" mimetype="text/xml;text/book;text/daml;text/rdf;application/rss+xml;application/xspf+xml;image/svg+xml;application/x-designer;application/x-xbel;application/xml;application/scxml+xml" casesensitive="1" indenter="xml" author="Wilbert Berendsen (wilbert@kde.nl)" license="LGPL">
 
 <highlighting>
 <contexts>
@@ -17,10 +17,10 @@
   <context name="FindXML" attribute="Normal Text" lineEndContext="#stay">
     <DetectSpaces />
     <StringDetect attribute="Comment" context="Comment" String="&lt;!--" beginRegion="comment" />
-    <StringDetect attribute="CDATA" context="CDATA" String="&lt;![CDATA[" beginRegion="cdata" />
-    <RegExpr attribute="Doctype" context="Doctype" String="&lt;!DOCTYPE\s+" beginRegion="doctype" />
-    <RegExpr attribute="Processing Instruction" context="PI" String="&lt;\?[\w:_-]*" beginRegion="pi" />
-    <RegExpr attribute="Element" context="Element" String="&lt;&name;" beginRegion="element" />
+    <StringDetect attribute="CDATA" context="CDATAStart" String="&lt;![CDATA[" lookAhead="true" />
+    <RegExpr attribute="Doctype Symbols" context="DoctypeTagName" String="&lt;!(?=DOCTYPE\s+)" beginRegion="doctype" />
+    <IncludeRules context="FindProcessingInstruction" />
+    <RegExpr attribute="Element Symbols" context="ElementTagName" String="&lt;(?=(&name;))" beginRegion="element" />
     <IncludeRules context="FindEntityRefs" />
     <DetectIdentifier />
   </context>
@@ -45,32 +45,63 @@
     <DetectIdentifier />
   </context>
 
+  <context name="CDATAStart" attribute="Other Text" lineEndContext="#pop">
+    <StringDetect attribute="CDATA Symbols" context="#stay" String="&lt;![" beginRegion="cdata" />
+    <StringDetect attribute="CDATA" context="#stay" String="CDATA" />
+    <DetectChar attribute="CDATA Symbols" context="#pop!CDATA" char="[" />
+  </context>
   <context name="CDATA" attribute="Other Text" lineEndContext="#stay">
     <DetectSpaces />
     <DetectIdentifier />
-    <StringDetect attribute="CDATA" context="#pop" String="]]&gt;" endRegion="cdata" />
+    <StringDetect attribute="CDATA Symbols" context="#pop" String="]]&gt;" endRegion="cdata" />
     <StringDetect attribute="EntityRef" context="#stay" String="]]&amp;gt;" />
   </context>
 
+  <context name="FindProcessingInstruction" attribute="Other Text" lineEndContext="#stay">
+    <RegExpr attribute="PI Symbols" context="PI TagName" String="&lt;\?(?=([\w:_-]*))" beginRegion="pi" />
+  </context>
+  <context name="PI TagName" attribute="Other Text" lineEndContext="#pop!PI" fallthrough="true" fallthroughContext="#pop!PI">
+    <RegExpr attribute="Processing Instruction" context="#pop!PI-XML" String="xml(?=\s|$)" insensitive="true" />
+    <StringDetect attribute="Processing Instruction" context="#pop!PI" String="%1" dynamic="true" />
+  </context>
   <context name="PI" attribute="Other Text" lineEndContext="#stay">
-    <Detect2Chars attribute="Processing Instruction" context="#pop" char="?" char1="&gt;" endRegion="pi" />
+    <Detect2Chars attribute="PI Symbols" context="#pop" char="?" char1="&gt;" endRegion="pi" />
+  </context>
+  <context name="PI-XML" attribute="Other Text" lineEndContext="#stay">
+    <IncludeRules context="PI" />
+    <RegExpr attribute="Attribute" context="#stay" String="(?:^|\s+)&name;" />
+    <DetectChar attribute="Attribute" context="Value" char="=" />
   </context>
 
+  <context name="DoctypeTagName" attribute="Other Text" lineEndContext="#pop">
+    <StringDetect attribute="Doctype" context="#pop!DoctypeVariableName" String="DOCTYPE" />
+  </context>
+  <context name="DoctypeVariableName" attribute="Other Text" lineEndContext="#pop!Doctype" fallthrough="true" fallthroughContext="#pop!Doctype">
+    <DetectSpaces />
+    <RegExpr attribute="Doctype Name" context="#pop!Doctype" String="&name;" />
+  </context>
   <context name="Doctype" attribute="Other Text" lineEndContext="#stay">
-    <DetectChar attribute="Doctype" context="#pop" char="&gt;" endRegion="doctype" />
-    <DetectChar attribute="Doctype" context="Doctype Internal Subset" char="[" beginRegion="int_subset" />
+    <DetectChar attribute="Doctype Symbols" context="#pop" char="&gt;" endRegion="doctype" />
+    <DetectChar attribute="Doctype Symbols" context="Doctype Internal Subset" char="[" beginRegion="int_subset" />
   </context>
 
   <context name="Doctype Internal Subset" attribute="Other Text" lineEndContext="#stay">
-    <DetectChar attribute="Doctype" context="#pop" char="]" endRegion="int_subset" />
-    <RegExpr attribute="Doctype" context="Doctype Markupdecl" String="&lt;!(?:ELEMENT|ENTITY|ATTLIST|NOTATION)\b" />
+    <DetectChar attribute="Doctype Symbols" context="#pop" char="]" endRegion="int_subset" />
+    <RegExpr attribute="Doctype Symbols" context="Doctype Markupdecl TagName" String="&lt;!(?=(ELEMENT|ENTITY|ATTLIST|NOTATION)\b)" />
     <StringDetect attribute="Comment" context="Comment" String="&lt;!--" beginRegion="comment" />
-    <RegExpr attribute="Processing Instruction" context="PI" String="&lt;\?[\w:_-]*" beginRegion="pi" />
+    <IncludeRules context="FindProcessingInstruction" />
     <IncludeRules context="FindPEntityRefs" />
   </context>
 
+  <context name="Doctype Markupdecl TagName" attribute="Other Text" lineEndContext="#pop">
+    <StringDetect attribute="Doctype" context="#pop!Doctype Markupdecl VariableName" String="%1" dynamic="true" />
+  </context>
+  <context name="Doctype Markupdecl VariableName" attribute="Other Text" lineEndContext="#pop!Doctype Markupdecl" fallthrough="true" fallthroughContext="#pop!Doctype Markupdecl">
+    <DetectSpaces />
+    <RegExpr attribute="Doctype Name" context="#pop!Doctype Markupdecl" String="&name;" />
+  </context>
   <context name="Doctype Markupdecl" attribute="Other Text" lineEndContext="#stay">
-    <DetectChar attribute="Doctype" context="#pop" char="&gt;" />
+    <DetectChar attribute="Doctype Symbols" context="#pop" char="&gt;" />
     <DetectChar attribute="Value" context="Doctype Markupdecl DQ" char="&quot;" />
     <DetectChar attribute="Value" context="Doctype Markupdecl SQ" char="&apos;" />
   </context>
@@ -85,25 +116,31 @@
     <IncludeRules context="FindPEntityRefs" />
   </context>
 
+  <context name="ElementTagName" attribute="Other Text" lineEndContext="#pop!Element" fallthrough="true" fallthroughContext="#pop!Element">
+    <StringDetect attribute="Element" context="#pop!Element" String="%1" dynamic="true" />
+  </context>
   <context name="Element" attribute="Other Text" lineEndContext="#stay">
-    <Detect2Chars attribute="Element" context="#pop" char="/" char1="&gt;" endRegion="element" />
-    <DetectChar attribute="Element" context="El Content" char="&gt;" />
+    <Detect2Chars attribute="Element Symbols" context="#pop" char="/" char1="&gt;" endRegion="element" />
+    <DetectChar attribute="Element Symbols" context="El Content" char="&gt;" />
     <RegExpr attribute="Attribute" context="Attribute" String="(?:^|\s+)&name;" />
     <RegExpr attribute="Error" context="#stay" String="\S" />
   </context>
 
   <context name="El Content" attribute="Other Text" lineEndContext="#stay">
-    <RegExpr attribute="Element" context="El End" String="&lt;/&name;" />
+    <RegExpr attribute="Element Symbols" context="El End TagName" String="&lt;/(?=(&name;))" />
     <IncludeRules context="FindXML" />
   </context>
 
+  <context name="El End TagName" attribute="Other Text" lineEndContext="#pop!El End" fallthrough="true" fallthroughContext="#pop!El End">
+    <StringDetect attribute="Element" context="#pop!El End" String="%1" dynamic="true" />
+  </context>
   <context name="El End" attribute="Other Text" lineEndContext="#stay">
-    <DetectChar attribute="Element" context="#pop#pop#pop" char="&gt;" endRegion="element" />
+    <DetectChar attribute="Element Symbols" context="#pop#pop#pop" char="&gt;" endRegion="element" />
     <RegExpr attribute="Error" context="#stay" String="\S" />
   </context>
 
   <context name="Attribute" attribute="Other Text" lineEndContext="#stay">
-    <DetectChar attribute="Attribute" context="Value" char="=" />
+    <DetectChar attribute="Attribute" context="#pop!Value" char="=" />
     <RegExpr attribute="Error" context="#stay" String="\S" />
   </context>
 
@@ -114,29 +151,34 @@
   </context>
 
   <context name="Value DQ" attribute="Value" lineEndContext="#stay">
-    <DetectChar attribute="Value" context="#pop#pop#pop" char="&quot;" />
+    <DetectChar attribute="Value" context="#pop#pop" char="&quot;" />
     <IncludeRules context="FindEntityRefs" />
   </context>
 
   <context name="Value SQ" attribute="Value" lineEndContext="#stay">
-    <DetectChar attribute="Value" context="#pop#pop#pop" char="&apos;" />
+    <DetectChar attribute="Value" context="#pop#pop" char="&apos;" />
     <IncludeRules context="FindEntityRefs" />
   </context>
 
 </contexts>
 <itemDatas>
-  <itemData name="Normal Text" defStyleNum="dsNormal" />
-  <itemData name="Other Text" defStyleNum="dsNormal" />
-  <itemData name="Comment" defStyleNum="dsComment" spellChecking="false" />
-  <itemData name="CDATA" defStyleNum="dsBaseN" bold="1" spellChecking="false" />
-  <itemData name="Processing Instruction" defStyleNum="dsKeyword" spellChecking="false" />
-  <itemData name="Doctype" defStyleNum="dsDataType" bold="1" spellChecking="false" />
-  <itemData name="Element" defStyleNum="dsKeyword" spellChecking="false" />
-  <itemData name="Attribute" defStyleNum="dsOthers" spellChecking="false" />
-  <itemData name="Value" defStyleNum="dsString" spellChecking="false" />
-  <itemData name="EntityRef" defStyleNum="dsDecVal" spellChecking="false" />
-  <itemData name="PEntityRef" defStyleNum="dsDecVal" spellChecking="false" />
-  <itemData name="Error" defStyleNum="dsError" spellChecking="false" />
+  <itemData name="Normal Text"     defStyleNum="dsNormal" />
+  <itemData name="Other Text"      defStyleNum="dsNormal" />
+  <itemData name="Comment"         defStyleNum="dsComment" spellChecking="false" />
+  <itemData name="CDATA"           defStyleNum="dsBaseN"    bold="1" italic="0" spellChecking="false" />
+  <itemData name="CDATA Symbols"   defStyleNum="dsBaseN"    bold="0" italic="0" spellChecking="false" />
+  <itemData name="Processing Instruction" defStyleNum="dsFunction" bold="1" italic="0" spellChecking="false" />
+  <itemData name="PI Symbols"      defStyleNum="dsFunction" bold="0" italic="0" spellChecking="false" />
+  <itemData name="Doctype"         defStyleNum="dsDataType" bold="1" italic="0" spellChecking="false" />
+  <itemData name="Doctype Name"    defStyleNum="dsDataType" bold="0" italic="0" spellChecking="false" />
+  <itemData name="Doctype Symbols" defStyleNum="dsDataType" bold="0" italic="0" spellChecking="false" />
+  <itemData name="Element"         defStyleNum="dsKeyword" spellChecking="false" />
+  <itemData name="Element Symbols" defStyleNum="dsNormal" spellChecking="false" />
+  <itemData name="Attribute"       defStyleNum="dsOthers" spellChecking="false" />
+  <itemData name="Value"           defStyleNum="dsString" spellChecking="false" />
+  <itemData name="EntityRef"       defStyleNum="dsDecVal" spellChecking="false" />
+  <itemData name="PEntityRef"      defStyleNum="dsDecVal" spellChecking="false" />
+  <itemData name="Error"           defStyleNum="dsError" spellChecking="false" />
 </itemDatas>
 
 </highlighting>
```
jgm commented

I can confirm that reverting xml.xml to the version from 0.10.5 makes the problem go away.
So something in this round of changes caused it.

jgm commented

End of the trace for the working version:

```
CONTEXT STACK ["Element","Start"]
IncludeRules MATCHED Just (KeywordTok,"<a")
Trying rule Rule {rMatcher = Detect2Chars '/' '>', rAttribute = KeywordTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Pop]}
Trying rule Rule {rMatcher = DetectChar '>', rAttribute = KeywordTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","El Content")]}
DetectChar MATCHED Just (KeywordTok,">")
CONTEXT STACK ["El Content","Element","Start"]
```
jgm commented

So the issue in the current version is this:

```
Trying rule Rule {rMatcher = RegExpr (RE {reString = "(?:^|\\s+)(?![0-9])[\\w_:][\\w.:_-]*", reCaseSensitive = True}), rAttribute = OtherTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("XML","Attribute")]}
RegExpr MATCHED Just (OtherTok,"a")
CONTEXT STACK ["Attribute","Element","Start"]
```

Why does it think we have an attribute?
The relevant XML rule is

```xml
<RegExpr attribute="Attribute" context="Attribute" String="(?:^|\s+)&name;" />
```

Note, however, that this rule was not touched by the latest changes.

jgm commented

I can see what should be happening. First, we should be matching

```xml
<RegExpr attribute="Element Symbols" context="ElementTagName" String="&lt;(?=(&name;))" beginRegion="element" />
```

and the element name should be captured by the group inside the lookahead.

Then we switch to the ElementTagName context and match

```xml
<StringDetect attribute="Element" context="#pop!Element" String="%1" dynamic="true" />
```

with `%1` = the previously captured element name. But this isn't happening. Why not?

jgm commented

Answer: %1 is not defined.
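
In a dynamic rule, `%1` stands for group 1 of the regex that switched into the current context, so when that regex produced no captures there is nothing to substitute. A minimal sketch of the substitution step, using a hypothetical `substituteCaptures` helper (not skylighting's actual function) and treating a missing group as a failed match, which is an assumption about how the real tokenizer handles that case:

```haskell
import Data.Char (isDigit)

-- Expand %N placeholders in a dynamic rule string using the captures
-- recorded when the current context was entered. Returns Nothing if a
-- referenced group is missing (assumed behavior for this sketch).
substituteCaptures :: [(Int, String)] -> String -> Maybe String
substituteCaptures caps = go
  where
    go ('%' : d : rest)
      | isDigit d = do
          val <- lookup (read [d]) caps
          (val ++) <$> go rest
    -- any other character (including a '%' not followed by a digit) is literal
    go (c : rest) = (c :) <$> go rest
    go [] = Just []
```

With a working capture, `substituteCaptures [(1, "methodCall")] "%1"` gives `Just "methodCall"`, so the element name can be consumed as an `Element` token. With no captures, `substituteCaptures [] "%1"` gives `Nothing`, the rule cannot match, and, as the trace shows, the `ElementTagName` fallthrough leaves the name in the input.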

jgm commented

I confirmed that applying the regex here doesn't produce a captured group.
That is a bug in our regex engine.

jgm commented

In ghci we can see the root issue:

```
Prelude Regex.KDE> testRegex False "<(a+)" "<a>"
Just ("<a",[(1,"a")])
Prelude Regex.KDE> testRegex False "<(?=(a+))" "<a>"
Just ("<",[])
```

Captures inside the lookahead `(?=...)` are ignored.

jgm commented

```
*Skylighting.Types Regex.KDE> compileRegex False "<(?=(a+))"
Right (MatchConcat (MatchChar <fn>) (MatchConcat (AssertPositive Forward (MatchConcat (MatchCapture 1 (MatchConcat (MatchSome (MatchChar <fn>)) MatchNull)) MatchNull)) MatchNull))
```

Note that `MatchCapture 1` is in there, but somehow it does not seem to work.
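
One plausible explanation (a hypothesis; the thread has not yet pinned down the code) is that the engine runs the assertion's sub-match and then restores not only the input position but also the capture state from before the assertion, so whatever `MatchCapture 1` recorded is thrown away. The toy backtracking matcher below is a sketch under that assumption: its constructors only loosely mirror `MatchConcat`, `AssertPositive`, `MatchCapture` and `MatchSome`, and none of it is Regex.KDE's code. The hypothetical `keep` flag selects between discarding the lookahead's captures (the behavior observed above) and keeping them while still rewinding the position, which is how PCRE treats captures inside a positive lookahead.

```haskell
import Control.Applicative ((<|>))
import Data.List (sortOn)

-- Toy regex AST; hypothetical, loosely mirroring the compiled form above.
data Regex
  = Chr Char            -- a literal character
  | Some Regex          -- one or more repetitions, greedy
  | Cat Regex Regex     -- concatenation
  | Capture Int Regex   -- numbered capture group
  | Lookahead Regex     -- positive lookahead (?=...)

type Caps = [(Int, String)]

-- Record group n, replacing any earlier value for the same group.
setCap :: Int -> String -> Caps -> Caps
setCap n v cs = (n, v) : filter ((/= n) . fst) cs

-- Continuation-passing backtracking matcher. `keep` controls whether a
-- successful lookahead keeps the captures made inside it.
matchAt :: Bool -> Regex -> String -> Caps -> (String -> Caps -> Maybe r) -> Maybe r
matchAt _ (Chr c) (x : xs) caps k | c == x = k xs caps
matchAt _ (Chr _) _ _ _ = Nothing
matchAt keep (Cat a b) s caps k =
  matchAt keep a s caps (\s' caps' -> matchAt keep b s' caps' k)
matchAt keep (Some r) s caps k =
  -- after one match of r, prefer matching more, then try stopping here
  matchAt keep r s caps (\s' caps' -> matchAt keep (Some r) s' caps' k <|> k s' caps')
matchAt keep (Capture n r) s caps k =
  matchAt keep r s caps (\s' caps' -> k s' (setCap n (take (length s - length s') s) caps'))
matchAt keep (Lookahead r) s caps k =
  -- run the sub-match once (atomically); the input position is rewound either way
  case matchAt keep r s caps (\_ caps' -> Just caps') of
    Nothing -> Nothing
    Just inner -> k s (if keep then inner else caps)

-- Match at the start of the input; same result shape as testRegex above.
runRegex :: Bool -> Regex -> String -> Maybe (String, Caps)
runRegex keep re s =
  matchAt keep re s [] (\rest caps -> Just (take (length s - length rest) s, sortOn fst caps))

-- <(a+)  and  <(?=(a+))
plainGroup, lookaheadGroup :: Regex
plainGroup = Cat (Chr '<') (Capture 1 (Some (Chr 'a')))
lookaheadGroup = Cat (Chr '<') (Lookahead (Capture 1 (Some (Chr 'a'))))
```

`runRegex False lookaheadGroup "<a>"` returns `Just ("<",[])`, the same result as the ghci session above, while `runRegex True lookaheadGroup "<a>"` returns `Just ("<",[(1,"a")])`: the match still consumes only `<`, but group 1 is available for the `%1` in the `ElementTagName` context.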