jgm/skylighting

Stata highlighting wrongly highlights keywords contained in comments

Opened this issue · 3 comments

In Stata's built-in code editor, all text contained in a comment is highlighted as a comment. For example:

image

In Pandoc outputs, plain text in comments that also happens to be Stata keywords is highlighted as a keyword. For example:

image

Note that adopath, BASE, means, and until (the words incorrectly highlighted inside these comments) all appear in stata.xml as words to highlight.

Inspecting the HTML output shows the keyword class being applied. Furthermore, the text after the comment marker appears to be "tokenized", and each "token" gets a different highlighting style depending on which class it belongs to (e.g., keyword or list of commands).

image

Here's how I produced the HTML output in the last two images above.

  1. Create a Markdown file, stata_test.md:
---
title: Hello
---

Here's some Stata code:

```stata
* Set user root folder
global root "C:\Users\user123\github\myproject"

* Set PLUS to adopath and list it first, then list BASE first.
* This means that BASE is first and PLUS is second.
adopath ++  "${root}/code/ado"
adopath ++  BASE

* Keep removing adopaths with rank 3 until only BASE and the project ado-folder,
* that has rank 1 and 2, are left in the adopaths
local morepaths 1
while (`morepaths' == 1) {
  capture adopath - 3
  if _rc local morepaths 0
}
```

  2. Render as HTML with Pandoc:
pandoc stata_test.md -f markdown -t html -s -o stata_test.html
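To check the rendered file without eyeballing the browser inspector, a small script along these lines could list every `*` comment line that contains text outside the comment class. This is only a sketch: it assumes the highlighted code sits inside `<code>` elements and that comments and keywords carry the short class names `co` and `kw`. The helper names (`TokenCollector`, `misclassified_comment_lines`) are made up for illustration.

```python
import sys
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect (text, css_class) pairs for all text inside <code> elements."""

    def __init__(self):
        super().__init__()
        self.in_code = 0
        self.class_stack = []  # class attribute of each open <span>, innermost last
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.in_code += 1
        elif tag == "span" and self.in_code:
            self.class_stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if tag == "code":
            self.in_code -= 1
        elif tag == "span" and self.in_code and self.class_stack:
            self.class_stack.pop()

    def handle_data(self, data):
        if self.in_code:
            # Use the innermost enclosing span that actually has a class.
            cls = next((c for c in reversed(self.class_stack) if c), None)
            self.tokens.append((data, cls))


def misclassified_comment_lines(html, comment_class="co"):
    """Return (line_text, stray_tokens) for each line starting with '*'
    that contains non-blank text outside the comment class.
    Assumption: 'co' is the class used for comments in the output."""
    parser = TokenCollector()
    parser.feed(html)
    parser.close()

    # Regroup the token stream into source lines.
    lines = [[]]
    for text, cls in parser.tokens:
        for i, part in enumerate(text.split("\n")):
            if i > 0:
                lines.append([])
            if part:
                lines[-1].append((part, cls))

    report = []
    for line in lines:
        plain = "".join(text for text, _ in line)
        if not plain.lstrip().startswith("*"):
            continue
        stray = [(t, c) for t, c in line if t.strip() and c != comment_class]
        if stray:
            report.append((plain, stray))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for plain, stray in misclassified_comment_lines(fh.read()):
            print(plain, "->", stray)
```

Run as `python check_comments.py stata_test.html` (the script name is arbitrary). Unstyled text on a comment line is reported too, with class `None`, since the expectation is that the whole line is styled as a comment.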

Note: I've not (yet) investigated whether this issue also arises for other comment styles (e.g., single-line comments starting with //, end-of-line comments with ///, or multi-line comments starting with /* and ending with */).
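For anyone wanting to follow up on the other comment styles, a test document could be generated with something like the sketch below. It places the same four words in each style of Stata comment so the rendered output can be compared side by side. The function name `build_test_markdown` is hypothetical, and the resulting file would still need to be rendered with the pandoc command above.

```python
# Words that were highlighted inside comments in the report above.
KEYWORDS = "adopath BASE means until"


def build_test_markdown(words=KEYWORDS):
    """Return Markdown text with one Stata code block that exercises
    each comment style: *, //, ///, and /* ... */."""
    stata_lines = [
        f"* star comment: {words}",
        f"// double-slash comment: {words}",
        f"display 1 + /// continuation comment: {words}",
        "    1",
        f"/* block comment: {words}",
        f"   second line: {words} */",
    ]
    fence = "`" * 3  # built programmatically to avoid nesting code fences
    return "\n".join(
        ["Comment styles:", "", fence + "stata", *stata_lines, fence, ""]
    )


if __name__ == "__main__":
    print(build_test_markdown())
```

Redirecting the output to a `.md` file and rendering it as before should show whether `//`, `///`, and `/* */` comments are affected in the same way as `*` comments.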

Sorry if I'm posting this in the wrong place, or providing less than helpful information.

skylighting is a really amazing tool. I'm coming to it from a project that uses Quarto to write HTML documentation for Stata packages.

jgm commented

I'm seeing the same behavior when I open the file with the Kate editor. This indicates that the problem is in the stata.xml syntax definition from KDE, which skylighting is interpreting accurately. Try submitting a report there (see our README for some links).

Many thanks for pointing me in the right direction.

If this gets fixed upstream, should I notify you here? Or do you periodically pull new/improved syntax definitions from upstream?

jgm commented

Wouldn't hurt to notify here. But yes, every once in a while we pull in changes from upstream.