Strange behavior with identifying double quotes
I think I'm using the latest version of the tree-sitter grammar (as included in https://github.com/emacs-tree-sitter/tree-sitter-langs, though looking at it, I'm not entirely sure how to tell which version of the grammar I've got). I'm using typst-ts-mode for Emacs, but as far as I can tell, the syntax highlighting is handled through tree-sitter, so I THINK this is the right place to report a potential bug.
I'm seeing some strange highlighting of double quotes. In my current document, it starts when I use the #show regex() command. The code that produces the error is:
//
// Convenience Formatting
//
// Helps highlight To Do tagging
#show "TODO": it => text(red, strong(it))
// Forces any string matching the expression for hex number to be monospaced
// automatically. The string splicing is a hack making the renderer avoid
// infinite recursion. Cannot be used with _ separators as this breaks
// text nodes in Typst.
#show regex("0x[a-fA-F0-9]*"): it => raw(it.text.at(0)) + raw(it.text.slice(1))
// Forces any string matching the expression for a binary number to be
// monospaced automatically. The string splicing is a hack making the renderer
// avoid infinite recursion.
#show regex("0b[01]*"): it => raw(it.text.at(0)) + raw(it.text.slice(1))
The syntax errors begin at the first regex, as shown here:
Note that it does not catch the first double quote of the regex instruction, but it DOES catch the second one, leading to a whole slew of issues throughout the rest of the document. This Typst code compiles without error, and I've checked the syntax in the online Typst editor, where it also works. The code is fine, so I think the tree-sitter grammar has an error.
Or, if this has already been fixed, the bundled libraries I have may simply be out of date. That's also a possibility. I'm on Windows, where building ancillary libraries myself isn't particularly easy, so I rely on the bundled repo.
I cannot reproduce this problem using either the dynamic library from release 0.12.81 or a Typst tree-sitter parser compiled from source. I'm on Linux, using the Emacs master branch.
Can you check that you have the latest typst-ts-mode installed?
I suppose that's good news, since it means there's probably a version issue. Here's what Git shows for my typst-ts-mode checkout:
Head: main feat: Typst v0.9.0
Merge: origin/main feat: Typst v0.9.0
Recent commits
39a9e63 main origin/main feat: Typst v0.9.0
bbac54c doc: update README
4c26bf9 doc: update README
86492f0 doc: update README
5ff1b21 fix(compilation + watch) error regexp error
004ec93 fix: `typst-ts-markup-header-scale` custom setting function error
9d1adbc feat: adopt update stream parser syntax change and add dynamic height for headers
439deb4 fix(preview): open non-english character file name
ea69e2f fix: don't indent content inside raw block
b16762c doc: README
It appears the tree-sitter grammar bundle I downloaded is 0.12.77. I'll update it and at least bring everything in line. Perhaps there's been a more recent update to tree-sitter-typst that I'm missing.
It took a little while to find the right function, but evaluating (treesit-language-abi-version 'typst) gives 13 (#o15, #xd, ?\C-m), so I think I'm at version 13. I'm grabbing version 0.12.81 to see if that improves things.
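A side note on that number, as far as I understand Emacs's built-in treesit: treesit-language-abi-version reports the tree-sitter ABI version the compiled grammar targets, not the release of tree-sitter-typst or of the tree-sitter-langs bundle, so it can stay at 13 across grammar updates. A minimal check (for example via M-: or in *scratch*), assuming Emacs 29 or later:

```elisp
;; Sketch: confirm the typst grammar loads and print the tree-sitter ABI
;; version it was compiled for.  This is NOT the grammar's release number.
(if (treesit-language-available-p 'typst)
    (message "typst grammar loaded, tree-sitter ABI %d"
             (treesit-language-abi-version 'typst))
  (message "typst grammar could not be loaded"))
```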
Okay, I've updated everything. The tree-sitter grammar still reports 13, and typst-ts-mode is at the commit listed above (39a9e63, "feat: Typst v0.9.0"). However, I have discovered something new.
That code snippet parses properly when placed into a NEW file! Whatever threw off the parsing happened earlier in the document. I'm now copying over bits of the document to figure out exactly where the first error appears. Will update as soon as I can.
That was relatively quick. I've found the issue, though I don't know what the formal rule is. I previously had:
#set list ( blah, blah, blah) // One space between list and (
This is valid Typst, at least according to Typst itself: both the web app and the standalone build accept it. HOWEVER, both the web app's syntax highlighting and the tree-sitter grammar fail to parse it when there is a space. They very much want it to be:
#set list(blah, blah, blah) // No space between list and (
So, I have it fixed. However, the language specification does appear to permit a space between a function name and its arguments. It might be wise either for Typst to declare this illegal, or for grammars to handle the space correctly.
Note that you can use treesit-explore-mode to view the tree-sitter syntax tree, then select some text to see which node it belongs to. Also, describe-face is useful for finding out exactly which face Emacs applied to the text at point.
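The two checks can also be combined into one command that echoes the node at point, its parent, and the face there. A minimal sketch, assuming Emacs 29+ with built-in treesit; the command name my/typst-inspect-at-point is hypothetical and not part of typst-ts-mode:

```elisp
;; Hypothetical helper (not part of typst-ts-mode): echo the tree-sitter
;; node at point, its parent node, and the face Emacs applied at point.
(defun my/typst-inspect-at-point ()
  "Show the tree-sitter node type, its parent type, and the face at point."
  (interactive)
  (let* ((node (treesit-node-at (point)))
         (parent (and node (treesit-node-parent node)))
         (face (get-text-property (point) 'face)))
    (message "node: %s  parent: %s  face: %S"
             (and node (treesit-node-type node))
             (and parent (treesit-node-type parent))
             face)))
```

Run it with M-x my/typst-inspect-at-point while point is on a suspicious double quote. Emacs 29 also ships treesit-inspect-mode, which shows the node at point in the mode line.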
Yep, using that on #set list(foobar) shows:
(set set
(call item: (builtin)
... lots more
And when a space is inserted and I explore the parse of #set list (foobar), the explorer shows:
(ERROR
(field
(lots more
So the grammar very much requires no space between the function name and the parenthesis.
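Since the stray highlighting starts wherever the grammar first gives up, you could jump straight to the earliest ERROR node instead of copying chunks into a new file. A sketch assuming Emacs 29+ and the typst grammar active in the buffer; my/typst-goto-first-error-node is a hypothetical name, while treesit-buffer-root-node, treesit-query-capture and treesit-node-start are built-in treesit functions:

```elisp
;; Hypothetical helper (not part of typst-ts-mode): move point to the
;; earliest tree-sitter ERROR node in the buffer.
(defun my/typst-goto-first-error-node ()
  "Move point to the start of the earliest ERROR node, if any."
  (interactive)
  (let* ((root (treesit-buffer-root-node 'typst))
         ;; With NODE-ONLY non-nil, this returns a plain list of nodes.
         (errors (treesit-query-capture root '((ERROR) @err) nil nil t)))
    (if (null errors)
        (message "No ERROR nodes found")
      ;; Use the smallest start position instead of relying on the
      ;; order in which captures are returned.
      (goto-char (apply #'min (mapcar #'treesit-node-start errors)))
      (message "First ERROR node at line %d" (line-number-at-pos)))))
```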
It would be nicer to add an ERROR face so that problems like this are easier to spot. I will add it later (I'm the author of typst-ts-mode). Thanks for sharing this problem.
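Until something like that lands in the mode, a user-side stopgap could add one extra font-lock rule that paints ERROR nodes. This is only a sketch under stated assumptions (Emacs 29+, typst-ts-mode configuring font lock through treesit-font-lock-settings); the function name and the feature symbol error are hypothetical, and this is not how typst-ts-mode itself implements it:

```elisp
;; Hypothetical stopgap (not typst-ts-mode's implementation): highlight
;; every tree-sitter ERROR node with `font-lock-warning-face'.
(defun my/typst-highlight-error-nodes ()
  "Add a font-lock rule that highlights tree-sitter ERROR nodes."
  (setq-local treesit-font-lock-settings
              (append treesit-font-lock-settings
                      (treesit-font-lock-rules
                       :language 'typst
                       :feature 'error
                       ;; Let the warning face win over faces already applied.
                       :override t
                       '((ERROR) @font-lock-warning-face))))
  ;; Enable the new `error' feature regardless of `treesit-font-lock-level'.
  (treesit-font-lock-recompute-features '(error))
  (font-lock-flush))

(add-hook 'typst-ts-mode-hook #'my/typst-highlight-error-nodes)
```

The trade-off of :override t is that a large ERROR node can paint a long stretch of the buffer, but that also makes the starting point hard to miss.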
That'll certainly make it easier to spot where things went wrong! I should have tried pasting snippets into a separate document earlier to narrow down the issue. And yes, I noticed. I initially tried to contact you via the Sourcehut discussion, but that repository is VERY OPINIONATED about email, and once I get a bounce I tend to put contacting on the back burner and rarely get back to it. Being able to easily include formatting and screenshots here helps a lot.
Still, I think the actual issue is the mismatch between what the tree-sitter grammar considers valid and what Typst considers valid. If compilation had failed, that error would naturally have pointed me to the fix.