[JavaScript] some highlighting issues
user-982634 opened this issue · 7 comments
Your syntax package fails for the same test cases as in the linked issue.
Given that this repo is described as "Sublime-syntax definition for ES6+ with absurdly specific scopes", I hope you would be interested in fixing these issues.
If I can help somehow, let me know.
I remember you, buddy.
This is the most confusing comment I have read so far on GitHub. At least, I hope you can explain to yourself why you closed this, since I don't see the reason, nor do I understand what you mean.
Can you explain why you blocked my account and why you closed this issue? I think your actions are very unprofessional.
You mean there’s two of you?
On the off chance that you’re really not the same user:
A year ago someone began opening issues here for various unusual, but syntactically legal, constructions. Like you, this person cross-posted the issues on both this repo and Sublime HQ. Some were things we could fix, and did; one was even something a person might encounter in real code. Some were things which (like most of the items in the new list) can’t be addressed in a Sublime Syntax definition for technical reasons. I see someone tried to explain this on the other thread.
At first I thought it was pretty cool. I love this stuff, I was very impressed by the assembled lists, and it can be fun to sort these things out. But as mentioned, some of them weren’t fixable. Some others were, technically, but only in a subset of cases — and solving for them would have required hundreds of lines of very clever code. The trouble is, this act of "fixing" them would have provided no value to users, yet would have had a high cost for maintenance and performance.
When I explained why some of their issues weren’t things we could or would address, they responded with a condescending, zany rant. One thing they included in their rant was quoting, in italics, the phrase absurdly specific scopes. They claimed we needed to change that description since this project is garbage, why would we even use Sublime Text if it’s so broken, etc etc. (They apparently didn’t understand that very specific scopes — e.g. having a scope defined for the parentheses delimiting a while loop’s expression — and perfect accuracy aren’t the same concept.)
Assuming you’re not the same person, and the consistencies (down to writing style and creating multiple generic user accounts) are coincidences ...
I actually sympathize with this perspective from a syntax-nerd angle for sure: a great highlighter should be built on a properly derived AST, not a labyrinth of regex patterns and heuristics. I would love to do real parsing in Sublime, and I’ve even researched how one might approach it. As it turns out, the relevant parts of Sublime aren’t exposed as public API. The closest you can get, that I know of, is to abuse the region marking system, but this doesn’t behave the same and (rather critically) it isn’t interoperable with other packages people use.
This project, like the one at Sublime HQ, is a Sublime Syntax definition though. It would be cool to build something like that, but these aren’t that. Oniguruma regex is a good deal more powerful than actually-regular regular expressions, which is why one can do brace-matching in lookaheads (on the same line) for example, but it’s still not possible to use these regex patterns to parse in the formal sense — especially not within expressions, which are left-recursive. As you show in your examples, one of the most challenging points is determining when to transition between the four lexical goals of ES, which is why it’s easy to create unhighlightable sequences using '/'. Instead we need to resort to (sometimes rather clever) heuristics — which work for real code, most of the time, but they aren’t perfect.
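As a rough illustration of why "/" is hard to classify, here is a minimal sketch of the kind of previous-token heuristic a highlighter might fall back on. The helper name slashStartsRegex and its keyword list are made up for this example and are not taken from this project or from the default definition.

```javascript
// Hypothetical helper (not from any real syntax definition): guess whether a
// "/" starts a regex literal by looking only at the previous significant token.
const EXPRESSION_KEYWORDS = [
  "return", "typeof", "case", "in", "of", "delete",
  "void", "throw", "new", "instanceof",
];

function slashStartsRegex(prevToken) {
  // At the start of input an expression is expected, so read "/" as a regex.
  if (prevToken === undefined) return true;
  // Tokens starting with a word character: only the listed keywords expect an
  // expression next; identifiers and numbers end an operand.
  if (/^[\w$]/.test(prevToken)) return EXPRESSION_KEYWORDS.includes(prevToken);
  // Closing brackets usually end an operand, so "/" is read as division.
  return ![")", "]", "}"].includes(prevToken);
}

console.log(slashStartsRegex("="));      // true:  x = /re/
console.log(slashStartsRegex("x"));      // false: x / 2
console.log(slashStartsRegex("return")); // true:  return /re/
// Both `(a) / b` and `if (a) /b/.test(c)` are legal, but the helper only sees
// ")" and always answers false, so the second form is misread.
console.log(slashStartsRegex(")"));      // false
```

Telling the two ")" cases apart means knowing whether the parentheses belonged to an `if` head or to an expression, which is exactly the kind of context a token-local pattern does not have.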
There are also some cases where the technical limitation is not about what Sublime Syntax and Oniguruma can do in theory, but rather about complexity and performance. This project already brushes up against the edge of that, and I have seen content which makes it perform perceptibly slower than the default definition because of its fairly heavy use of expensive lookaheads.
(BTW, this is not a professional project. It’s just for fun.)
tl;dr: someone who seemed very like you by appearances once posted lots of issues about similar cases. They weren’t helpful; they were demanding and rude, and it began to feel like trolling after a few days. Finally they exploded in a tirade about how terrible our work is and deleted their account (which also was generically named and had only existed for a month). If that person is not you, and you would like to contribute here with code that addresses these things, you’re welcome to.
Thanks for explaining. At first I was surprised: when you closed my issue I was confused and thought that I had done something wrong or that this repository was unwelcoming.
I am OK with not fixing these issues. I was just wondering whether the maintainers of this repository would like to know about the test cases I found, but if your main goal is performance, I think there is really nothing to be done here.
As an answer to your sentence:
Oniguruma regex is a good deal more powerful than actually-regular regular expressions, which is why one can do brace-matching in lookaheads (on the same line) for example, but it’s still not possible to use these regex patterns to parse in the formal sense — especially not within expressions, which are left-recursive.
If you read the Sublime issue thread I posted, Thom said in the comments that he has written a proposal for non-deterministic parsing, so it will be possible (probably in the next version of Sublime) to parse JavaScript test cases more easily and time-efficiently. If the only hindrance is performance, I think it would be very easy once Thom's proposal is implemented in the core.
So, to summarize: this is your repository, not mine, so the decision to fix or not fix issues is yours. I am OK with that.
BTW, can you please unblock my original account (the account from which I posted this issue), so I can delete this new account (the account from which I posted this comment)? Being blocked by another user is really the worst experience I have had so far.
Thanks for the unblock. I hope you have a great day!
NP, and sorry for the mix-up. It really seemed like deja vu; some of the items in the list are even the same.
I just read Thom’s proposal. It’s very nicely assembled and we would surely take advantage of that if it existed. In practice, the places where the limitations are felt almost always involve newlines but they are usually still rather short. Multiline destructuring, especially of arguments, as shown there, is the most common. In ES Sublime we do recovery on this but as he describes, it’s currently impossible to "go back" and correct stuff that was already matched:
My gut is telling me that even with this tool we would still end up having to stop short of perfect matching within expressions because handling them perfectly would, even for relatively short expressions, often imply a very large stack of non-deterministic matches. In other words, some "/" edge cases would likely remain, but these are very rare, since most examples we can create would end up being runtime errors anyway. But I’d certainly try it and see what happens. For me the dream would be to actually provide scoping that’s 1:1 with the formal productions. We do approach this with statement keywords and delimiters, which are, with a few exceptions, deterministic token-by-token.
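To make the "go back" problem concrete, here is a hypothetical illustration (not an example from either thread). The two inputs below are identical up to the closing parenthesis; only the token after it decides whether the contents were arrow-function parameters or an ordinary parenthesized expression. The function classifyParen is an invented name, and it is deliberately naive: it ignores strings, comments, regex literals and template literals.

```javascript
// Hypothetical sketch: classify a source string that starts with "(" by
// scanning ahead to the matching ")" and checking for "=>" on the same line.
function classifyParen(source) {
  let depth = 0;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "(") {
      depth++;
    } else if (ch === ")" && --depth === 0) {
      // Only here, after the matching ")", can the two readings be told apart.
      // No line break is allowed between the parameters and "=>".
      return /^[ \t]*=>/.test(source.slice(i + 1))
        ? "arrow-parameters"
        : "parenthesized-expression";
    }
  }
  return "unterminated";
}

const asParams = "({\n  a,\n  b\n}) => a + b";
const asExpr = "({\n  a,\n  b\n})";

console.log(classifyParen(asParams)); // arrow-parameters
console.log(classifyParen(asExpr));   // parenthesized-expression
```

The scan crosses several lines before it can answer, whereas a match in a syntax definition has to commit to scopes for `{`, `a` and `b` as it reaches them.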
One thing I’ve considered is generating the grammar definition. I believe that if we generated it programmatically, it would become manageable to — for example — create distinct scope trees corresponding to the parameterized permutations of various productions. These would balloon the size of the definition, but I wouldn’t expect it (could be wrong) to have a significant performance cost, since it doesn’t actually imply performing any more, or more complex, matches in a given context than are currently attempted.
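A minimal sketch of what such generation could look like, under several assumptions: the parameter names, the context naming scheme and the scope names below are all invented for illustration, and the output is only a `contexts`-style mapping of match/scope rules, not a complete definition file.

```javascript
// Hypothetical production parameters, loosely in the style of grammar
// parameters such as [In, Yield, Await].
const PARAMS = ["in", "yield", "await"];

// Every on/off combination of the parameters: 2^n flag sets.
function permutations(params) {
  return params.reduce(
    (acc, p) => acc.flatMap((set) => [set, [...set, p]]),
    [[]]
  );
}

// Made-up naming scheme, e.g. "expression-in-yield".
function contextName(base, flags) {
  return [base, ...flags].join("-");
}

// Build one context per flag set; each context lists only the rules that
// apply under its own flags. Scope names are placeholders.
function buildContexts(base) {
  const contexts = {};
  for (const flags of permutations(PARAMS)) {
    const rules = [];
    if (flags.includes("in")) {
      rules.push({ match: "\\bin\\b", scope: "keyword.operator.in.js" });
    }
    if (flags.includes("yield")) {
      rules.push({ match: "\\byield\\b", scope: "keyword.control.yield.js" });
    }
    if (flags.includes("await")) {
      rules.push({ match: "\\bawait\\b", scope: "keyword.control.await.js" });
    }
    rules.push({
      match: "[_$[:alpha:]][_$[:alnum:]]*",
      scope: "variable.other.readwrite.js",
    });
    contexts[contextName(base, flags)] = rules;
  }
  return contexts;
}

const contexts = buildContexts("expression");
console.log(Object.keys(contexts).length); // 8
console.log(JSON.stringify(contexts["expression-yield"], null, 2));
```

In this sketch the number of contexts doubles with each parameter, while the rule list inside any single context stays short, which is the trade-off described above: a larger definition without more matches attempted per context.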
One thing I’ve considered is generating the grammar definition. I believe that if we generated it programmatically, it would become manageable to — for example — create distinct scope trees corresponding to the parameterized permutations of various productions.
I'm totally with you on this one; I have had thoughts about generating the def as well. Another benefit is that it could also make it easier to export defs for e.g. Ace, minted, Atom, etc.