area/language-latex

\let is not highlighted

Closed this issue · 16 comments

\let is not highlighted, but \def is.
For consistency, \let should be highlighted as well.

Both are primitives and they do similar things, so I think this is reasonable. The rule that actually highlights \def is here, and it seems somewhat arbitrary. I would further suggest that all primitives be recognised. I've compiled a list of the 325 primitives in the TeXbook, though pdfTeX introduces more.

So, given that you already have a list of 325 TeX primitives, it seems rather simple to generate rules for them, or, even better, a single combined rule.
I've only recently started playing around with Atom, so I'm not yet familiar with its syntax highlighting rules. But if I can help with anything, please let me know!
After all, coming from TeXworks, writing LaTeX in Atom already feels like a huge improvement!

it seems rather simple to generate rules for them, or even better one combined rule

My thoughts exactly, until I started trying. Eventually, though, I came up with this, and it works well enough.

So yes, I could construct a rule that matches any word from a given list. However, I'll wait for the other maintainers of this repo to share their thoughts, as it may be undesirable for one reason or another.
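A minimal sketch of such a combined rule, written in Python only to show the regex it produces. The word list is a short hypothetical excerpt, not the full list of primitives. A negative lookahead is used instead of a trailing \b so that a name cannot match just the start of a longer control word:

```python
import re

# Hypothetical excerpt; the real list would hold every primitive.
PRIMITIVES = ["def", "edef", "gdef", "xdef", "let", "futurelet"]

def combined_pattern(names):
    """One regex matching a backslash followed by any of `names`.

    (?![a-zA-Z]) rejects matches that are only a prefix of a longer
    control word, so \\let matches but \\letter does not.
    """
    alternation = "|".join(re.escape(n) for n in sorted(names))
    return r"(\\)(" + alternation + r")(?![a-zA-Z])"

regex = re.compile(combined_pattern(PRIMITIVES))
for text in [r"\let\foo=\bar", r"\letter", r"\gdef\x{1}"]:
    m = regex.match(text)
    print(text, "->", m.group(2) if m else None)
# \let\foo=\bar -> let
# \letter -> None
# \gdef\x{1} -> gdef
```

When pasted into a CSON grammar, every backslash in the pattern has to be doubled again, as in the existing rules.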

If you want to learn how syntax highlighting works, or have any questions about what I linked to, feel free to open an issue in my own language-latex2e repo (so we don't clutter the issues here). Given my own (terrible) experience trying to learn it, I'm happy to help.

@yudai-nkt What's your opinion on scoping primitives? I support it, as I think it's nice to know just by looking at a control word whether it's a primitive or not, without needing to use \show or \tracing.... Unfortunately, the set of primitives varies between engines. For example, pdfTeX introduces 100 new ones, and there are doubtless others I haven't been able to find yet. However, XeTeX and LuaTeX don't recognise most of these, and define their own instead.

I can see that this package does support LuaTeX, so it could be confusing if a user defines their own macro that happens to clash with a "primitive" that only exists in a different engine. I believe the chances of this are quite small (the pdfTeX ones, for example, mostly start with pdf...), and such a clash might even be a useful hint to the user that their macro may conflict if they switch engines.

If you want a list of the ones I've found, several groups are available here, separated by the engine I believe introduces them. Each list also has a generated regex that needs only minimal formatting to work. They are already in use in my own package, and they work as expected. I put them just before the generic patterns, so specific primitives can be scoped individually if desired (e.g. \def, maybe), while the big list acts as a "catch-all" for any that are not handled explicitly.
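One way a "generated regex" like this could be produced is by factoring shared prefixes into a trie, which yields compact patterns in the nested style of the pdfTeX rule posted later in this thread. This is a hypothetical Python sketch; trie_regex is not part of any existing package, and it emits non-capturing groups where the thread's patterns use capturing ones:

```python
import re

def trie_regex(words):
    """Build an alternation with shared prefixes factored out, via a trie."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # marks that a word ends here

    def emit(node):
        ends_here = "" in node
        # One alternative per child branch, sorted for deterministic output.
        alts = [re.escape(ch) + emit(node[ch]) for ch in sorted(k for k in node if k)]
        if not alts:
            return ""
        if len(alts) == 1 and not ends_here:
            return alts[0]
        group = "(?:" + "|".join(alts) + ")"
        # If a word also ends at this node, the remaining suffixes are optional.
        return group + "?" if ends_here else group

    return emit(trie)

names = ["pdfoutput", "pdfoutline", "pdfobj", "pdfobjcompresslevel"]
body = trie_regex(names)
print(body)  # pdfo(?:bj(?:compresslevel)?|ut(?:line|put))
full = re.compile(r"(\\)(" + body + r")(?![a-zA-Z])")
```

The trailing lookahead keeps the same "whole control word only" behaviour as the plain alternation, so the factoring changes the pattern's size but not what it matches.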

I agree, and I think we can add engine-specific primitives as well. In fact, I added some primitives that are available only in pTeX, a Japanese variant of TeX.

We need to decide which TeX extensions we should support, and IMO expl3 is a good authoritative reference. In practice, it supports the following (excerpt from Section 9 of the expl3 documentation):

  • pdfTeX v1.40 or later.
  • XeTeX v0.9994 or later.
  • LuaTeX v0.70 or later.
  • e-(u)pTeX mid-2012 or later.

So I suppose covering the primitives of these engines is enough (including Knuthian TeX and e-TeX, of course).

If we choose this path, it's better to have a naming convention for the sake of consistency. My proposal is the format keyword.primitive.family.extension.tex. Here, family refers to the "Family" in this list. Since the family represents the role each primitive plays, it is a good criterion for grouping. extension refers to the TeX engine that implements the primitive. However, it's often difficult to pin a primitive to a single extension, so this part can be omitted depending on the conclusion of this discussion (the current tex.cson doesn't distinguish between TeX engines).
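A small sketch of how the proposed convention could be applied mechanically, for example by a script that generates the rules. primitive_scope is a hypothetical helper; the optional extension reflects the point above that it may be omitted:

```python
def primitive_scope(family, extension=None):
    """Return keyword.primitive.<family>[.<extension>].tex.

    Rejecting an empty family avoids accidental empty segments
    such as "keyword.primitive..tex".
    """
    if not family:
        raise ValueError("family must be a non-empty string")
    parts = ["keyword", "primitive", family]
    if extension:
        parts.append(extension)
    parts.append("tex")
    return ".".join(parts)

print(primitive_scope("io"))              # keyword.primitive.io.tex
print(primitive_scope("font", "luatex"))  # keyword.primitive.font.luatex.tex
```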

Below are some example primitives and their corresponding scopes under my naming scheme.

primitive               scope
\write                  keyword.primitive.io.tex
\def                    keyword.primitive.macro.tex
\ifpdfabsnum            keyword.primitive.logic.pdftex.tex
\ignoreligaturesinfont  keyword.primitive.font.luatex.tex

No major complaints about that. I currently use entity.name.function.primitive.latex for them because my theme shows them in a slightly darker shade of blue than regular commands, so the difference is noticeable but not glaring. In comparison, my current theme makes keywords purple, which tends to stand out (that's my only reason). I suppose it all depends on the syntax theme, though.

Splitting by family will be more difficult than generating one massive regex, but it's doable. I recommend making each family a separate pattern in the repository and creating a separate "meta" pattern that simply includes all the primitive patterns (I would name this aggregate pattern metaPrimitives). This way, each primitive family stays maintainable, and every family is reliably applied as long as it is included in the meta pattern.
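A rough sketch of that layout, generated from a mapping of family to names. It is written in Python and prints a JSON-style structure mirroring the CSON repository; the family names, word lists, and the <family>Primitives key names are hypothetical:

```python
import json
import re

def family_rule(family, names):
    """An atomic rule matching any primitive in `names` (hypothetical scope)."""
    alternation = "|".join(re.escape(n) for n in sorted(names))
    return {
        "match": r"(\\)(" + alternation + r")(?![a-zA-Z])",
        "name": f"keyword.primitive.{family}.tex",
    }

def build_repository(families):
    """One rule per family, plus a metaPrimitives entry that includes them all."""
    repository = {"metaPrimitives": {"patterns": []}}
    for family, names in families.items():
        key = f"{family}Primitives"
        repository[key] = family_rule(family, names)
        repository["metaPrimitives"]["patterns"].append({"include": f"#{key}"})
    return repository

repo = build_repository({
    "macro": ["def", "edef", "gdef", "xdef", "let"],
    "io": ["immediate", "write", "shipout"],
})
print(json.dumps(repo, indent=2))  # json.dumps doubles the backslashes again
```

Each family remains a separate, readable entry, while anything that wants all primitives only needs the single include: '#metaPrimitives'.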

(I would also recommend that style for the entire grammar, but that's a different issue :)

I'm against scoping primitives as entity just because it looks good in someone's editor. As described here, each scope has its own meaning. Scopes should be decided in terms of grammar and role, not by how your code looks (appearance does depend on the syntax theme, as you mention). If someone is dissatisfied with the highlighting, they should try another theme or make a new one.

Primitives are a kind of reserved word (although they can be redefined, as LaTeX does with \par and \end), so IMHO keyword is the appropriate scope, as in language-python (in fact, my suggested keyword.primitive.family.extension.tex is based on that language package).

Regarding the repository feature, it can be utilized but I still think we should include the family in scope names. Comprehensible scope names have two advantages:

  • Users can see what role each primitive plays directly in the editor pane
  • Third-party packages can hook into the detailed scopes to implement additional functionality.
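As an illustration of the second point, a third-party package could recover the family and extension from a scope name. This is a hypothetical sketch that assumes the keyword.primitive.family.extension.tex convention proposed above; no existing package API is implied:

```python
def parse_primitive_scope(scope):
    """Split keyword.primitive.<family>[.<extension>].tex into its parts.

    Returns None for scopes that don't follow the proposed convention.
    """
    parts = scope.split(".")
    if parts[:2] != ["keyword", "primitive"] or parts[-1] != "tex":
        return None
    middle = parts[2:-1]
    if not 1 <= len(middle) <= 2 or not all(middle):
        return None  # wrong number of segments, or an empty segment
    return {"family": middle[0], "extension": middle[1] if len(middle) == 2 else None}

print(parse_primitive_scope("keyword.primitive.font.luatex.tex"))
# {'family': 'font', 'extension': 'luatex'}
```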

However, it's also true that each regex will be long with my approach. So how about splitting the repository entries by "Type" (from the aforementioned list), like this? Grouping by two criteria (i.e., family and type) will shorten the regexes.

patterns: [
  {
    include: "#command"
  }
  {
    include: "#parameterToken"
  }
]

repository:
  command:
    patterns: [
      {
        match: '(\\\\(?:immediate|shipout|write))(?![a-zA-Z])'
        name: 'keyword.primitive.io.tex'
      }
      {
        match: '(\\\\[egx]?def)(?![a-zA-Z])'
        name: 'keyword.primitive.macro.tex'
      }
    ]
  parameterToken:
    patterns: [
      {
        match: '(\\\\every[hv]box)(?![a-zA-Z])'
        name: 'keyword.primitive.io.tex'
      }
      {
        match: '(\\\\every(?:display|math))(?![a-zA-Z])'
        name: 'keyword.primitive.math.tex'
      }
    ]
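One caveat about word boundaries in match patterns of the form \b(\\name)\b, checked here with Python's re module (which, as far as I know, treats \b and lookaheads the same way as the Oniguruma engine Atom uses, at least for ASCII text). A leading \b before the backslash only succeeds when a word character precedes the control word, and a trailing \b fails before digits, as in \write18. A negative lookahead avoids both problems. The patterns below are CSON strings with one level of backslash escaping removed:

```python
import re

# Word-boundary form: \b before the backslash and after the name.
WITH_WORD_BOUNDARIES = r"\b(\\(?:immediate|shipout|write))\b"
# Alternative: no leading \b, and "not followed by a letter" at the end.
WITH_LOOKAHEAD = r"(\\(?:immediate|shipout|write))(?![a-zA-Z])"

text = r"\immediate\write18{ls}"
print(re.findall(WITH_WORD_BOUNDARIES, text))  # []
print(re.findall(WITH_LOOKAHEAD, text))        # ['\\immediate', '\\write']
```

In TeX a control word ends at the first non-letter, so "not followed by a letter" is closer to TeX's own tokenisation than \b, which also counts digits and underscores as word characters.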

I don't see the need for the "separate 'meta' pattern". In my example, including command and parameterToken seems sufficient. Could you explain the motivation for a metaPrimitive pattern that bundles the two sub-repositories?

I'm against scoping primitives as entity just because it looks good in someone's editor

Sorry, I didn't mean we should use it because it looks good; it was just an FYI.

Regarding the repository feature, it can be utilized but I still think we should include the family in scope names

I don't quite follow; my repository suggestion was just a way of turning several individual patterns (such as ones split by family) into one aggregate pattern, so the rules themselves stay separate and maintainable, while applying them all is as simple as putting include: '#metaPrimitives' wherever they are wanted.

So how about splitting repositories by the "Type" (in the aforementioned list) like this?

That's the point of the meta group: the primitives can be split into as many patterns as desired, each simply added to the meta group. The places where primitives are wanted don't need to be checked or updated, because adding a pattern to the meta group adds it everywhere the meta group is used.
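The propagation argument can be modelled in a few lines: resolve an include recursively and see which atomic rules end up applied. This is a toy Python model with hypothetical entries and placeholder scope names, not how Atom's grammar engine is implemented:

```python
def resolve(ref, repository, path=frozenset()):
    """Expand an include such as "#metaControl" into atomic rule names, in order."""
    key = ref.lstrip("#")
    if key in path:  # stop on include cycles in this toy model
        return []
    entry = repository[key]
    if "patterns" not in entry:  # an atomic rule
        return [key]
    names = []
    for child in entry["patterns"]:
        names.extend(resolve(child["include"], repository, path | {key}))
    return names

repository = {
    "metaControl": {"patterns": [{"include": "#metaPrimitives"}, {"include": "#controlWord"}]},
    "metaPrimitives": {"patterns": [{"include": "#texPrimitives"}, {"include": "#pdfTexPrimitives"}]},
    "texPrimitives": {"name": "placeholder.tex"},
    "pdfTexPrimitives": {"name": "placeholder.pdftex"},
    "controlWord": {"name": "placeholder.control-word"},
}
print(resolve("#metaControl", repository))
# ['texPrimitives', 'pdfTexPrimitives', 'controlWord']

# Adding a family touches only metaPrimitives; metaControl picks it up automatically.
repository["xeTexPrimitives"] = {"name": "placeholder.xetex"}
repository["metaPrimitives"]["patterns"].append({"include": "#xeTexPrimitives"})
print(resolve("#metaControl", repository))
# ['texPrimitives', 'pdfTexPrimitives', 'xeTexPrimitives', 'controlWord']
```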

Could you explain the motivation for a metaPrimitive pattern that bundles the two sub-repositories?

Well, it works best when the entire grammar follows this style. For example, the rules in my main patterns array are just the following:

patterns: [
  { include: '#metaControl' } # must go first to catch escaped sequences
  { include: '#metaPercent' }
  { include: '#metaDollar' }
  { include: '#metaTilde' }
  { include: '#metaAmpersand' }
  { include: '#metaOpenBrace' }
  { include: '#metaCloseBrace' }
  { include: '#metaUnderscore' }
  { include: '#metaCaret' }
  { include: '#metaHashtag' }
]

These address the 10 characters that have special meaning when encountered at the "top" level. Doing it this way means that whenever I introduce new rules, it is easy to decide where to put them. For example, primitives would go in the metaControl group because they start with a \. The (current) definition of metaControl (kept in the repository) is as follows:

metaControl: {
    comment: 'All these commands begin with a backslash.'
    patterns: [
      { include: '#metaCommonControlSequences' }
      { include: '#metaEnvironment' }
      { include: '#metaDollar' } # Because of \( and \[. Must appear below escaped characters, to prevent \$ from being a false positive.

      # The following rules are "catch alls" for any control sequences not explicitly addressed above.
      { include: '#metaPrimitives' }
      { include: '#controlSymbol' }
      { include: '#controlWord' }
    ]
  }

Note that most of the entries in this meta group are meta themselves; this abstraction can be nested as deeply as desired, keeping the meaning clear and free of clutter. For example, even though no explicit rules for environment matching are given here, you can tell that metaEnvironment handles everything this package does for the different environments. Similarly, you can see that primitives are matched through metaPrimitives, without seeing the actual rules. If we needed to for some reason, we could scroll down to the definition of metaPrimitives and see that it is the following:

metaPrimitives: {
    comment: 'Allows organisation of various primitive sources'
    patterns: [
      { include: '#texPrimitives' }
      { include: '#pdfTexPrimitives' }
      { include: '#unsortedPrimitives' }
    ]
  }

In my case I split them by the engine that introduces them; however, they can be split any way you like (e.g. by family or type). Finally, the definition of an 'atomic' primitive rule (atomic because it is an indivisible building block used to build the 'meta' structures) might be as follows:

pdfTexPrimitives: {
    comment: "These don't apply on all engines, but I feel the names are specific enough and pdfLaTeX is used enough to justify it."
    name: 'entity.name.function.primitive.pdftex.latex'
    match: '(\\\\)(pdf(o(ut(put|line)|bj(compresslevel)?)|m(inorversion|a(p(file|line)|tch)|ovechars|dfivesum)|c(o(mpresslevel|pyfont|lorstack(init)?)|atalog|reationdate)|d(e(cimaldigits|st(margin)?)|raftmode)|horigin|vorigin|p(age(width|height|sattr|attr|re(sources|f)|box)|k(resolution|mode)|r(otrudechars|ependkern|imitive)|xdimen)|i(n(fo(omitdate)?|clu(dechars|sion(errorlevel|copyfonts))|terwordspaceo(n|ff)|sertht)|gnoreddimen|mage(resolution|hicolor|applygamma|gamma))|s(uppress(ptexinfo|warning(dup(map|dest)|pagegroup))|t(art(link|thread)|rcmp)|et(randomseed|matrix)|ave(pos)?|hellescape)|n(ames|o(ligatures|builtintounicode|rmaldeviate))|t(ra(iler(id)?|cingfonts)|hread(margin)?|ex(banner|revision|version))|f(o(nt(expand|attr|name|objnum|size)|rcepagebox)|akespace|i(rstlineheight|le(moddate|size|dump)))|a(djust(spacing|interwordglue)|ppendkern|nnot)|un(i(queresname|formdeviate)|escapehex)|g(entounicode|lyphtounicode|amma)|l(ast(lin(edepth|k)|obj|x(form|image(colordepth|pages)?|pos)|annot|match|ypos)|i(nkmargin|teral))|e(achline(height|depth)|nd(link|thread)|scape(string|name|hex)|lapsedtime)|r(e(f(obj|x(form|image))|s(ettimer|tore)|tval)|andomseed)|x(form(name)?|image(bbox)?))|efcode|r(pcode|ightmarginkern)|l(pcode|e(ftmarginkern|tterspacefont))|tagcode|kn(b(scode|ccode)|accode)|s(tbscode|hbscode)|if(pdf(abs(num|dim)|primitive)|incsname)|quitvmode|vadjust)(?=[^a-zA-Z])'
  }
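Hand-factored patterns like this are easy to get subtly wrong, so it can help to check them against the plain list of names they are meant to cover. A hypothetical Python sketch: unmatched reports any expected name the pattern misses. The excerpt below reproduces only the s/t branches, once with startthread misplaced under the t branch (which would spell \pdftstartthread) and once with it under s:

```python
import re

def unmatched(pattern, names):
    """Return the names that `pattern` fails to match as a whole control word.

    Each name is tested as "\\name " (backslash, name, trailing space) so a
    trailing lookahead such as (?=[^a-zA-Z]) has a non-letter to look at.
    """
    regex = re.compile(pattern)
    missing = []
    for name in names:
        m = regex.match("\\" + name + " ")
        # Require the match to cover the whole control word, not a prefix.
        if m is None or m.end() != len(name) + 1:
            missing.append(name)
    return missing

MISPLACED = r"(\\)(pdf(s(t(artlink|rcmp))|t(hread(margin)?|startthread)))(?=[^a-zA-Z])"
CORRECTED = r"(\\)(pdf(s(t(art(link|thread)|rcmp))|t(hread(margin)?)))(?=[^a-zA-Z])"
NAMES = ["pdfstartlink", "pdfstrcmp", "pdfthread", "pdfthreadmargin", "pdfstartthread"]

print(unmatched(MISPLACED, NAMES))  # ['pdfstartthread']
print(unmatched(CORRECTED, NAMES))  # []
```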

You mention in your scope name advantages that splitting them allows package authors to make dedicated rules based on the type of primitive; that is just as possible in this style, but it means everyone else reading the grammar rules can skip over the details and focus on the overall structure.

Applied to your example, I would further abstract command and parameterToken into one metaPrimitives rule, and have that appear in the main patterns array. This way, a reader does not need to read 20 lines that all serve the same function with slightly different scopes; they can be represented by the single metaPrimitives rule.

I basically understand and agree, but I'm afraid I still don't see the advantage of

patterns: [
  {
    include: "#metaPrimitives"
  }
]

repository:
  metaPrimitives:
    patterns: [
      {
        include: '#texPrimitives'
      }
      {
        include: '#pdfTexPrimitives'
      }
      {
        include: '#unsortedPrimitives'
      }
    ]
  texPrimitives:
    patterns: [
      # blah
    ]
  pdfTexPrimitives:
    patterns: [
      # blah blah
    ]
  unsortedPrimitives:
    patterns: [
      # blah blah blah
    ]

over

patterns: [
  {
    include: "#texPrimitives"
  }
  {
    include: "#pdfTexPrimitives"
  }
  {
    include: "#unsortedPrimitives"
  }
]

repository:
  texPrimitives:
    patterns: [
      # blah
    ]
  pdfTexPrimitives:
    patterns: [
      # blah blah
    ]
  unsortedPrimitives:
    patterns: [
      # blah blah blah
    ]

I prefer the latter because it's simpler while keeping enough readability and semantics.

The latter is slightly simpler in this case, and is reasonably readable, but I consider there to be two main issues with it:

  1. A contributor wishing to specifically match any primitives in another rule would have to look for all the primitive rules and include each one. Future changes to the primitive lists / structures would involve finding each of these instances and fixing them. (Granted, this is unlikely to occur, but it is technically possible.)

  2. It 'repeats' itself (in a loose sense); looking at patterns, the first rule is for primitives. The second rule is ... also primitives. The third rule is ... again primitives. Subsequent rules will also be primitives, one for each family, meaning a reader who is not interested in primitives will have to actively look for the end of the 'primitives' section of matches, which I believe lessens readability compared to a single metaPrimitives entry. (I know they're different by family, but that distinction is not relevant to this reader).

With metaPrimitives, a reader would see that the first rule is for primitives, the second for environments, the third for generic control words, etc. In this way, each rule has a different function and more 'information' is stored in a smaller space. If the reader were interested in the different families, they could just look at the definition of metaPrimitives, where they would be laid out clearly.

It may be relevant that I also keep all the meta patterns at the top of my repository, while all the atomic commands are stored at the bottom (meta vs atomic is separated by comment headings). That is, I also try to give the repository some structure to help readability there as well.

My proposal is to use a format keyword.primitive.family.extension.tex

Should we consider adding the primitive name itself to the scope? This is easily done with $n syntax (which I just discovered by reading the grammar for \section). It would allow the finest-grained tuning, but could be considered unnecessary.
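A rough model of what that substitution does, assuming a hypothetical rule whose scope name uses $2 for the captured primitive name. This is a simplified Python sketch of the idea, not the grammar engine's implementation; real grammars may support further forms that it ignores:

```python
import re

def expand_scope(name_template, match):
    """Replace each $n in a scope-name template with capture group n."""
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))) or "", name_template)

# Hypothetical rule: group 1 is the backslash, group 2 the primitive's name.
rule = re.compile(r"(\\)(let|def|edef|gdef|xdef)(?![a-zA-Z])")
m = rule.match(r"\let\a=\b")
print(expand_scope("keyword.primitive.macro.$2.tex", m))
# keyword.primitive.macro.let.tex
```

Because the name becomes the last specific segment before .tex, a theme or package could target a single primitive while everything else still falls back to the family scope.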

BTW, I know I'm pushing for it here, and it's probably not necessary for this particular example, but really I'm just trying to show you the benefits this style can grant in general.

A better use case would be the minted environment; lines 174 to 1248 (all 1074 of them!) are slight variations of the exact same pattern, and they make the grammar look far more intimidating than it really is. If these were grouped together into one metaMinted pattern, with the actual rules moved to the bottom (out of sight), it would help anyone trying to read the grammar.

Just a small comment at the moment as I need to again follow and remember what we discussed before.

match any primitives in another rule

Does this mean the value for match/begin/end keys? If so, is it possible even with the metaPrimitives repository?

@yudai-nkt No, I don’t think that’s possible. The idea behind that point was that if another rule wanted to match only primitives inside its scope, it would need to include just a single rule.

That was admittedly a poor reason, as I can’t think of a likely reason to need it for that specific case.

A much more compelling reason, to me at least, was that primitives should be grouped because they are all part of the same ‘idea’.

The subgrouping by the extension that added them was for further grouping by ‘idea’ / related things.

As it’s even more unlikely to need one subset of primitives over another, I would be happy with a singular metaPrimitives rule (the exact name is irrelevant, so long as it’s clearly about primitives), which directly contains the individual sets of primitives as separate objects in the patterns array. These can each have a comment property, which shows the extension they’re a part of.

Closing due to inactivity.