sogaiu/tree-sitter-clojure

Potential Changes Announcements

sogaiu opened this issue · 7 comments

The purpose of this issue is to announce potential changes to tree-sitter-clojure to any interested parties [1]. Our intent is to do this by adding a comment to this issue for each such change (or set of changes).

The hope is that we might learn sooner about unanticipated consequences of such changes to prevent and/or mitigate unintended breakage. We don't know in detail how the grammar is being used, but we figured that interested parties might be able to tell us...if only they knew ahead of time :)

Having said that, we don't anticipate major changes at this point, but for reasons [2], it seems possible things might turn out otherwise.

Discussion of potential changes should take place elsewhere [3] so that this announcement issue can remain easy to parse and browse [4].

Note that this issue will be pinned and locked in support of the points above.


[1] Some specific types of "interested parties" we had in mind include:

  • Developers and maintainers of projects that directly or indirectly depend on tree-sitter-clojure. A concrete example is janet-tree-sitter which uses the grammar in testing.

  • End users of such projects. A less abstract example might be a user who is aware they are employing tree-sitter-clojure-enabled functions in an editor.

[2] Some factors that conspire against us:

  • Clojure itself doesn't really have a specification that is comparable to what exists for various other programming languages such as C, Java, JavaScript, or Python. This puts the task of understanding what is "correct" and/or "comprehensive" in a different sort of realm than it would be for those languages.

  • Tree-sitter is pre-1.0 and has some rough edges. Our belief is that the maintainers are well-intentioned, capable, and...busy. Moreover, it's not a small or easy matter they have undertaken.

  • Other unforeseen things :)

[3] Specific examples of "elsewhere" include:

[4] If familiar with mailing list communication for software projects, announcement-only mailing lists might be thought of as similar.

Planned Change

Addresses

Overview

Both kwd_lit and sym_lit will now contain up to 3 new nodes:

(kwd_lit
  ns: (kwd_ns)
  delimiter: "/"
  name: (kwd_name))

(sym_lit
  ns: (sym_ns)
  delimiter: "/"
  name: (sym_name))

Only the (kwd_name) and (sym_name) are required.

sym_lit nodes can still contain metadata nodes (this is unchanged).
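
For anyone adapting code that previously treated the whole symbol or keyword as one piece of text, the sketch below shows how the text might map onto the three fields above. It is only an illustration, not the grammar's implementation: `Parts` and `split_qualified` are hypothetical names, it assumes the namespace is everything before the first "/" (with a lone or leading "/" treated as a bare name), and it expects keyword text with the leading colon(s) already removed. It does no validation.

```python
from typing import NamedTuple, Optional


class Parts(NamedTuple):
    ns: Optional[str]         # text of the (sym_ns) / (kwd_ns) child, if any
    delimiter: Optional[str]  # "/" when a namespace is present, else None
    name: str                 # text of the (sym_name) / (kwd_name) child


def split_qualified(text: str) -> Parts:
    """Split symbol text (or keyword text minus its colons) into fields.

    Assumption: the namespace ends at the first "/". A "/" at index 0
    (including a lone "/") is treated as part of a bare name.
    """
    slash = text.find("/")
    if slash <= 0:
        # No slash at all, or the text starts with "/": name only.
        return Parts(None, None, text)
    return Parts(text[:slash], "/", text[slash + 1:])


if __name__ == "__main__":
    for sample in ["map", "clojure.core/map", "/", "clojure.core//"]:
        print(sample, "->", split_qualified(sample))
```

Under those assumptions, an unqualified symbol yields only a name, which matches the note above that only (kwd_name) and (sym_name) are required.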

Remarks

We're currently planning to merge around 2023-01-05 -- that's about 2 weeks from the time of this posting.

If you are not prepared to use these changes, please consider using the commit at tag v0.0.9 or the corresponding tagged release. (Those should be functionally equivalent to the current master branch (8c23e0ec07 as of this writing).)

If the changes do cause issues, please tell us (preferably via one of the suggested communication channels mentioned below).

Suggested Communication Channels

PR #31 has been merged.

It is available under tag v0.0.10.

If this version causes trouble for anyone, feel free to open an issue. In the meantime, v0.0.9 will remain available.

Planned Changes

The candidate changes we consider most significant include:

Please see the candidate CHANGELOG for other planned changes.

Addresses

  • #45 Version mismatch between generated and package.json info
  • #46 Does not handle metadata that is a tagged literal
  • #50 Does not handle metadata that is an evaling_lit
  • #51 sym_val_lit definition too restrictive

Discussion

#45 has to do with generating the parser C source (src/parser.c) using version 0.20.7 of the tree-sitter CLI. Although the default ABI number used in generation by 0.20.7 is 14, we generated the source using --abi 13 because, as far as we know, it is the most widely used ABI version. Note that this is the same number as in the current release so we don't anticipate any issues on this front.

#46, #50, and #51 are all about loosening of certain rules in the grammar. The rules in question were too strict, causing mis-recognition of legitimate code.

There are tests to show how parsing will change here:

#46 is relevant for ClojureDart as it describes a construct that appears to be used frequently in that context.

As far as we know, the constructs described in #50 and #51 are rare, but since the grammar's behavior is anticipated to change, we note these things here.

The new "what and why" document might be of interest as it attempts to spell out in detail what we are making an effort to maintain and some discussion of how things might change in the future.

The remaining planned changes for the next release have to do with other documentation and things we're doing to ease maintenance and slim down the responsibilities of the repository.

Please see the CHANGELOG for further change-related details if interested.

Remarks

We're currently planning to merge around 2023-05-07 -- that's about 2 weeks from the time of this posting.

All of the proposed changes live on the pre-0.0.12 branch so they can be examined / tested ahead of the merge.

If you are not prepared to use these changes, please consider using the commit at tag v0.0.11 or the corresponding tagged release. (Though one commit behind, those should be functionally equivalent to the current master branch (421546c2 as of this writing) which only differs by some lines in the credits.)

If the changes do cause issues, please tell us (preferably via one of the suggested communication channels mentioned below).

Suggested Communication Channels

@sogaiu

I've pulled down these changes and tested them as much as I can. I have no opposition to you merging these changes into the master branch.

sogaiu commented

6e41628 has been merged (fast-forwarded) and tagged as v0.0.12 [1].

If this version causes problems please open an issue or get in touch via some other means (e.g. via matrix / gitter). The previous release -- v0.0.11 -- is still available, so using that might be a work-around if v0.0.12 is problematic.

@dannyfreeman Thanks for testing things out and reviewing 👍


[1] Note that although the dates reported on GitHub in a few places mention 2023-05-05, the actual pushing and tagging happened on 2023-05-07.

Planned Changes

Now that tree-sitter 0.20.9 has been released, we are thinking of releasing a new version of tree-sitter-clojure, likely to be tagged as v0.0.13.

Notable planned changes include:

  • Increase ABI number from 13 to 14
  • Remove Node and Rust Bindings

Addresses

Discussion

The C code generated from grammar.js will be regenerated to use ABI 14 instead of 13. As alluded to here, more than 50% of the (hundreds of!!) parsers we examined are using 14. Also, it is the default level used by the tree-sitter CLI.
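
If you want to confirm which number a given checkout of src/parser.c was generated with, a small check along the following lines might help. It is a sketch that assumes the generated source records the number in a `#define LANGUAGE_VERSION` line, as parsers generated by the tree-sitter CLI typically do; `abi_version` is a hypothetical helper name, not part of any tool.

```python
import re
from typing import Optional

# Assumption: the generated parser source contains a line such as
#   #define LANGUAGE_VERSION 14
_LANGUAGE_VERSION = re.compile(
    r"^#define\s+LANGUAGE_VERSION\s+(\d+)\s*$", re.MULTILINE
)


def abi_version(parser_c_source: str) -> Optional[int]:
    """Return the number from the LANGUAGE_VERSION define, or None."""
    match = _LANGUAGE_VERSION.search(parser_c_source)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Run from the root of a tree-sitter-clojure checkout.
    with open("src/parser.c", encoding="utf-8") as handle:
        print(abi_version(handle.read()))
```

Comparing the result against what your editor or binding supports is one way to decide whether to stay on the previous tag for now.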

We will be going ahead with removing the Node and Rust bindings as mentioned in our current "what and why" document [1]. Please see that document for our reasoning.

Remarks

We're currently planning to merge around 2024-02-12 -- that's about 2 weeks from the time of this posting. Since it has been a while since a tree-sitter release though, it could be that some issues will crop up and this might influence our decision regarding when to release.

All of the proposed changes live on the pre-0.0.13 branch so they can be examined / tested ahead of the merge.

If the changes do cause issues, please tell us (preferably via one of the suggested communication channels mentioned below).

Suggested Communication Channels


[1] With nods to Douglas Adams.

3a1ace9 has been merged (fast-forwarded) and tagged as v0.0.13.

If this version causes problems please open an issue or get in touch via some other means (e.g. via matrix / gitter). The previous release -- v0.0.12 -- is still available, so using that might be a work-around if v0.0.13 is problematic.