tact-lang/tact

Tact frontend API: Improvements for tooling

Opened this issue · 6 comments

There are a few possible ways to improve the Tact frontend API that would be helpful for tooling:

  • Improve the workflow with the {Node,Virtual}FileSystem API. These functions need better documentation, e.g. a note that they do not support relative paths. Additionally, we might consider encapsulating them (making them private) so that users of the Tact API only interact with absolute paths when working with compiler internals.
  • #639
  • #1041: might be useful for other tools to e.g. suppress warnings or add user-defined invariants.
  • Save types to AST or improve an access API to type information (#289).
  • Document and refactor the frontend pipeline. Currently, the way the compiler prepares to call the precompile function is quite convoluted. All the arguments of the functions used before precompile should be documented, and we should extract small, documented methods where possible to make that API more useful for third parties.
  • Optionally, we should consider introducing mappings from ASTNode ID to ASTNode. This approach makes sense only if necessary, for example, when refining the typechecker. It's a common approach; for instance, rustc internals are implemented this way. For tooling it might be useful to access a node by its ID when navigating the AST structure from other IR entries.
  • Add more context to every internal Error to be thrown in compiler internals. This is crucial for debugging third-party tools.
  • Add AST iterators that perform functional map and fold over nested nodes of the AST. This will enable API users to inspect the AST in a more convenient way, for example, sorting nodes of a certain type. (Done in PR #368)
  • #686
  • Make all the definitions needed to access the AST public. At a minimum, ASTStore should be public. Currently, it has to be copy-pasted each time a tool needs to parse the AST, since getRawAST returns that type.
  • Separate the process of parsing from creating the CompilerContext. Some tools might generate AST without parsing, so we won't need to pass mocks in openContext.
  • Create named union types for fields of AST nodes (see: #314 (comment))
  • Refactoring: Extract methods from the build function (src/pipeline/build.ts) to make it more modular. We need to separate the build functionality from CLI parsing and use different methods to create context, compile, and precompile. This is important to implement in order to enable third-party tools to hook into the compilation pipeline in the most flexible way. Perhaps, the best way to achieve this functionality is to create the Builder class with public methods defining the pipeline.
  • #645
  • #646
  • #648
  • #776

These are my points after working with the Tact API for a while. Feel free to discuss and extend them and create separate issues from each checkbox when working on these.

Save types to AST or improve an access API to type information.

This would be insanely useful in tools like language server and such!

In general, I agree with all those points, but implementing some of them may require us to build our own lexer→parser→semantic analysis pipeline, suitable for both the compiler and external tooling that depends on various steps in it. I'd say this depends on #286, but I may be wrong, and we could pull off such a feat without giving up the Ohm toolkit.

I would also suggest creating named union types for AST entries used in fields. For example:

export type ASTContract = {
    kind: 'def_contract';
    origin: TypeOrigin;
    id: number;
    name: string;
    traits: ASTString[];
    attributes: ASTContractAttribute[];
    declarations: (ASTField | ASTFunction | ASTInitFunction | ASTReceive | ASTConstant)[];
    ref: ASTRef;
};

Could be rewritten as:

export type ASTContractDeclaration = (ASTField | ASTFunction | ASTInitFunction | ASTReceive | ASTConstant);
export type ASTContract = {
    kind: 'def_contract';
    origin: TypeOrigin;
    id: number;
    name: string;
    traits: ASTString[];
    attributes: ASTContractAttribute[];
    declarations: ASTContractDeclaration[];
    ref: ASTRef;
};

This will simplify the life of tooling developers by enabling them to reuse these type definitions from the compiler. Otherwise, I find myself copy-pasting these entries in my projects. Here is an example of a function with such a signature implemented in the static analyzer internals:

function getMethodInfo(
    decl: ASTField | ASTFunction | ASTInitFunction | ASTReceive | ASTConstant,
  ): [string | undefined, FunctionKind | undefined] {
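To illustrate how a named union could tidy up signatures like the one above, here is a minimal sketch. The node types are simplified, hypothetical stand-ins rather than the compiler's real definitions, and declarationName is an illustrative helper, not a function from the static analyzer.

```typescript
// Simplified, hypothetical stand-ins for the compiler's AST node types.
type ASTField = { kind: "def_field"; name: string };
type ASTFunction = { kind: "def_function"; name: string };
type ASTInitFunction = { kind: "def_init_function" };
type ASTReceive = { kind: "def_receive" };
type ASTConstant = { kind: "def_constant"; name: string };

// The named union proposed above, declared once and reused everywhere.
type ASTContractDeclaration =
  | ASTField
  | ASTFunction
  | ASTInitFunction
  | ASTReceive
  | ASTConstant;

// Illustrative helper: returns the declaration's name when it has one.
// The signature mentions the union by name instead of spelling it out.
function declarationName(decl: ASTContractDeclaration): string | undefined {
  switch (decl.kind) {
    case "def_field":
    case "def_function":
    case "def_constant":
      return decl.name;
    case "def_init_function":
    case "def_receive":
      return undefined;
  }
}
```

If the compiler exported such a union, adding a new declaration kind would surface as a type error in every tool that switches over it, instead of silently drifting from copy-pasted definitions.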

Added three more points:

  • Add more context to every internal Error to be thrown in compiler internals. This is crucial for debugging third-party tools.
  • Add AST iterators that perform functional map and fold over nested nodes of the AST. This will enable API users to inspect the AST in a more convenient way, for example, sorting nodes of a certain type.
  • Add an API that provides equivalence checks between AST nodes of the same type. This is needed, for example, in #335 to implement unit tests.
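A rough sketch of what the fold iterator and the equivalence check could look like, written against a toy expression AST. The Expr type, foldExpr, and eqExpr are all hypothetical names for illustration; the real Tact AST has many more node kinds, and the actual API may take a different shape.

```typescript
// Toy expression AST (hypothetical, much smaller than the real one).
type Expr =
  | { kind: "number"; id: number; value: number }
  | { kind: "id"; id: number; name: string }
  | { kind: "op_binary"; id: number; op: string; left: Expr; right: Expr };

// Pre-order fold: visits the node itself, then its children left to right.
function foldExpr<T>(node: Expr, acc: T, f: (acc: T, node: Expr) => T): T {
  const next = f(acc, node);
  if (node.kind === "op_binary") {
    return foldExpr(node.right, foldExpr(node.left, next, f), f);
  }
  return next;
}

// Structural equivalence: compares shape and payload, ignoring node ids.
function eqExpr(a: Expr, b: Expr): boolean {
  if (a.kind === "number" && b.kind === "number") return a.value === b.value;
  if (a.kind === "id" && b.kind === "id") return a.name === b.name;
  if (a.kind === "op_binary" && b.kind === "op_binary") {
    return a.op === b.op && eqExpr(a.left, b.left) && eqExpr(a.right, b.right);
  }
  return false;
}

// x + 1
const sample: Expr = {
  kind: "op_binary",
  id: 3,
  op: "+",
  left: { kind: "id", id: 1, name: "x" },
  right: { kind: "number", id: 2, value: 1 },
};

// Collect the names of all identifier nodes.
const names = foldExpr<string[]>(sample, [], (acc, n) =>
  n.kind === "id" ? acc.concat(n.name) : acc,
);

// Count every node in the tree.
const nodeCount = foldExpr<number>(sample, 0, (acc) => acc + 1);
```

Ignoring id in eqExpr is a deliberate choice for this sketch: in unit tests, two independently parsed trees would normally get different ids even when they are structurally identical.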

Added while working on tests for #559:

  • Refactoring: Extract methods from the build function (src/pipeline/build.ts) to make it more modular. We need to separate the build functionality from CLI parsing and use different methods to create context, compile, and precompile. This is important to implement in order to enable third-party tools to hook into the compilation pipeline in the most flexible way. Perhaps, the best way to achieve this functionality is to create the Builder class with public methods defining the pipeline.
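One possible shape for such a Builder, as a sketch only. Every name below (Context, Precompiled, Compiled, and the method names) is hypothetical, and the stage bodies are placeholders; the point is only that each stage is a separate public method with no CLI parsing involved.

```typescript
// Hypothetical data passed between stages; not the compiler's real types.
type Context = { sources: Record<string, string> };
type Precompiled = { ctx: Context; contracts: string[] };
type Compiled = { contract: string; output: string };

class Builder {
  // Stage 1: create a context from in-memory sources (no CLI involved).
  createContext(sources: Record<string, string>): Context {
    return { sources };
  }

  // Stage 2: placeholder precompile that treats each source file as one contract.
  precompile(ctx: Context): Precompiled {
    return { ctx, contracts: Object.keys(ctx.sources) };
  }

  // Stage 3: placeholder compile that emits a stub output per contract.
  compile(pre: Precompiled): Compiled[] {
    return pre.contracts.map((contract) => ({
      contract,
      output: `// compiled ${contract}`,
    }));
  }

  // Convenience method that runs the whole pipeline.
  build(sources: Record<string, string>): Compiled[] {
    return this.compile(this.precompile(this.createContext(sources)));
  }
}

// A third-party tool could stop after precompile and inspect the result.
const builder = new Builder();
const pre = builder.precompile(
  builder.createContext({ "main.tact": "contract Main {}" }),
);
```

With the stages split this way, a static analyzer would call only createContext and precompile, while the CLI would remain a thin wrapper around build.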

Every point here makes a lot of sense, except for

Optionally, we should consider introducing mappings from ASTNode ID to ASTNode

In fact, we should remove id from AST nodes. They are just disguised references, and there is neither GC nor type safety to ensure AST ids stored elsewhere won't become dangling references.

While we could patch createNode to ensure that all the ids end up in that Map, we would never know when they should be removed from it. Even though it would be possible to find any node by its id, we'd end up performing actions on nodes that aren't even relevant anymore.

jubnzv commented

They are just disguised references, and there is neither GC nor type safety to ensure AST ids stored elsewhere won't become dangling references.

I would argue against any idea of mutating the AST. If we need to transform the AST, we should create a second AST to ensure a clean design for both the compiler and API users. Otherwise, the AST should be available at all stages of compilation since we need to access that information from different places; therefore, it will never be GCed.

Additionally, having unique IDs is essential for maintaining symbol tables that are useful at different stages of compilation, particularly for analysis and for accessing the AST from other IRs, which is what Misti does.
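A minimal sketch of the kind of id-keyed tables meant here, assuming the AST is never mutated after construction. AstNode, indexById, and the typeOf table are hypothetical illustrations, not Tact or Misti code.

```typescript
// Toy AST with unique ids (hypothetical).
type AstNode =
  | { kind: "id"; id: number; name: string }
  | { kind: "op_binary"; id: number; left: AstNode; right: AstNode };

// Build the id -> node table once by walking the tree.
// Assumption: the tree is not mutated afterwards, so no entry goes stale.
function indexById(
  root: AstNode,
  table: Record<number, AstNode> = {},
): Record<number, AstNode> {
  table[root.id] = root;
  if (root.kind === "op_binary") {
    indexById(root.left, table);
    indexById(root.right, table);
  }
  return table;
}

const ast: AstNode = {
  kind: "op_binary",
  id: 3,
  left: { kind: "id", id: 1, name: "a" },
  right: { kind: "id", id: 2, name: "b" },
};

const nodes = indexById(ast);

// A side table keyed by the same ids, e.g. type information or IR links.
const typeOf: Record<number, string> = { 1: "Int", 2: "Int", 3: "Int" };

// An IR entry that stores only an id can get back to the node and its type.
const irEntry = { astId: 2 };
const resolved = nodes[irEntry.astId];
const resolvedType = typeOf[irEntry.astId];
```

The dangling-reference concern raised earlier applies as soon as nodes are replaced in place; building a second AST and a fresh table for it keeps both tables consistent with the tree they were built from.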