tlc-pack/relax

[Discuss][Testing] Avoiding coupling with different features of the normalizer/parser in the unit tests

slyubomirsky opened this issue · 4 comments

From early on, Relax has adopted ANF* as a standard representation for passes and much of the compiler infrastructure is built around assuming that passes will be normalized into ANF. This is useful for writing passes, but the normalization pass (in BlockBuilder::ExprNormalizer) has grown to cover a lot of functionality, including also filling in checked types and possibly shape expressions, some of which also occurs in expression constructors. Additionally, the parser relies on the normalizer to fill in checked types and shapes as well.

Bundling functionality intended for multiple purposes in the normalizer can make it harder to test certain components, especially the normalizer itself. It is hard to construct nontrivial examples without using the parser; in particular, the checked_type field can't be populated from Python... now that the well-formed check is looking for that, it is more-or-less impossible to construct a well-formed program using manual ASTs in Python. This means that changes to the parser or normalizer may affect lots of tests of other, likely unrelated functionality, since many of those would implicitly depend on the parser (which in turn depends on the normalizer). Additionally, it also makes it more difficult to determine, e.g., that type-checking behavior is proceeding as we would expect, since the logic for it is federated in so many different places.

It is reasonable to have a convenient normalization pass that is intended to be used in between other passes to ensure the AST will be well-formed, but for testing compiler internals and long-term maintainability, I think it would make more sense to divide up the different steps in normalization (especially type and shape inference) and have ways to test these individually, in addition to being able to parse un-normalized ASTs for testing purposes.

I would be curious to hear others' thoughts on these issues. I think it is important to test logic like ANF normalization, type checking, etc. very thoroughly, since other passes rely on them, so it would be most useful to have ways to do it.


*Other than keeping tuples inline, see #283.

Agree that it would be good to unit-test type shape inference of each operator.

One of the main catch here that normalization can be context dependent, where the context (surrounding IRModule, scope, defined function signature) may be necessary for deduction functions.

One idea for unit-testing type/shape inference is to start with BlockerBuilder context populated(with an existing IRModule), then construct singleton ASTs (such as Call) that is not part of the module, then call builder.normalize to normalize the expression. We can additionally expose begin_scope and end_scope for testing purposes in python so such singleton construction of inference steps can be tested for a given operator in different scenarios.

Effectively normalize becomes the single point of entry for testing different AST behaviors(that are constructed through IR builder like APIs or Raw AST construction APIs).

I like the idea of exposing more control of the block builder context for testing purposes. It would be nice to have some simple switches and accessors to either set up the state as we'd like it or to turn off normalization for some tests.

We discussed this issue at the Dec. 13 Relax community meeting. We came to the conclusion that we could avoid unnecessary coupling in unit tests by manually constructing ASTs instead of using the parser. Note that the functions _update_shape() and _update_type() allow for setting shape_ and checked_type in Python, so these features could be tested that way as well.

I will leave this issue open for now (and change the title) to reflect that we should refactor our existing tests and likely add more tests to test type-checking and shape-checking behavior.