proposal: improved compiler test suite (compiler errors, run and compare, translate-c, etc.)
mitchellh opened this issue · 18 comments
Requirement for #89
Proposal
An improved test harness for the compiler that supports the following:
- Utilizes a directory structure of real source files (Zig, C, and assembly) to define compiler test cases.
- Test settings and assertions are configured using a manifest file. The manifest file can be embedded in the first (top) comment of a Zig file or it can exist alongside a test file.
- Multiple test types:
  - `run` - Build and run to completion and compare output.
  - `error` - Build and expect specific error output.
  - `translate-c` - Build a ".c" file and assert an expected "zig" file is produced. The Zig file can further optionally define a subsequent test to run (i.e. `run`) so you can translate C then optionally assert the proper behavior also happens.
  - `incremental` - Build and incrementally update a program. Each incremental step is itself a test case so that we can support incremental updates introducing errors, and a subsequent update fixing the errors. [stage2-only]
  - `cli` - Run a specific `zig` CLI command (typically, `zig build test`) to test the Zig CLI or Zig build machinery.
- Concurrency. Test cases will run on a thread pool similar to compilation.
- Test filtering by name, test type, or backend.
- Test dependencies. Tests can note that they are dependent on another test in order to optimize parallelism or build order.
- Adding or modifying tests will not trigger a full rebuild of the compiler.
This will be built on the existing `test-stage2` harness (`src/test.zig`) and won't be a totally new implementation. This is almost purely iteratively improving the existing harness rather than writing anything new.
The tests will continue to be invoked using `zig build test-stage2` or similar targets.
NOTE: Some of these are not new features, but are features that are currently unimplemented in the stage2 test harness or just need slight improvement in the stage2 test harness.
Backwards Compatibility
A limited amount of backwards compatibility with the existing compiler test API will be retained so we can incrementally introduce these test cases. In some cases, certain test types such as stack trace tests and certain native backend tests will continue to use the Zig API and will not support the file-based definition format. Therefore, we will continue to support the necessary subset of the Zig API.
Note that changes to these tests will not gain the benefits of the above harness. For example, changes will trigger a full rebuild. They won’t be parallelized as well, etc. However, these tests tend to have very few cases and/or are updated very infrequently.
Test Definitions
Tests are now defined using files within a directory structure rather than a Zig API. This has benefits:
- Adding, changing, or removing a test case does not cause the full test harness (including the compiler) to rebuild.
- Adding, changing, or removing test cases is a lot more contributor friendly.
Directory Structure
The directory structure has no meaning to the test definitions. Any test can go into any directory, and multiple tests can go into a single directory. The test harness determines the test mode and settings using the manifest (defined below). Why? Originally, Andrew and I felt the directory structure should imply certain settings such as the backend (stage1 vs stage2). But we pulled back on this initial plan because many tests should behave the same on stage1 AND stage2, and incremental tests want to test multiple test modes.
However, we may wish to enforce directory idioms, such as all stage1-exclusive tests going into a `tests/stage1` directory. The test harness itself will not care about these details.
The name is determined by the filename. It is an error to have duplicate test names. Therefore, two tests named `foo.zig` and `foo.S` would produce an error since they are both named "foo" despite having different extensions.
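To make the collision rule concrete, here is a minimal sketch (with a hypothetical `testName` helper) of how a harness might derive the test name from a filename, which is why `foo.zig` and `foo.S` collide:

```zig
const std = @import("std");

/// Hypothetical helper: the test name is the basename with its
/// extension stripped, so "foo.zig" and "foo.S" both become "foo".
fn testName(basename: []const u8) []const u8 {
    const dot = std.mem.lastIndexOfScalar(u8, basename, '.') orelse basename.len;
    return basename[0..dot];
}

test "duplicate names across extensions" {
    try std.testing.expectEqualStrings("foo", testName("foo.zig"));
    try std.testing.expectEqualStrings("foo", testName("foo.S"));
}
```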
Test Manifests
Manifest Syntax
The first line of the manifest is the test type.
Subsequent non-empty lines are `key=value` configuration for that test type.
An empty line followed by additional data is "trailing" configuration that is dependent on the test type. For example, for `error` tests, it defines the expected error output to match.
```
error
backend=stage1,stage2
output_mode=exe

:3:19: error: foo
```

```
run

I am expected stdout! Hello!
```

```
cli

build test
```
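For illustration (not part of the proposal), a minimal sketch of how a harness might split a manifest into these three parts; the `Manifest` struct and its field names are hypothetical:

```zig
const std = @import("std");

/// Hypothetical parsed form of a manifest; names are illustrative only.
const Manifest = struct {
    test_type: []const u8, // first line, e.g. "error" or "run"
    config: []const u8, // raw key=value lines
    trailing: []const u8, // everything after the first empty line
};

fn parseManifest(text: []const u8) Manifest {
    // The first line of the manifest is the test type.
    const type_end = std.mem.indexOfScalar(u8, text, '\n') orelse text.len;
    const rest = if (type_end < text.len) text[type_end + 1 ..] else "";
    // An empty line separates key=value configuration from trailing data.
    if (std.mem.indexOf(u8, rest, "\n\n")) |sep| {
        return .{
            .test_type = text[0..type_end],
            .config = rest[0..sep],
            .trailing = rest[sep + 2 ..],
        };
    }
    return .{ .test_type = text[0..type_end], .config = rest, .trailing = "" };
}

test "parse an error manifest" {
    const m = parseManifest("error\noutput_mode=exe\n\n:3:19: error: foo");
    try std.testing.expectEqualStrings("error", m.test_type);
    try std.testing.expectEqualStrings("output_mode=exe", m.config);
    try std.testing.expectEqualStrings(":3:19: error: foo", m.trailing);
}
```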
Manifest Location
The test manifest can be either embedded or adjacent. Only Zig and C files support embedded manifests.
A manifest must have a test. However, a test does not require a manifest. There are a handful of implicit manifest conditions defined in a later section.
For embedded manifests, the manifest is the last comment in a Zig or C file. The comment is not stripped for test types that compile the source. Example, filename `my_test.zig`:
```zig
// other comments, not the manifest
export fn main() void {
    // code
}

// error
// output_mode=lib
//
// 7:2: error: bar
```
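A sketch of one way the harness could locate an embedded manifest, assuming (per the example above) it is the trailing run of `//` line comments at the end of the file; `manifestStart` is a hypothetical helper, written against the `std.mem.split` API of the time:

```zig
const std = @import("std");

/// Returns the byte offset where the trailing comment block begins,
/// or null if the file does not end in a run of "//" lines.
fn manifestStart(source: []const u8) ?usize {
    var start: ?usize = null;
    var offset: usize = 0;
    var it = std.mem.split(u8, source, "\n");
    while (it.next()) |line| {
        const trimmed = std.mem.trim(u8, line, " \r");
        if (std.mem.startsWith(u8, trimmed, "//")) {
            if (start == null) start = offset;
        } else if (trimmed.len != 0) {
            // Any non-comment code resets the run; blank lines do not.
            start = null;
        }
        offset += line.len + 1;
    }
    return start;
}

test "find trailing comment block" {
    const src =
        \\export fn main() void {}
        \\// error
        \\//
        \\// 7:2: error: bar
    ;
    const idx = manifestStart(src).?;
    try std.testing.expect(std.mem.startsWith(u8, src[idx..], "// error"));
}
```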
Manifests can also be defined adjacent to a test. In this case, the manifest file must have the same filename as the test file and end in the `.manifest` suffix. Adjacent manifests are supported specifically for non-Zig tests and multi-file or incremental tests.
Example:
`my_test.zig`:
```zig
export fn main() void {
    // code
}
```
`my_test.manifest`:

```
error
output_mode=lib

7:2: error: bar
```
Implicit Manifests
Whilst a manifest requires a test, a test does not require a manifest. If a manifest is not found for a test file, a default `run` manifest is assumed (must compile and exit with exit code 0).
Incremental Tests
Tests that verify the stage2 compiler can do incremental compilation have an additional special filename format: each incremental compilation step is suffixed numerically to denote ordering.
For example, `foo_0.zig`, `foo_1.zig` would denote that the test named "foo" is incremental and will run test case 0 followed by test case 1 within the same stage2 compilation context. Each individual test case (`foo_0`, `foo_1`) can define its own test mode. This enables us to test that the incremental compiler handles errors after successes and so on.
For naming, the following test names will be created and filterable:
- “foo” - This will run the full `foo` incremental sequence and all cases within foo.
- “foo_1” - This will run all the incremental steps up to and including step 1, but no further.
The following conditions are errors:
- Incremental test ordering cannot have holes. You cannot have `foo_0` and `foo_2` without a `foo_1`. Likewise, you cannot have a `foo_1` without a `foo_0`.
- Incremental tests cannot conflict with non-incremental tests. You cannot have a `foo_0.zig` and a `foo.zig`. This is an error since it would create a duplicate test name "foo" whilst also being unclear to contributors.
- Incremental tests must start with a `_0` suffix. The `_0` is not implied. For example, `foo.zig` and `foo_1.zig` do not comprise an incremental test case; the harness will see `_1` as an incremental case with a hole (missing `_0`) and error.
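As a sketch, the three rules above could be checked per test name once the numeric suffixes are collected; the function and error names here are hypothetical, and the index-capture `for` syntax is that of the Zig of this era:

```zig
const std = @import("std");

const SequenceError = error{
    MissingStepZero,
    HoleInSequence,
    ConflictsWithNonIncremental,
};

/// `steps` is the sorted list of numeric suffixes found for one test
/// name (e.g. foo_0.zig, foo_2.zig => {0, 2}); `has_plain` is true
/// when a suffix-less foo.zig also exists.
fn validateIncremental(steps: []const u32, has_plain: bool) SequenceError!void {
    if (has_plain) return error.ConflictsWithNonIncremental;
    for (steps) |step, i| {
        // Steps must be exactly 0, 1, 2, ...: start at _0 with no holes.
        if (step != i) {
            return if (i == 0) error.MissingStepZero else error.HoleInSequence;
        }
    }
}

test "sequence rules" {
    try validateIncremental(&[_]u32{ 0, 1, 2 }, false);
    try std.testing.expectError(error.HoleInSequence, validateIncremental(&[_]u32{ 0, 2 }, false));
    try std.testing.expectError(error.MissingStepZero, validateIncremental(&[_]u32{1}, false));
    try std.testing.expectError(error.ConflictsWithNonIncremental, validateIncremental(&[_]u32{0}, true));
}
```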
Test Execution and Ordering
The test harness will do the following at a very high level:
- Collect and queue all tests by recursively traversing the test case directory (filtering happens here).
- Execute the tests using a thread pool in the order they were traversed.
There is no guarantee on test ordering.
Test failures are output to the console immediately upon happening. Failures are not buffered in memory except for the minimal amount of memory needed to build the error message. In the case of OOM, a message will be output with the test name that failed but no further information (if it reached this point, the filename is already in memory).
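To illustrate the intended failure-reporting behavior (not prescribed by the proposal), a worker could hold a shared lock only long enough to emit one failure, so nothing accumulates beyond the single message being built; all names here are hypothetical:

```zig
const std = @import("std");

var report_mutex: std.Thread.Mutex = .{};

/// Called from worker threads: print the failure immediately instead of
/// buffering results. The lock keeps one report's lines contiguous when
/// several workers fail at once.
fn reportFailure(test_name: []const u8, message: []const u8) void {
    report_mutex.lock();
    defer report_mutex.unlock();
    std.debug.print("FAIL {s}\n{s}\n", .{ test_name, message });
}

/// On OOM while building a message, the test name is already in memory
/// (it was read during directory traversal), so report just that.
fn reportOutOfMemory(test_name: []const u8) void {
    reportFailure(test_name, "(out of memory while formatting failure details)");
}
```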
Backends
Backends are specified using the `backends` manifest configuration option that is available for all test types. This is a comma-separated list of the backends to test against. A backend can be prefixed with a `!` to test all backends but exclude that specific backend.
Supported backends:
- `stage1`
- `stage2` - Stage2 using the default backend (LLVM or native).
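As a sketch (assuming only the two backends above, for brevity), evaluating the comma-separated list with `!` exclusion might look like this; `backendEnabled` is a hypothetical helper:

```zig
const std = @import("std");

const Backend = enum { stage1, stage2 };

/// Decide whether `backend` is enabled by a spec such as
/// "stage1,stage2" or "!stage1" (all backends except stage1).
fn backendEnabled(spec: []const u8, backend: Backend) bool {
    const name = @tagName(backend);
    var any_positive = false;
    var it = std.mem.split(u8, spec, ",");
    while (it.next()) |raw| {
        const entry = std.mem.trim(u8, raw, " ");
        if (std.mem.startsWith(u8, entry, "!")) {
            if (std.mem.eql(u8, entry[1..], name)) return false;
        } else {
            any_positive = true;
            if (std.mem.eql(u8, entry, name)) return true;
        }
    }
    // A spec with only "!..." entries enables every other backend.
    return !any_positive;
}

test "exclusion prefix" {
    try std.testing.expect(backendEnabled("!stage1", .stage2));
    try std.testing.expect(!backendEnabled("!stage1", .stage1));
    try std.testing.expect(backendEnabled("stage1,stage2", .stage1));
    try std.testing.expect(!backendEnabled("stage2", .stage1));
}
```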
Native Backends
There are a handful of tests for the native backends. These primarily test things that are already covered by behavior tests. According to Andrew, they historically exist only for backends that are not yet able to execute behavior tests using at least the simplified test runner. Therefore, the plan for now is to remove these since they're covered via behavior tests, or keep them using the old Zig API.
Test Filtering
The `-Dtest-filter` build option will be used to filter tests by name.
Different test targets from the `zig build` command will be used to filter by test types, i.e. error tests vs run tests.
Test Types
Run Tests
Run tests build an exe and run it to completion. Any exit code other than zero is a test failure. Additionally, output can be specified and it will be asserted to match the output on stdout byte for byte.
The following type-specific manifest options are supported:
- `translate-c` (default: false) - If non-empty, C files will be processed through `translate-c` prior to being run. Without this, any C files will use the Zig compiler frontend for C (clang at the time of writing).
Manifest trailing data is the stdout data to assert. stderr is not matched.
Run tests use the suffix of the associated test file to perform the proper build. For example, `.zig` files go through the normal Zig compilation pipeline. `.S` files are assembled and linked.
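A sketch of the suffix dispatch, with a hypothetical `Pipeline` enum standing in for whatever build steps the harness actually invokes:

```zig
const std = @import("std");

const Pipeline = enum { zig_compile, assemble_and_link, c_compile };

/// Map a test file's extension to the build pipeline it should use.
fn pipelineForFile(path: []const u8) ?Pipeline {
    if (std.mem.endsWith(u8, path, ".zig")) return .zig_compile;
    if (std.mem.endsWith(u8, path, ".S")) return .assemble_and_link;
    if (std.mem.endsWith(u8, path, ".c")) return .c_compile;
    return null; // not a recognized test source file
}

test "suffix dispatch" {
    try std.testing.expect(pipelineForFile("foo.zig").? == .zig_compile);
    try std.testing.expect(pipelineForFile("foo.S").? == .assemble_and_link);
    try std.testing.expect(pipelineForFile("foo.txt") == null);
}
```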
Example, a run test that just asserts exit code 0 and ignores all output:

```zig
//! run
```

Example, a run test that asserts output in addition to the exit code being 0:

```zig
//! run
//!
//! Hello, World!
//!
```
Error Tests
Error tests build an exe, obj, or lib and assert that specific errors exist. This only tests errors that happen during compilation and not during runtime. The built exe (if the output mode is exe) is not executed. Runtime errors should be checked using the `run` test type with a panic handler.
The compiler is expected to error. If the compiler is subprocessed (i.e. testing stage1), then a non-zero exit code will be expected.
The following type-specific manifest options are supported:
- `output_mode` (default: `obj`) - one of `exe`, `lib`, `obj`. This is the compilation output mode.
- `is_test` (default: false) - non-empty specifies this has `is_test` set on the compilation mode.
The manifest trailing data is the error output to assert. One error is expected per line, and will be matched using a string subset match. The subset match lets you omit things like filename. Non-empty trailing data is required.
Example:

```zig
// error
//
// :1:11: error: use of undeclared identifier 'B'
```

Example that specifies an output mode:

```zig
// error
// output_mode=lib
//
// :1:11: error: use of undeclared identifier 'B'
```

Example that tests errors in test mode:

```zig
// error
// is_test=1
//
// :1:11: error: use of undeclared identifier 'B'
```
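For illustration, the per-line subset match could be implemented roughly as follows: every expected line must appear as a substring of some line of the actual compiler output, which is what lets the expectation omit the file name (`errorsMatch` is a hypothetical helper):

```zig
const std = @import("std");

/// Each non-empty expected line must occur as a substring of some line
/// of the actual output; this allows omitting e.g. the file name.
fn errorsMatch(expected: []const u8, actual: []const u8) bool {
    var exp_it = std.mem.split(u8, expected, "\n");
    while (exp_it.next()) |want_raw| {
        const want = std.mem.trim(u8, want_raw, " \r");
        if (want.len == 0) continue;
        var found = false;
        var act_it = std.mem.split(u8, actual, "\n");
        while (act_it.next()) |line| {
            if (std.mem.indexOf(u8, line, want) != null) {
                found = true;
                break;
            }
        }
        if (!found) return false;
    }
    return true;
}

test "file name may be omitted" {
    const actual = "foo.zig:1:11: error: use of undeclared identifier 'B'";
    try std.testing.expect(errorsMatch(":1:11: error: use of undeclared identifier 'B'", actual));
    try std.testing.expect(!errorsMatch(":9:9: error: nope", actual));
}
```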
Translate C Tests
`translate-c` tests test that a C file is translated into a specific Zig file by testing one or more substring matches against the resulting Zig source after translation.
This test has no type-specific configuration.
The manifest trailing data are the one or more sets of strings to compare against. Each set is separated by a newline followed by a comma.
Example of a single match:

```zig
// translate-c
//
//pub const struct_Color = extern struct {
// r: u8,
// g: u8,
// b: u8,
//};
```

Example of multiple matches, in which case every match must be found:

```zig
// translate-c
//
//pub const struct_Color = extern struct {
// r: u8,
// g: u8,
// b: u8,
//};
//,
//const struct_unnamed_2 = extern struct {};
```
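Since each match set is separated by a line containing only a comma, splitting the trailing data on the byte sequence `"\n,\n"` is sufficient; a small sketch:

```zig
const std = @import("std");

test "split translate-c match sets" {
    const trailing =
        \\pub const struct_Color = extern struct {
        \\    r: u8,
        \\};
        \\,
        \\const struct_unnamed_2 = extern struct {};
    ;
    // A line containing only "," separates the match sets.
    var sets = std.mem.split(u8, trailing, "\n,\n");
    try std.testing.expect(std.mem.startsWith(u8, sets.next().?, "pub const struct_Color"));
    try std.testing.expectEqualStrings("const struct_unnamed_2 = extern struct {};", sets.next().?);
    try std.testing.expect(sets.next() == null);
}
```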
Deprecated or Ignored Test Types
The following test types will be deprecated since they are just special cases of the file-definable test types. In all cases, these tests are simply migrated, not lost.
- `assemble_and_link` - This is just the `run` test case above where the run input supports assembly files.
- `runtime_safety` - This is a special case of `run`.
The following test types are ignored and kept as-is. They are tests that aren't frequently updated and don't have many cases:
- `stack_traces`
- `standalone` - This looks like something we can convert to this new test suite later, but is ignored for now due to some complexities and details that we should probably iron out in a dedicated issue.
- native-backend compiler error and run tests; more details on this in the "backends" section in this proposal.
Abandoned Ideas
The ideas below were initially discussed or existed but ultimately abandoned:
Explicit Test Dependencies
One configuration setting that can be specified in the manifest for all test types is an `after` setting. This is a comma-separated list of tests that you want to be executed first before your test.
This is optional. Tests should not have side effects and therefore the ordering should not matter. However, this feature can be used to point to other tests that test functionality that is used but not tested by the current test, therefore optimizing test suite speed.
CLI Tests
CLI tests execute specific `zig` CLI commands and assert exit code zero.
CLI tests have no type-specific manifest options.
The manifest trailing data specifies one command to run per line (all assumed to be subcommands of `zig`). This lets you test multiple targets separately. They are run in order. If no trailing data is specified, `zig build test` is run.
CLI tests can run against any Zig binary, so these are a good way to test consistent behavior against both a stage1 and stage2 binary build.
Example, with a custom target:

```
cli

build test-custom-target
```

Example, running a file:

```
cli

run hello.zig
```

Example, running and building:

```
cli

run hello.zig
build
```
A few notes:
> `translate-c` - Build a ".c" file and assert an expected "zig" file is produced. The Zig file can further optionally define a subsequent test to run (i.e. `run`) so you can translate C then optionally assert the proper behavior also happens.

Note that for `run-translated-c` tests, the output zig code is not checked to match any expected text, it is only executed and checked for behavior. The existing description of translate-c tests in this proposal does not have a way to encode this. All translate-c tests only want to do one or the other - run or compare the translation, never both.
> Test dependencies. Tests can note that they are dependent on another test in order to optimize parallelism or build order.
I don't think we should have any notion of test dependencies. All tests should be able to be independently run from each other.
I see that you addressed this in more detail in the "Explicit Test Dependencies" section. If you feel strongly that this is useful and helpful then I'm fine with going with it. My only resistance is that it's a lot of meta work for dubious benefits. I'm just imagining myself reviewing a big PR that tries to add test dependencies all over the place and then a bunch of people, who contribute suspiciously seldomly, bikeshedding over it.
> backend=stage1,stage2

Failing backends should be excluded; the default should be all backends. This follows the philosophy of the `std.builtin.CompilerBackend` enum:

> Lines 668 to 721 in 2af6971
> The `-Dtest-filter` build option will be used to filter tests by name (this already exists, but is unimplemented).

It exists, and it is implemented everywhere except for the part where the stage2 compiler does not support `--test-filter` for `zig test`.
> `output_mode` (default: `exe`) - one of `exe`, `lib`, `obj`. This is the compilation output mode.

Default output mode for error tests should be `obj` because it drags in less std lib code and does not create any sub-compilations.
> The manifest trailing data is the error output to assert.

The compile errors should go at the end of the file so that adding/removing lines in the expected errors does not shift the other error lines up and down. To keep it consistent I suggest that the entire manifest is at the very end of the file and consists of all the line comments in a row at the end of the file. This also frees us up to test doc comments such as the `//!` comments at the top of a file, which have some semantic meaning, unlike line comments.
CLI Tests
If we want to be able to move our existing CLI tests to this mechanism then we need a way to check the following things:
- after running `zig init-exe` does `zig build run` work and produce the expected output?
- after running `zig init-lib` does `zig build test` work and produce the expected output?
- after executing the zig command, does there exist the expected .s file (the godbolt use case)
- after running the zig command which is expected to produce an error, did we get the correct error message?
- after running `zig fmt` did it do UTF-16 decoding and did it find and reformat the test file as expected?
See test/cli.zig for more details. All these tests currently have bespoke logic for testing these parts of the CLI.
Perhaps the best option would be to leave CLI testing alone, since the common case of exit 0 is already handled better through the other test mechanisms described in this proposal.
> Note that for `run-translated-c` tests, the output zig code is not checked to match any expected text, it is only executed and checked for behavior. The existing description of translate-c tests in this proposal does not have a way to encode this. All translate-c tests only want to do one or the other - run or compare the translation, never both.
I think a valid solution for this is to support `run` for `c` files. The implicit behavior of a `.c` file is to run translate-c, but you can create an explicit manifest that does `run`. If the file ends in `.c` then it is translated first, but if the mode is `run`, the output is not compared to anything.
> I don't think we should have any notion of test dependencies. All tests should be able to be independently run from each other.
Let's drop it unless @matu3ba can convince you otherwise!
> Failing backends should be excluded; the default should be all backends. This follows the philosophy of the `std.builtin.CompilerBackend` enum.
How about this wording:
The `backend` defaults to all backends, but can be explicitly specified to override that behavior.
The one thing I'm thinking about is compiler errors: we've already changed the wording of, for example, out-of-bounds messages to be much more explicit in stage2, so we don't want those tests to always run on all backends.
> Default output mode for error tests should be `obj` because it drags in less std lib code and does not create any sub-compilations.
Ack.
> The compile errors should go at the end of the file so that adding/removing lines in the expected errors does not shift the other error lines up and down. To keep it consistent I suggest that the entire manifest is at the very end of the file and consists of all the line comments in a row at the end of the file. This also frees us up to test doc comments such as the `//!` comments at the top of a file, which have some semantic meaning, unlike line comments.
Greatttttt point. Updating this now.
> CLI Tests
> If we want to be able to move our existing CLI tests to this mechanism then we need a way to check the following things:
> - after running `zig init-exe` does `zig build run` work and produce the expected output?
> - after running `zig init-lib` does `zig build test` work and produce the expected output?
> - after executing the zig command, does there exist the expected .s file (the godbolt use case)
> - after running the zig command which is expected to produce an error, did we get the correct error message?
> - after running `zig fmt` did it do UTF-16 decoding and did it find and reformat the test file as expected?
>
> See test/cli.zig for more details. All these tests currently have bespoke logic for testing these parts of the CLI.
> Perhaps the best option would be to leave CLI testing alone, since the common case of exit 0 is already handled better through the other test mechanisms described in this proposal.
Agreed, let's leave them alone for now, and then reconsider later.
> The compile errors should go at the end of the file so that adding/removing lines in the expected errors does not shift the other error lines up and down. To keep it consistent I suggest that the entire manifest is at the very end of the file and consists of all the line comments in a row at the end of the file. This also frees us up to test doc comments such as the //! comments at the top of a file, which have some semantic meaning, unlike line comments.
Just as a heads up, the way this is handled in #11284 right now is that the container doc comments are simply not considered part of the source, so they don't affect line numbers.
I have no problem with either direction - It was just the choice I made at the time to get things up and running
> I have no problem with either direction - It was just the choice I made at the time to get things up and running

It'd be much easier to not modify files, because then we can pass them as-is into the compiler, specifically around test modes that require subprocessing to `zig`. We can avoid a lot of machinery around creating temporary files and so on in that case.
If you have no problem either way, I'd say let's move them to the end and make them normal comments (`//` not `//!`).
> I don't think we should have any notion of test dependencies. All tests should be able to be independently run from each other.
> Let's drop it unless @matu3ba can convince you otherwise!
I am fine with either statement. My intention was to clarify the situation.
> Concurrency. Test cases will run on a thread pool similar to compilation.
This does not explain if, and if so how many, processes are spawned to isolate error sources when using a thread pool.
It should probably be mentioned that single-process architectures will default to 1 thread to eliminate side effects of threads.
Also, whether, and if so how, side effects of multi-threading with memory problems are mitigated (heisenbugs, error source identification, etc.).
I am asking that the assumptions and assertions regarding parallelism/concurrency be written explicitly here.
> There is no guarantee on test ordering.
The test output should then be defined. The error trace is buffered somewhere with allocations, so the behavior on OOM should be specified.
Question on embedded manifests:
This sounds very much like an eventual libstd feature or people will at least push for it. What is the strategy to deal with that and can you clarify on that to prevent future churn on discussing this feature?
> Error tests build an exe, obj, or lib and assert that specific errors exist.
Rephrase to mention that the error may exist during compiling or running the binary.
> For naming, the following test names will be created and filterable:
Edge cases: Is calling a test `foo_0.zig` then forbidden, or only if `foo.zig` exists? Do incremental tests start with `foo_0.zig`?
> exit code
Please specify what makes success and error for completeness+consistency.
> Question on embedded manifests:
> This sounds very much like an eventual libstd feature or people will at least push for it. What is the strategy to deal with that and can you clarify on that to prevent future churn on discussing this feature?
I'd say that this is a bespoke format for a bespoke use case and it doesn't belong as-is in libstd since it isn't a general purpose thing. It also isn't a particularly complex task. The tokenizer/parser is in libstd, you could build it yourself. That would be my opinion, anyways. Other projects (such as Go) do not expose their compiler test "manifest"-equivalent parsing into the standard library and have historically said the same thing: you have the parser available, build it yourself.
> The test output should then be defined. The error trace is buffered somewhere with allocations, so the behavior on OOM should be specified.
Good point. I would recommend outputting failures as soon as they exist so we don't have to deal with buffering and memory allocations for that (beyond the buffering necessary for building up the error message/trace for that single failure). What does the normal test runner promise?
I agree specifying the OOM behavior here is a good idea.
> This does not explain if, and if so how many, processes are spawned to isolate error sources when using a thread pool.
> It should probably be mentioned that single-process architectures will default to 1 thread to eliminate side effects of threads.
> Also, whether, and if so how, side effects of multi-threading with memory problems are mitigated (heisenbugs, error source identification, etc.).
> I am asking that the assumptions and assertions regarding parallelism/concurrency be written explicitly here.
Defer to @andrewrk for clarification here, but my interpretation was that we'd use multiple threads in a single process. These tests aren't expected to have any side effects (and it is a bug if there are). The compiler itself is the same way. If there are heisenbugs/flaky tests, we chase them down as we normally do with that class of bug. A `-fsingle-threaded` flag can be introduced to at least help check that it's related to concurrency.
> Error tests build an exe, obj, or lib and assert that specific errors exist.
> Rephrase to mention that the error may exist during compiling or running the binary.
👍
> For naming, the following test names will be created and filterable:
> Edge cases: Is calling a test `foo_0.zig` then forbidden, or only if `foo.zig` exists? Do incremental tests start with `foo_0.zig`?
Good clarifications, will make some edits. My proposal:
`foo_0.zig` is allowed so long as `foo.zig` does not exist, and having a single-sequence incremental test is completely fine. Incremental tests must start with a `_0` suffix. Having only `foo_1` will see "0" as "missing", and holes are expressly not allowed in incremental tests.
> exit code
> Please specify what makes success and error for completeness+consistency.
I did in the first sentence, but I'll repeat it for consistency and being explicit.
> I did in the first sentence, but I'll repeat it for consistency and being explicit.

`Any exit code other than zero is a test failure.` is mentioned in Run tests, but not in Error tests.
> translate-c tests test that a C file is translated into a specific Zig file, byte-for-byte.
The current translate-c tests do substring matching when checking for specific translations, because some predefined macros and constants can vary depending on the target.
For example with `-target i386-windows-gnu`:

```zig
pub const __LONG_MAX__ = @as(c_long, 2147483647);
```

But with `-target aarch64-linux-gnu`:

```zig
pub const __LONG_MAX__ = @import("std").zig.c_translation.promoteIntLiteral(c_long, 9223372036854775807, .decimal);
```
Even if we force the tests to always use a specific target, the double underscore namespace is reserved for the compiler (clang in this case) which means an llvm version update could add/remove those types of constants, requiring all the tests to be updated even if nothing really changed.
Thanks @ehaas, I missed that bit of nuance.
I think in that case then, we should cleanly separate `translate-c` into requiring its own manifest that probably contains the same assertions as https://github.com/ziglang/zig/blob/master/test/translate_c.zig For the tests that use the dynamic values (like `default_enum_type`), they can keep using the Zig API since that is going to stick around. And for `run-translated-c` tests, the existing change I made that they should just be `run` types still works.
For the others, they should specify a manifest with the expected output and it should follow the same substring matching semantics as today.
Thoughts on that?
Question then, since `//` is a valid comment syntax in C, should we support the embedded manifest syntax in C? Or should we require an adjacent manifest for each of them?
> I think a valid solution for this is to support `run` for `c` files. The implicit behavior of a `.c` file is to run translate-c, but you can create an explicit manifest that does `run`. If the file ends in `.c` then it is translated first, but if the mode is `run`, the output is not compared to anything.
Note that when you do `zig run hello.c -lc` what is actually happening is the C code is getting compiled and executed, not translated. Zig supports compiling .c files so I don't see any reason to have .c files be different than .zig files, especially as we transition to having aro as a C frontend (#10795). However even with the current Clang frontend for .c source code, I think "translate-c" should be a distinct test category from traditional C source code compilation, error checking, and execution.
> Or we can target a merge as-is and re-align with this proposal later - I'll defer to @andrewrk in the review on that one
I'm happy to do an early merge - I didn't look at it yet because you said you wanted to push some more changes today, something about directories.
Generally I think this proposal can and should be rolled out with incremental enhancements such as the one in your PR.
> Defer to @andrewrk for clarification here, but my interpretation was that we'd use multiple threads in a single process.
👍
> A `-fsingle-threaded` flag can be introduced to at least help check that it's related to concurrency.
This already exists and is already observed by the test harness :-)
> Question then, since `//` is a valid comment syntax in C, should we support the embedded manifest syntax in C? Or should we require an adjacent manifest for each of them?
It seems reasonable to have the same manifest syntax in both .c source files and .zig source files to me.
> Question then, since `//` is a valid comment syntax in C, should we support the embedded manifest syntax in C? Or should we require an adjacent manifest for each of them?
I think embedding the manifest in C would be fine. One thing to note is that it's not always matching against one string; sometimes the test runner searches for multiple strings in the output so we'd need a good syntax for that. Example here
> I think embedding the manifest in C would be fine. One thing to note is that it's not always matching against one string; sometimes the test runner searches for multiple strings in the output so we'd need a good syntax for that. Example here
Got it, I've updated the proposal to support that. The manifest would look like the following. Each case would be separated by a line with only `,`. I checked all translate-c tests and none expect this output, and I don't think any C code would be translated into that on a single line.
```zig
// translate-c
//
//pub const struct_Color = extern struct {
// r: u8,
// g: u8,
// b: u8,
//};
//,
//const struct_unnamed_2 = extern struct {};
```
> Note that when you do `zig run hello.c -lc` what is actually happening is the C code is getting compiled and executed, not translated. Zig supports compiling .c files so I don't see any reason to have .c files be different than .zig files, especially as we transition to having aro as a C frontend (#10795). However even with the current Clang frontend for .c source code, I think "translate-c" should be a distinct test category from traditional C source code compilation, error checking, and execution.
@andrewrk Yep, I've split out `translate-c` to be a distinct test category. For the `run` test type, I also added a `translate-c` configuration that can be set to non-empty to force C files to be run through `translate-c` prior to being run. This is effectively the `run-translated-c` test type.
> It seems reasonable to have the same manifest syntax in both .c source files and .zig source files to me.
Done.
Now that #11530 is merged, let's break this proposal into separate, independently completable issues for each task that is left.
Alright I'm closing this. Anyone is free to extract the unsolved issues out of this proposal into separately completable issues 👍