Provide a tool to compare two ATD files for compatibility
mjambon opened this issue · 7 comments
@jbergler wrote:
To detect breaking changes [in an ATD file] I currently do something like this and manually review the changes
~/co/semgrep-interfaces % git diff v1.17.0..main *.atd
But these diffs are huge, so I don't really get much confidence out of asking a human to review the changes.
My past experience tells me the only way we get high confidence outcomes is to automate the review for these kinds of nuanced details.
It's ok if we want to say we don't think the effort to get there is required. Before we do that I want to brainstorm how we might achieve that.
In protobuf land, there are cli tools that compare two proto files and print all the breaking changes. Could you have a guess at how much work it would be to build something similar for atd?
I think it would be a relatively low effort to produce a tool (atddiff
?) that compares two ATD files and reports differences:
- incompatibilities in one direction (left: writer, right: reader)
- incompatibilities in the other direction (left: reader, right: writer)
<
...>
annotations that changed (but without trying to interpret them)- bonus: track renames and ignore name changes that don't affect data compatibility
For example, let's compare two files, Interface_v1.atd
against Interface_v2.atd
:
(* Interface, version 1 *)
type fruit = [ Kiwi | Strawberry ]
type veggie = [ Potato | Eggplant ]
type basket = {
owner: string;
fruits: fruit list;
veggies: veggie list;
?notes: string option;
}
(* Interface, version 2 *)
type fruit = [ Kiwi | Strawberry | Plum ]
type vegetable = [ Eggplant | Potato ]
type basket = {
fruits: fruit list;
~veggies: vegetable list;
notes: string;
price: int;
}
type drink = [ Water | Unicorn_blood ]
$ atddiff Interface_v1.atd Interface_v2.atd
Backward incompatibilities result in the following problems:
- cannot read old data using new implementation
- cannot read responses from old server in new client
- cannot read requests from old client in new server
Forward incompatibilities result in the following problems:
- cannot read new data using old implementation
- cannot read responses from new server in old client
- cannot read requests from new client in old server
Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Backward incompatibility: the 'notes' field is required by newer implementations but used to be optional.
Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Backward incompatibility: the new 'price' field is required by newer implementations.
Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Forward incompatibility: the 'owner' field was removed but is still required by older implementations.
Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Forward incompatibility: the new 'Plum' case is unsupported by older implementations.
Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Forward incompatibility: the 'veggies' field is required by older implementations but is now optional.
Regarding types whose name changes, I think this is generally not a problem and we can ignore these changes e.g. veggie
→ vegetable
. However, name changes for root types (types that are not referenced by other types) may have to be reported because it's possible that a root type gets both renamed and changes its structure. In such a case, we don't know for sure that it's the same type and we can't report incompatibilities confidently.
Update: changed language from left-to-right/right-to-left to backward/forward. There's a 50% chance I got this right.
Thanks @mjambon - this looks great. A couple of thoughts:
First, I'd love a --output json
or similar so that we can get a machine parseable representation of these, with your example, maybe something like
$ atddiff --output=json Interface_v1.atd Interface_v2.atd
[
{
"location": { "old": "Interface_v1.atd:123", "new": "Interface_v2.atd:321" },
"field": "my_type.notes",
"change": { "old": "optional", "new": "required" },
"kind": "incompatible backwards"
},
...
]
Second, if we wanted to use this as a commit hook, we'd probably need to be able to specify the constraints to enforce.
Until atd has a "required on create, optional on parse" I think that means we need to be able to say that we want to allow the 'soft' transition (optional -> required) as a warning, while having the 'hard' transition (missing to required) be an error.
In the JSON version above that might mean a pair of fields for type and severity instead of just kind
What about having the MVP only display a diff of the ASTs (no checking) at the type-level. Something like
$ atddiff --output=json Interface_v1.atd Interface_v2.atd
[
{
"type": "basket",
"location": { "old": "Interface_v1.atd:123", "new": "Interface_v2.atd:321" },
"fields": [
{
"old": {"name": "owner", "type": "string", "required": true},
"new": {"removed": true}
},
{
"old": {"name": "veggies", "type": "veggie list", "required": true},
"new": {"name": "veggies", "type": "vegetable list", "required": false}
},
...
]
Tentative design for the command-line interface:
atddiff OLD_ATD_FILE NEW_ATD_FILE [OPTIONS]
Exit status:
- 0 if there's nothing to report
- 1 if anything is being reported
- 2 if anything goes wrong
Options:
--help
...
--output {text|json|quiet}
Specify the output format. Default: 'text'
--report {all|backward|forward}
Specify which kind of incompatibilities to report. Default: 'all'
[explain]
--report-type-name-changes
Report type name changes.
[explain]
--no-error
Exit with success status (0) instead of 1 when something is being reported.
GNU diff and git diff
have a lot of nice formatting options that ideally we wouldn't have to reimplement. I'm not sure about the best approach here.
Update:
git difftool
(man git-difftool
) provides a convenient way to select and compare two versions of the same file(s). Here, we tell it to use the plain diff
command without options. It gets invoked as diff OLD_FILE NEW_FILE
.
$ git difftool -x diff HEAD^ cli/Makefile
Viewing (1/1): 'cli/Makefile'
Launch 'diff' [Y/n]?
99c99
< PYTEST_USE_OSEMGREP=true $(PYTEST) -n auto -m 'osempass and not todo' tests/e2e
---
> PYTEST_USE_OSEMGREP=true $(PYTEST) -n auto -m 'osempass and not todo' tests/e2e
103c103,105
< PYTEST_USE_OSEMGREP=true $(PYTEST) -n auto -v --no-summary -m 'not no_semgrep_cli and not todo and not osempass' tests/e2e
---
> PYTEST_USE_OSEMGREP=true $(PYTEST) -n auto -v --no-summary \
> -m 'not no_semgrep_cli and not todo and not osempass' tests/e2e \
> | (grep PASSED || echo 'Found no new passing tests')
@zyannes imo, an MVP is this:
$ diff -u <(atdcat old.atd) <(atdcat new.atd)
atdcat
removes the comments are reformats the type definitions, resulting in good diffs (except if the definitions are reordered). For better results, we could add an option to atdcat
to sort the type definitions alphabetically since their order doesn't matter.
But yeah, the issue is probably to explain the implications of the diffs e.g. explain what the following will break:
$ diff -u old.atd new.atd
--- old.atd 2023-10-02 12:33:41.470105275 -0700
+++ new.atd 2023-10-02 12:33:54.258098342 -0700
@@ -1,3 +1,3 @@
type a = {
- things: int list;
+ ~things: int list;
}
Looks great! Let's do it!