Clarification of the unified output format
morozov opened this issue · 19 comments
The unified output format looks a bit different from the one implemented in GNU diff. Consider the two files:
↪ cat a.txt
1
2
3
4
5
6
7
8
↪ cat b.txt
1
2
3
4
5
6'
7
8
If compared by GNU diff, the comparison result is the following:
--- a.txt 2017-10-11 12:40:59.304512847 -0700
+++ b.txt 2017-10-11 12:41:14.776322246 -0700
@@ -3,6 +3,6 @@
3
4
5
-6
+6'
7
8
The changed line is surrounded by unchanged lines (3 at most, by default) to represent the context.
If compared by the Differ
, the result is the following:
--- Original
+++ New
@@ @@
1
2
3
4
5
-6
+6'
The context is represented by 5 lines before and none after.
If the number of non-changed lines before the change is greater than 5 (must be controlled by AbstractChunkOutputBuilder::getCommonChunks(int $lineThreshold)
), then they are entirely dropped:
--- Original
+++ New
@@ @@
-7
+7'
At the same time, the non-changed lines after the change are dropped independently on the size of the chunk.
Is this behavior by design? If yes, what's the exact meaning of $lineThreshold
.
@SpacePossum What do you think?
Hi,
This is by design (I think) as between 1.4 and 2.0 I've made only minor tweaks to the output itself.
The meaning of $lineThreshold
is to state how many of the same lines may be repeated before a new hunk must be created.
However this no option to state how many context lines should be added before and after each hunk.
The are of few other details about the output (1.4 ~ 2.0) that make it non-compat with patch
and/or git apply
and differs from the udiff GNU as generated for example by diff
.
If you want to generate diff's that are patch
, git apply
compat and follows udiff
I would suggest using the 3rd party output builder for diff ^2.0
;
https://github.com/GeckoPackages/GeckoDiffOutputBuilder
disclaimer, I wrote that output builder so I'm biased ;)
@SpacePossum I only brought up GNU diff to illustrate how context lines can be displayed in a human-friendly way. In my particular case, it's not required that the output is compatible with patch
, but it's important that it's well understood by a human (I believe the same is true for PHPUnit).
Prior to 2.0
, comparing the same files as in the description would yield the following result:
--- Original
+++ New
@@ @@
1
2
3
4
5
-6
+6'
7
8
and
--- Original
+++ New
@@ @@
6
-7
+7'
8
In that case, at least a few common leading and trailing common lines were always displayed. Wouldn't it be useful if worked like this (or even better like in GNU diff) in 2.0
too? Or it's irrelevant in the case of PHPUnit?
The GeckoDiffOutputBuilder
looks awesome but using it in infection/infection would require some workaround because GeckoDiffOutputBuilder
requires sebastian/diff:^2.0 while infection/infection supports PHPUnit ^6.0 and thus has to support sebastian/diff:^1.4.
I see the issue, however I see no easy solution for resolving it with a diff 1.4
constraint.
The main reason for making diff 2.0
was (together with adding line numbers by default which ^1.0
lacks) to provide a way of tweaking the output of the differ
. This would help your case as well.
I'm not familiar with infection/infection
TBH so it is hard for me to advice here. If you are stuck, for what ever reason to diff ^1.0
there are the following paths to resolve the issue (but I would still advice strongly to update to diff 2.0
as soon as you can):
- open an PR towards 1.4 to (re)add context lines. It wouldn't be very hard to do, however I feel 1.4 is considered closed for new features (basically what happened between 1.3 and 1.4 is that 1.3 added a lot of context lines without control over it, while 1.4 adds limited context lines but also without control over it)
- use
diff 1.3
until you can usediff 2.0
, which seems plausible on first sight as your package requiresdiff
and asdev
requirementphpunit
, but you would be stuck with a lot trailingsame
lines on a diff
All maybe not very satisfying, but other means like I did for a package like PHP-CS-Fixer are doable, yet require a bunch or work and still be fragile in the end.
@SpacePossum there's no need to make any updates to diff 1.4
, it already displays the context lines. My only question is, can diff 2.0
be fixed in a way that it also displays them?
It could, the easiest way to do it is make diff 2.x
requiring the GeckoDiffOutputBuilder
, than you would get the builder you need. Question is if @sebastianbergmann is open for that suggestion?
@SpacePossum Do I understand you correctly that you're asking whether I would include https://github.com/GeckoPackages/GeckoDiffOutputBuilder/blob/master/src/DiffOutputBuilder/UnifiedDiffOutputBuilder.php in this repository?
What is the difference between https://github.com/GeckoPackages/GeckoDiffOutputBuilder/blob/master/src/DiffOutputBuilder/UnifiedDiffOutputBuilder.php and https://github.com/sebastianbergmann/diff/blob/master/src/Output/UnifiedDiffOutputBuilder.php?
My suggestion is to update the composer.json
of this package by adding GeckoDiffOutputBuilder
as a requirement. This would help for @morozov case.
I'm not a big fan of moving all of the code from the GeckoDiffOutputBuilder
directly into this repo. This is mainly because I've more plans with it.
The main difference between the two output builders:
diff 2
add "comments" like ''#Warning: Strings contain different line endings!",Gecko
doesn'tdiff 2
always creates minimaldiffs
,Gecko
can be configured to add context lines before and after a hunkdiff 2
has an option to add line numbers,Gecko
always adds the linenumbersdiff 2
has a "free text" header option (any string is allowed),Gecko
requires valid header linesGecko
is designed to bepatch
andgit apply
compatible,diff 2
is a more open format to suits PHPUnit betterGecko
has two other options,commonLineThreshold
andcollapseRanges
Thank you, @SpacePossum. Correct me if I'm wrong, but would @morozov's case not also be solved by him updating his composer.json
?
Yes, well, the issue is he has diff 1.4 || 2.0
as requirement, making it not possible to optionally require packages that require diff 2.0
but don't support diff 1.4
. This is a common limitation of composer (and most package managers).
diff 2
always creates minimaldiffs
,Gecko
can be configured to add context lines before and after a hunk
Does it mean that there is a bug in diff 2
since it preserves up to 5 leading common lines? It's not the minimal diff.
To make sure we're on the same page, I'm only trying to understand what's the expected output of diff 2
and would it make sense for PHPUnit (in the first place) to consistently (like GNU diff and Gecko
do) display common lines around the changed ones. Right now, these lines are displayed inconsistently. Not sure if this should be solved by introducing a dependency.
Does it mean that there is a bug in diff 2 since it preserves up to 5 leading common lines? It's not the minimal diff.
you give the answer here:
I'm only trying to understand what's the expected output of diff 2 and would it make sense for PHPUnit (in the first place) to consistently (like GNU diff and Gecko do) display common lines around the changed ones. Right now, these lines are displayed inconsistently.
:)
The output by diff
(by default) is not defined anywhere AFAIK. It has some history about it of course. The question you've was raised here: #64 when building the current output. diff 1.4
has the same inconsistent output btw.
Not sure if this should be solved by introducing a dependency.
With the dependency you can configure the number of common lines, so I think it will help for you?
Then according to the announcement in #64:
the output is now minimal around the chunks
the current behavior of displaying up to 5 leading common lines is a bug in diff 2.0
(diff 1.4
was not expected to produce the minimal output). And the fact that the other common lines are not displayed (unlike 1.4
) is by design.
Do you think this bug should be fixed in diff 2.0
?
With the dependency you can configure the number of common lines, so I think it will help for you?
Not sure if it's going to happen given the benefit/complexity ratio. Another reason against using particularly Gecko
is that infection
may need the line numbers to be not displayed. The reason is that infection
compares the file contents reconstructed from the PHP AST, so depending on the formatting of the source file, the line numbers may be incorrect and misleading. In this case, the context lines would much more helpful.
Key is that diff
never promised a formalized output, so it is near impossible what to consider unexpected.
TBH I'm kind of a bit lost about what you want to like to see changed and in which package.
I want a consistent output of the common lines (e.g. 3 before and after the change) in diff 2
.
@morozov , that's the thing repeated in each answering comment here. Internal diff (both v1 and v2) doesn't have consistent nor standardized output. If you need it, you need to align on some specification. There is no such one for internal diff used by PHPUnit. Doing it is way more than "eg adding lines".
Proposing specification like "unified diff but no line numbers" is weird, big 👎 for custom variations of serious formats.
I must admit that I don't fully get your use case. Eg, in your scenario, input file could be 100kb one-liner of php (extra ugly, yes), and reconstructed content of code generated from PHP AST will have few hundreds probably, and PHP AST doesn't care about whitespaces at all.
I would rather get number of modified line not post-factum, basing on diff output between real code and regenerated one, but before mutating - during applying the mutation, you could figure out line number still in original code, without any differ->diff(orig, new)
execution yet
that's the thing repeated in each answering comment here
@keradus what do you mean?
I must admit that I don't fully get your use case. […] I would rather get number of modified line […] before mutating
This would help reliable determine the line in which mutation happened but how then can we display the mutation? The diff is mostly a visualization tool which tells, for example: "I replaced greater than with less than in this condition":
--- Original
+++ New
@@ @@
- if ($x > 0) {
+ if ($x < 0) {
Internal diff (both v1 and v2) doesn't have consistent nor standardized output. If you need it, you need to align on some specification. There is no such one for internal diff used by PHPUnit. Doing it is way more than "eg adding lines".
Please do not quote me on what I didn't say. I didn't say "eg adding lines" and I didn't ask for any changes. I'm saying that the output of common lines should be consistent and predictable (right now it's not). A reasonable default could be 3 lines around the change (same as in GNU diff).
Proposing specification like "unified diff but no line numbers" is weird, big 👎 for custom variations of serious formats.
Please do not quote me on what I didn't say. Right now sebastian/diff
doesn't show line numbers by default but it's configurable. It's perfectly fine. I'm only asking if it makes sense to display context lines in a consistent manner. Right now it's inconsistent.
If maintainers think this conversation doesn't contribute to the value of the project, please feel free to close it.
I think it boils down to what the default for diff 2.0
should be, as it is already possible to make any other output by passing a custom builder.
What the default should be is up to the @sebastianbergmann / PHPUnit community,
I've no strong preference on it as I use diff
only with custom (strict udiff) output builders.
With #75 merged, I think this can be closed.