torbiak/git-autofixup

Limiting to certain files and hunks

tarsius opened this issue · 13 comments

On magit/magit#3053 @torbiak said:

[this made] me think about how useful it'd be to limit git-autofixup to certain files or regions of files, although I didn't come to any conclusions.

For files it should be easy to do: git autofixup revision -- file1 file2 .... But if you do that, then you probably also want to do it for individual hunks or "regions".


For Magit to be able to use git-autofixup only on certain hunks it would be enough if you implemented some sort of apply subcommand.

In Magit a user can stage an individual hunk by press s while the cursor is inside that hunk, which causes something like git diff ... | ... limit to hunk in question ... | git apply ... to happen. If git-autofixup also allowed piping a potentially manipulated diff to it, that would be great.

Thanks for considering.

Or another idea; currently it is an error to try to autofixup while there are staged changes. Why not limiting the operation to those staged changes instead?

Yeah, accepting a diff on stdin seems good. Seems like limiting it to staged changes would make it difficult to create commits.

Seems like limiting it to staged changes would make it difficult to create commits.

I had something like this in mind:

  1. Commit the staged changes.
  2. Create a stash with the unstaged changes.
  3. Soft reset to HEAD~.
  4. Assign the changes to commits as usual.
  5. Iff everything got assigned, then attempt to pop the stash.

Adding the option to accept diffs to git-autofixup seems easy, but modifying diffs correctly doesn't seem easy or convenient compared to adding changes to the index. So, I'm reconsidering using the index, but I'm wary of doing multistep state changes with data that matters.

@odnoletkov's fork has an interesting approach that uses a temporary index to target only staged changes. Looks like there's a lot less that can go wrong. I think it'll work like this:

  • create a temporary index seeded with the tree from HEAD
  • use the temporary index to create commits via GIT_INDEX_FILE
  • after creating the fixup commits, the user's index is the same as they left it, except minus any hunks that got autofixupped

It seems to work, but I haven't tested it very well yet.

This approach works as long as the hunk/diff you want to fixup is based off HEAD. Current version maintains this invariant by refusing to work with non-empty index. My fork instead considers only staged hunks and makes sure fixups are applied on top of HEAD by using temporary index.

How do we keep this invariant for arbitrary hunk? I don't use magit, but I use vim-fugitive which I assume is similar. In there I can work with both head-to-staged hunks and staged-to-worktree hunks – depending on the context. So we would have to restrict such autofixup command – it would work in some cases (head-to-staged or staged-to-worktree if index is empty) but in other cases should output an error.

How do we keep this invariant for arbitrary hunk?

Let's forget about that. I don't actually need that and it needlessly complicates matters. "Assign the staged changes to commit without messing up the unstaged changes" is all I want. If that can be accomplished using a temporary second index, then great, go for that approach!

Do you need anything from me to move this forward? If you don't have the time/motivation to work on this at this time that's perfectly fine with me -- just want to make sure you are not waiting for me.

@odnoletkov Thanks for the thoughts. I agree with @tarsius that we don't need to worry about arbitrary hunks for now, since we only support hunks that are in the worktree or index. I'm having trouble thinking of a situation where someone would want to autofixup hunks that don't apply to HEAD.

@tarsius: Thanks. I think I have this feature working and adequately tested, but during that work I noticed some unrelated undesirable behaviour that I haven't got to the bottom of yet. After I figure it out I'll be making a new release. I'm planning to have git-autofixup always use a temp index and choose whether to autofixup the hunks in the index or worktree based on whether the index is dirty. Would that fit with how magit uses it?

Would that fit with how magit uses it?

The internal implementation details shouldn't really matter for Magit I think.

As far as the interface goes, sticking to what we have works perfectly find. Magit has no need to be able to pipe a diff for example.

@torbiak I think this creates some unintended noise in the index when running with only unstaged changes:
When some hunk was staged, I didn't encounter a problem.

rm -rf tmp_repo
git init tmp_repo
cd tmp_repo

git commit --allow-empty -m initial\ commit

echo 1 > a
git add a
git commit -m A

echo 2 > a
../git-autofixup HEAD~

git status

This creates a fixup commit just fine, but the status output shows that index and working tree are both marked dirty.

Initialized empty Git repository in /home/johannes/git/git-autofixup/tmp_repo/.git/
/home/johannes/git/git-autofixup/tmp_repo
[master (root-commit) a4da8da] initial commit
[master 2229221] A
 1 file changed, 1 insertion(+)
 create mode 100644 a
[master 7102698] fixup! A
 1 file changed, 1 insertion(+), 1 deletion(-)
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   a

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   a

Using either git add a or git restore --staged a resolves this, because the changes cancel each other out.

I think we can fix this by not using a temporary index when no hunks are staged.
Incidentally, that also fixes the performance issue I was having with the temporary index:
It took git-autofixup 4 seconds to create a single fixup commit from one hunk (but with 90 potential target commits).
After this change it's back to 0.5 seconds.

diff --git a/git-autofixup b/git-autofixup
index f8f6167..430b0eb 100755
--- a/git-autofixup
+++ b/git-autofixup
@@ -565,5 +565,9 @@ sub main {
     # Limit the tempfile's lifetime to the execution of main().
     local $ENV{GIT_INDEX_FILE};  # Throw away ENV changes between main() calls.
-    my $tempfile = create_temp_index();
+    # TODO I think this is the same as is_index_dirty().
+    my $is_index_dirty = system(git_cmd(qw(diff-index --cached HEAD --quiet))) != 0;
+    if ($is_index_dirty) {
+	    my $tempfile = create_temp_index();
+    }
 
     for my $sha (@ordered_shas) {

BTW before I discovered git-autofixup I was actually writing a very similar Perl script with the same name.
My heuristic was much worse, but an interesting idea I implemented was to first spawn all git-blame processes asynchronously, and then start reading their output one by one. I'll try if this makes things faster.

This creates a fixup commit just fine, but the status output shows that index and working tree are both marked dirty.

ah, it's embarrassing that I didn't realize the original index will be out of sync when using a temp index to create fixup commits for unstaged hunks.

I think we can fix this by not using a temporary index when no hunks are staged.
Incidentally, that also fixes the performance issue I was having with the temporary index:
It took git-autofixup 4 seconds to create a single fixup commit from one hunk (but with 90 potential target commits).
After this change it's back to 0.5 seconds.

I haven't been testing git-autofixup on large repos and didn't consider that reading the worktree into the temp index could be expensive. I'm only using the temp index all the time to avoid having multiple code paths to test, but there's other factors that are different anyway, so I'll work in your above changes.

BTW before I discovered git-autofixup I was actually writing a very similar Perl script with the same name.

ha, cool.

My heuristic was much worse, but an interesting idea I implemented was to first spawn all git-blame processes asynchronously, and then start reading their output one by one. I'll try if this makes things faster.

I expect that git blame would be the longest step most of the time on a big repo. Parallelizing seems like an option, but considering how many hunks there could be we probably wouldn't want a process for each hunk running at the same time. I'm not that familiar with concurrency in perl, but I think partitioning the hunks, forking a few times, and communicating back with the parent via pipes and select() or with temp files and wait() would be ok. Also, we might be able to drastically reduce the work git blame needs to do by using the -S or --reverse options.

Forgot to say I really like this feature to only autofixup staged hunks, it works just as expected.

Yeah it makes sense to partition the blame processes. I'll try -S and --reverse first.

Maybe it also makes sense to avoid the temporary index.. but unstaging everything and then selectively committing hunks as fixups could be complex.

Thanks a lot for implementing this! Works well so far.

I have adjusted Magit accordingly.