Our dulwich code can leave undesired files in the repo directory
Closed this issue · 1 comments
nickjalbert commented
The problem is that when we clone a repo, we initially check out the default branch (usually master or main), but we may then check out another branch or commit via porcelain.reset(). This checkout/reset is buggy and/or unintuitive: it doesn't clean up files that exist in master but do not exist in the branch you checked out.
See pcs/repo.py for the part of the code where we use porcelain.
This bug was mentioned in #342.
Here is a script to reproduce the problem:
from pathlib import Path
from dulwich import porcelain
from dulwich.objectspec import parse_commit
local_repo_path = Path("dulwich_bug")
assert not local_repo_path.exists(), "Delete ./dulwich_bug dir"
local_repo_path.mkdir(parents=True)
github_url = "https://github.com/agentos-project/agentos.git"
porcelain.clone(source=github_url, target=str(local_repo_path), checkout=True)
repo = porcelain.open_repo(local_repo_path)
to_checkout = "07bc71358b4360092b58d78f9eee6dc939e90b10"
treeish = parse_commit(repo, to_checkout).sha().hexdigest()
porcelain.reset(repo=repo, mode="hard", treeish=treeish)
# The commit we check out does not have a pcs/ folder, so no files under pcs/ should exist.
# Confirm here: https://github.com/agentos-project/agentos/tree/07bc71358b4360092b58d78f9eee6dc939e90b10
# The problem is that we initially clone into master (which does have a pcs/
# folder), then we reset to a commit that doesn't have the pcs/ folder,
# BUT dulwich doesn't do what you'd expect and remove the stale pcs/ directory.
test_path = local_repo_path / "pcs" / "component.py"
assert not test_path.exists(), f"The path {test_path} should not exist!"
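One possible workaround, sketched here and not taken from pcs/repo.py: after the reset, delete every working-tree file that is not listed among the tracked paths. The helper name remove_stale_files is hypothetical. The commented-out usage assumes that the index written by porcelain.reset(mode="hard") lists only the files of the target commit, which I have not verified against every dulwich version. Note that this also deletes untracked and ignored files, so it is only reasonable for a fresh clone like the one above.

```python
import os
from pathlib import Path
from typing import Iterable, List


def remove_stale_files(worktree: Path, tracked: Iterable[str]) -> List[Path]:
    """Delete files under `worktree` that are not in `tracked`.

    `tracked` holds POSIX-style paths relative to `worktree`.
    Anything under .git is left alone. Returns the removed file paths.
    """
    keep = {Path(p).as_posix() for p in tracked}
    removed: List[Path] = []
    # Walk bottom-up so a directory can be removed once its contents are gone.
    for root, _dirs, files in os.walk(worktree, topdown=False):
        root_path = Path(root)
        rel_root = root_path.relative_to(worktree)
        if rel_root.parts and rel_root.parts[0] == ".git":
            continue  # never touch git's own metadata
        for name in files:
            path = root_path / name
            if path.relative_to(worktree).as_posix() not in keep:
                path.unlink()
                removed.append(path)
        # Drop directories that are now empty (but never the worktree itself).
        if root_path != worktree and not any(root_path.iterdir()):
            root_path.rmdir()
    return removed


# Hypothetical usage right after porcelain.reset(...) in the script above.
# Assumption: iterating the dulwich index yields tracked paths as bytes.
# tracked = [p.decode("utf-8") for p in repo.open_index()]
# remove_stale_files(local_repo_path, tracked)
```

With a cleanup like this in place, the final assertion in the script above would be expected to pass, because pcs/component.py is not tracked at the target commit.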
nickjalbert commented
See also dulwich issue "porcelaine.reset is not deleting files locally #840"