agentos-project/agentos

Our dulwich code can leave undesired files in the repo directory


The problem is that when we clone a repo, we initially check out the default branch (usually master or main), but we may then check out another branch or commit via porcelain.reset(). This checkout/reset is buggy and/or unintuitive: it doesn't clean up files that exist in master but do not exist in the branch or commit you checked out.

See pcs/repo.py for the part of the code where we use porcelain.

This bug was mentioned in #342.

Here is a script to reproduce the problem:

from pathlib import Path

from dulwich import porcelain
from dulwich.objectspec import parse_commit

local_repo_path = Path("dulwich_bug")
assert not local_repo_path.exists(), "Delete ./dulwich_bug dir"
local_repo_path.mkdir(parents=True)
github_url = "https://github.com/agentos-project/agentos.git"
porcelain.clone(source=github_url, target=str(local_repo_path), checkout=True)
repo = porcelain.open_repo(local_repo_path)
to_checkout = "07bc71358b4360092b58d78f9eee6dc939e90b10"
treeish = parse_commit(repo, to_checkout).sha().hexdigest()
porcelain.reset(repo=repo, mode="hard", treeish=treeish)

# The commit we check out has no pcs/ folder, so no files under pcs/ should
# exist after the reset.
# Confirm here: https://github.com/agentos-project/agentos/tree/07bc71358b4360092b58d78f9eee6dc939e90b10
# The problem is that we initially clone into master (which does have a pcs/
# folder), then we reset to a commit that doesn't have the pcs/ folder,
# BUT dulwich doesn't do what you'd expect and remove the leftover pcs/ dir.
test_path = local_repo_path / "pcs" / "component.py"
assert not test_path.exists(), f"The path {test_path} should not exist!"
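
One possible workaround, sketched here under assumptions and not taken from pcs/repo.py: after the reset, delete every file in the working tree that is not in the set of paths the target commit tracks. The helper name `remove_stale_files` and its `tracked` parameter are hypothetical. Note that this also deletes ignored and untracked files, so it is only appropriate for a directory we fully manage ourselves.

```python
from pathlib import Path
from typing import Iterable, List


def remove_stale_files(worktree: Path, tracked: Iterable[str]) -> List[str]:
    """Delete files under `worktree` that are not listed in `tracked`.

    `tracked` holds POSIX-style paths relative to `worktree`. Hypothetical
    helper: it also removes ignored/untracked files. Returns removed paths.
    """
    keep = set(tracked)
    removed = []
    # Visit the deepest paths first so directories emptied along the way
    # can be removed when we reach them.
    paths = sorted(worktree.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        rel = path.relative_to(worktree)
        if rel.parts[0] == ".git":
            continue  # never touch the repository metadata
        if path.is_file() or path.is_symlink():
            if rel.as_posix() not in keep:
                path.unlink()
                removed.append(rel.as_posix())
        elif path.is_dir() and not any(path.iterdir()):
            path.rmdir()
    return sorted(removed)
```

In the repro script above, this could be called right after porcelain.reset() as remove_stale_files(local_repo_path, tracked). I am assuming (unverified against our pinned dulwich version) that the tracked paths can be read from the index after the reset, e.g. [p.decode() for p in repo.open_index()].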