iterative/dvc

`dvc data status` key error

mattangus opened this issue · 1 comments

Bug Report

Running `dvc data status` gives me an error on one machine but not on another:

Error
$ dvc data status -v
2024-04-24 10:23:26,182 DEBUG: v3.50.0 (pip), CPython 3.12.2 on Linux-6.5.0-28-generic-x86_64-with-glibc2.35
2024-04-24 10:23:26,182 DEBUG: command: /home/matt/workspace/virtual_environments/py-3.12/bin/dvc data status -v
2024-04-24 10:23:26,494 ERROR: unexpected error - b'2c6373811567f2b2023f065fb5a333fdeefd54bb'
Traceback (most recent call last):
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/commands/data.py", line 110, in run
    status = self.repo.data_status(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/repo/data.py", line 234, in status
    git_info = _git_info(repo.scm, untracked_files=untracked_files)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dvc/repo/data.py", line 141, in _git_info
    staged, unstaged, untracked = scm.status(untracked_files=untracked_files)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/scmrepo/git/__init__.py", line 307, in _backend_func
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 880, in status
    staged, unstaged, untracked = git_status(
                                  ^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/porcelain.py", line 1318, in status
    tracked_changes = get_tree_changes(r)
                      ^^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/porcelain.py", line 1456, in get_tree_changes
    for change in index.changes_from_tree(r.object_store, tree_id):
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/index.py", line 553, in changes_from_tree
    yield from changes_from_tree(
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/index.py", line 657, in changes_from_tree
    for name, mode, sha in iter_tree_contents(object_store, tree):
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/object_store.py", line 1745, in iter_tree_contents
    tree = store[entry.sha]
           ~~~~~^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/object_store.py", line 154, in __getitem__
    type_num, uncomp = self.get_raw(sha1)
                       ^^^^^^^^^^^^^^^^^^
  File "/home/matt/workspace/virtual_environments/py-3.12/lib/python3.12/site-packages/dulwich/object_store.py", line 601, in get_raw
    raise KeyError(hexsha)
KeyError: b'2c6373811567f2b2023f065fb5a333fdeefd54bb'

2024-04-24 10:23:26,517 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2024-04-24 10:23:26,517 DEBUG: Removing '/home/matt/workspace/HA/OD-Stuff/.pQ5HNP36nqZbOfuABjeJDA.tmp'
2024-04-24 10:23:26,517 DEBUG: Removing '/home/matt/workspace/HA/OD-Stuff/.pQ5HNP36nqZbOfuABjeJDA.tmp'
2024-04-24 10:23:26,517 DEBUG: Removing '/home/matt/workspace/HA/OD-Stuff/.pQ5HNP36nqZbOfuABjeJDA.tmp'
2024-04-24 10:23:26,517 DEBUG: Removing '/home/matt/workspace/HA/OD-Stuff/ha_gym/.dvc/.cache/files/md5/.cOGbTVUVskCrA6vs0mD64Q.tmp'
2024-04-24 10:23:26,525 DEBUG: Version info for developers:
DVC version: 3.50.0 (pip)
-------------------------
Platform: Python 3.12.2 on Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Subprojects:
	dvc_data = 3.15.1
	dvc_objects = 5.0.0
	dvc_render = 1.0.1
	dvc_task = 0.3.0
	scmrepo = 3.1.0
Supports:
	http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
Config:
	Global: /home/matt/.config/dvc
	System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1p3
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme1n1p3
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/b11a8fb5114eb46d6400fbaefadf5890

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-04-24 10:23:26,527 DEBUG: Analytics is enabled.
2024-04-24 10:23:26,550 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp3i_1azl0', '-v']
2024-04-24 10:23:26,557 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp3i_1azl0', '-v'] with pid 141995

Description

This seems to be related to the untracked changes in my working directory. However, the other machine, where this command works, also has many untracked changes. `dvc status` still works.

Reproduce

I'm not sure how to reproduce this issue.
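
One way to narrow this down might be to check whether the object named in the `KeyError` is present in the local Git object store at all. The sketch below is only a partial check under stated assumptions: it looks for a loose object under `objects/<first two hex chars>/<remaining 38>` and does not inspect packfiles, so a `False` result does not prove the object is missing. The helper names `loose_object_path` and `has_loose_object` are hypothetical and are not part of dvc, scmrepo, or dulwich.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def loose_object_path(git_dir, hexsha):
    """Return the path where Git stores `hexsha` as a loose object.

    Assumes the standard loose-object layout: objects/<2 chars>/<38 chars>.
    """
    if len(hexsha) != 40 or not set(hexsha) <= HEX_DIGITS:
        raise ValueError(f"not a full lowercase SHA-1 hex digest: {hexsha!r}")
    return Path(git_dir) / "objects" / hexsha[:2] / hexsha[2:]


def has_loose_object(git_dir, hexsha):
    """Return True if `hexsha` exists as a loose object under `git_dir`.

    Packed objects (objects/pack/*.pack) are NOT checked, so False only
    means "not stored loose", not "absent from the repository".
    """
    return loose_object_path(git_dir, hexsha).is_file()
```

Run from the repository root, `has_loose_object(".git", "2c6373811567f2b2023f065fb5a333fdeefd54bb")` would report whether the tree from the traceback exists as a loose object. For a check that also covers packed objects, `git cat-file -e 2c6373811567f2b2023f065fb5a333fdeefd54bb` or `git fsck` on the affected machine is likely more reliable.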

Expected

On the other machine, the same command outputs `No changes.`

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.50.0 (pip)
-------------------------
Platform: Python 3.12.2 on Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Subprojects:
	dvc_data = 3.15.1
	dvc_objects = 5.0.0
	dvc_render = 1.0.1
	dvc_task = 0.3.0
	scmrepo = 3.1.0
Supports:
	http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
Config:
	Global: /home/matt/.config/dvc
	System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1p3
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme1n1p3
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/b11a8fb5114eb46d6400fbaefadf5890

Additional Information (if any):

Thanks for the report. Unfortunately, since the issue is not reproducible and the error comes from dulwich rather than from dvc, I am going to close this one; it does not look like there is much we can do.