pull: "Fetching" step takes forever
zhf231298 opened this issue · 5 comments
Description
Since updating to version 3.45, dvc pull spends a massive amount of time on "Fetching".
I can't tell precisely what the reason is, but at least the md5 of a large file is recomputed on every dvc pull run, even though it is stated that the computation is done only once.
Reproduce
- dvc pull
Expected
The "Fetching" step should be very short, as it is on another device where DVC 3.38.1 is used.
Environment information
Problematic environment:
- OS: macOS Sonoma 14.3
- DVC: 3.45.0 (brew)
- Remote storage: S3 bucket
Properly working environment:
- OS: Ubuntu 22.04.3 LTS
- DVC: 3.38.1 (pip)
- Remote storage: S3 bucket (the same as above)
Output of dvc doctor:
$ dvc doctor
DVC version: 3.45.0 (brew)
--------------------------
Platform: Python 3.12.2 on macOS-14.3-arm64-arm-64bit
Subprojects:
dvc_data = 3.13.0
dvc_objects = 5.0.0
dvc_render = 1.0.1
dvc_task = 0.3.0
scmrepo = 3.1.0
Supports:
azure (adlfs = 2024.2.0, knack = 0.11.0, azure-identity = 1.15.0),
gdrive (pydrive2 = 1.19.0),
gs (gcsfs = 2024.2.0),
http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.2.0, boto3 = 1.34.34),
ssh (sshfs = 2023.10.0),
webdav (webdav4 = 0.9.8),
webdavs (webdav4 = 0.9.8),
webhdfs (fsspec = 2024.2.0)
Config:
Global: /Users/zhf231298/Library/Application Support/dvc
System: /opt/homebrew/share/dvc
Could you also share dvc config -l?
Sure, the output of dvc config -l is:
remote.s3-bucket.url=s3://bucket-name
remote.s3-bucket.version_aware=true
core.autostage=true
core.remote=s3-bucket
The bucket name here has been substituted by a dummy name.
Confirmed this is slow for version-aware remotes, although it seems like cache remotes are not impacted.
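For anyone checking whether their own setup falls into the affected case, here is a minimal sketch that scans the text printed by dvc config -l for remotes with version_aware set to true. The function name version_aware_remotes is hypothetical, and the key=value layout is assumed to match the listing posted above; it is not part of DVC itself.

```python
def version_aware_remotes(config_listing: str) -> list[str]:
    """Return the names of remotes whose version_aware option is true.

    Assumes one key=value pair per line, with remote options written as
    remote.<name>.<option>, as in the listing shown in this thread.
    """
    remotes = []
    for line in config_listing.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        parts = key.split(".")
        is_remote_option = len(parts) >= 3 and parts[0] == "remote"
        if is_remote_option and parts[-1] == "version_aware":
            if value.strip().lower() == "true":
                # Rejoin the middle parts in case the remote name contains dots.
                remotes.append(".".join(parts[1:-1]))
    return remotes


# The listing posted above, with the dummy bucket name.
sample = """\
remote.s3-bucket.url=s3://bucket-name
remote.s3-bucket.version_aware=true
core.autostage=true
core.remote=s3-bucket
"""

print(version_aware_remotes(sample))  # ['s3-bucket']
```

If the list is non-empty and pulls hang at "Fetching", the setup matches the case confirmed in this thread.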
I faced the same issue with a version-aware remote. I tried both dvc pull and dvc fetch.
I let dvc 3.49 sit in "Fetching" for 16 hours and there was no progress. Only the word "Fetching" was shown, with no file names after it.
After I downgraded to dvc 3.38.1, "Fetching" took only about 10 seconds to start.
I took a look, and it seems this is caused by this commit:
iterative/dvc-data@f398036#diff-89f845ba2a0911623cfc247bbdb34218d79bbbacec33b3af2621d64a24d28557
Unfortunately, I don't see a quick fix, and we are moving towards dropping support for version-aware remotes due to lots of small issues and inconsistencies like this one. I am going to close this and suggest using traditional remotes to avoid these problems.