Error downloading sars-cov-2 dataset after release 2024-02-16--04-00-32Z
Hi Nextclade team! I'm encountering some issues after the release of tag 2024-02-16--04-00-32Z. I see errors with both the v3.0.0 and v3.2.0 nextclade CLI versions.
- The explicit tag 2024-02-16--04-00-32Z downloads successfully, as does omitting the tag. But the output version is "unreleased":

nextclade dataset get --name sars-cov-2 --tag "2024-02-16--04-00-32Z" --output-dir "sars-cov-2_2024-02-16--04-00-32Z"
nextclade dataset get --name sars-cov-2 --output-dir "sars-cov-2_no-tag"

"version": { "tag": "unreleased" },
- Any other tag raises an error.

nextclade dataset get --name sars-cov-2 --tag "2024-01-16--20-31-02Z" --output-dir "sars-cov-2_2024-01-16--20-31-02Z"
nextclade dataset get --name sars-cov-2 --tag "latest" --output-dir "sars-cov-2_latest"

Error:
   0: Dataset not found: 'sars-cov-2'. Did you mean:
     - nextstrain/sars-cov-2/XBB
     - nextstrain/sars-cov-2/BA.2
     - nextstrain/sars-cov-2/BA.2.86
     - nextstrain/sars-cov-2/wuhan-hu-1/orfs
     - nextstrain/sars-cov-2/wuhan-hu-1/proteins
     - community/isuvdl/mazeller/prrsv2/orf5/yimim2023
     - nextstrain/mpox/all-clades
     - nextstrain/rsv/a/EPI_ISL_412866
     - nextstrain/rsv/b/EPI_ISL_1653999
   ?
   Type `nextclade dataset list` to show available datasets.

Location:
   packages/nextclade-cli/src/cli/nextclade_dataset_get.rs:79

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Hi Katherine @ktmeaton
Thanks for reporting! You've hit 2 bugs simultaneously - one in software and another in data! That's a bingo!
The problem with "tag": "unreleased" was caused by incorrect files being put into the zip archives that Nextclade CLI relies on. I fixed the dataset build logic for future dataset releases in #177, and in #178 I also retroactively re-uploaded corrected dataset zips for the already released datasets to our servers, so the tags should now show up correctly in pathogen.json once you re-download the datasets. There is no new dataset release: the tags are the same and the files inside the datasets are the same, except for the tag in pathogen.json.
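To confirm that a re-downloaded dataset carries the expected tag, the check can be scripted. Below is a minimal sketch in Python, assuming the dataset directory contains a pathogen.json with the "version": { "tag": ... } layout quoted in the report. The names `read_dataset_tag` and `tag_is_released` are illustrative helpers, not part of Nextclade.

```python
import json
from pathlib import Path


def read_dataset_tag(dataset_dir):
    """Return the version tag recorded in <dataset_dir>/pathogen.json, or None if absent."""
    # Assumption: the tag is stored at version.tag, as in the snippet quoted in this issue.
    with open(Path(dataset_dir) / "pathogen.json", encoding="utf-8") as f:
        pathogen = json.load(f)
    version = pathogen.get("version") or {}
    return version.get("tag")


def tag_is_released(dataset_dir, expected_tag=None):
    """True if the tag is set, is not "unreleased", and matches expected_tag when one is given."""
    tag = read_dataset_tag(dataset_dir)
    if tag is None or tag == "unreleased":
        return False
    return expected_tag is None or tag == expected_tag
```

For example, `tag_is_released("sars-cov-2_no-tag")` would report whether a dataset downloaded without an explicit tag still shows "unreleased".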
The problem with "Dataset not found" was caused by a bug in the Nextclade software itself. I fixed it in nextstrain/nextclade#1420 and will release a new version shortly. You will need to update Nextclade CLI to get the fix.
In the meantime you can manually download dataset.zip files from one of the subdirectories here: https://github.com/nextstrain/nextclade_data/tree/master/data_output/nextstrain/sars-cov-2/wuhan-hu-1/orfs. These are the exact same zips which nextclade dataset get downloads (and extracts) for you.
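With the manual route, the archive still has to be unpacked by hand. Below is a minimal sketch using Python's standard zipfile module, assuming a dataset.zip has already been saved locally. The function name `extract_dataset` and the paths are illustrative, not part of Nextclade.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, output_dir):
    """Unpack a locally saved dataset.zip into output_dir and return the archived file names."""
    out = Path(output_dir)
    # Create the target directory if it does not exist yet.
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
        return sorted(archive.namelist())
```

Calling `extract_dataset("dataset.zip", "sars-cov-2_manual")` should leave a directory equivalent to the one nextclade dataset get would produce for the same tag, given that the zips are the same.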
Nextclade CLI 3.2.1 is now released with the mentioned bugfix.
Here is how I tested:
$ docker run -it --rm nextstrain/nextclade:3.2.1 bash -c "nextclade dataset get --name=sars-cov-2 --tag='2024-01-16--20-31-02Z' --output-dir=out && grep 'tag' 'out/pathogen.json'"
"tag": "2024-01-16--20-31-02Z"
$ docker run -it --rm nextstrain/nextclade:3.2.1 bash -c "nextclade dataset get --name=sars-cov-2 --tag='2024-02-16--04-00-32Z' --output-dir=out && grep 'tag' 'out/pathogen.json'"
"tag": "2024-02-16--04-00-32Z"
$ docker run -it --rm nextstrain/nextclade:3.2.1 bash -c "nextclade dataset get --name=sars-cov-2 --tag='latest' --output-dir=out && grep 'tag' 'out/pathogen.json'"
"tag": "2024-02-16--04-00-32Z",
$ docker run -it --rm nextstrain/nextclade:3.2.1 bash -c "nextclade dataset get --name=sars-cov-2 --output-dir=out && grep 'tag' 'out/pathogen.json'"
"tag": "2024-02-16--04-00-32Z"
I will close the issue. Please comment or open a new issue if there are still problems.
Thank you, it works great now with no errors!