COMBINE-lab/simpleaf

permit list download failure

Closed this issue · 8 comments

I'm running into an error regarding the simpleaf quant automatic barcode permit list downloading. This is currently not reproducible outside of my workflow but I wanted to document the error and how I got around it in case someone else runs into this issue in the future.

Error: failed to download permit list ExitStatus(unix_wait_status(256))

This is related to the --chemistry '10xv3' and --unfiltered-pl flags, where if they are set and no file is passed to --unfiltered-pl it attempts to download the file from https://umd.box.com/shared/static/vc9zd4qyjj581gvtolw5kj638wmg4f3s using a wget command constructed through rust.

pub fn get_permit_if_absent(af_home: &Path, chem: &Chemistry) -> Result<PermitListResult> {
let chem_file;
let dl_url;
match chem {
Chemistry::TenxV2 => {
chem_file = "10x_v2_permit.txt";
dl_url = "https://umd.box.com/shared/static/jbs2wszgbj7k4ic2hass9ts6nhqkwq1p";
}
Chemistry::TenxV3 => {
chem_file = "10x_v3_permit.txt";
dl_url = "https://umd.box.com/shared/static/vc9zd4qyjj581gvtolw5kj638wmg4f3s";
}
_ => {
return Ok(PermitListResult::UnregisteredChemistry);
}
}
let odir = af_home.join("plist");
if odir.join(chem_file).exists() {
Ok(PermitListResult::AlreadyPresent(odir.join(chem_file)))
} else {
run_fun!(mkdir -p $odir)?;
let mut dl_cmd = std::process::Command::new("wget");
dl_cmd
.arg("-v")
.arg("-O")
.arg(odir.join(chem_file).to_string_lossy().to_string())
.arg("-L")
.arg(dl_url);
let r = dl_cmd.output()?;
if !r.status.success() {
return Err(anyhow!("failed to download permit list {:?}", r.status));
}
Ok(PermitListResult::DownloadSuccessful(odir.join(chem_file)))
}
}

Manually downloading this via wget https://umd.box.com/shared/static/vc9zd4qyjj581gvtolw5kj638wmg4f3s does work, and passing this file manually to --unfiltered-pl results in successful quantification.

If I ever figure out how to reproduce I will update the post.

rob-p commented

Hi @rpolicastro,

Very strange that the same command works for you outside of the workflow! Is the wget being used the same? Are the arguments to wget the same? For example the command constructed in simpleaf uses the some flags like:

wget -v -O $ALEVIN_FRY_HOME/plist/10x_v3_permit.txt -L https://umd.box.com/shared/static/vc9zd4qyjj581gvtolw5kj638wmg4f3s

but otherwise, not sure why they'd behave differently!

The above command does work, which is what's perplexing.

The error occurs consistently in my snakemake pipeline using the simpleaf biocontainers docker container but not when I run the commands locally through bash. Since it could theoretically be related to some quirk with snakemake I would need to make a snakemake reprex to really check.

rob-p commented

@rpolicastro,

One thing I'm investigating for 0.16.2 (for which the main point is to add the --seed flag we discussed in the other issue), is using the https library we already include to download the file directly, instead of calling out to wget. That should work, but it needs more testing. Also, this isn't the only place we use wget — we also use it to download the list of workflow templates, so that use would have to be replaced as well. Does your biodocker also have wget? Does the wget work in that biodocker enviornment?

--Rob

Good thought. If you shell into the container it looks like they are using a version of wget that doesn't have the -L or -v arguments.

docker run -it --rm quay.io/biocontainers/simpleaf:0.16.1--h4ac6f70_0 /bin/bash
export ALEVIN_FRY_HOME='.'
wget -v -O $ALEVIN_FRY_HOME/plist/10x_v3_permit.txt -L https://umd.box.com/shared/static/vc9zd4qyjj581gvtolw5kj638wmg4f3s

The error.

wget: invalid option -- 'v'
BusyBox v1.36.1 (2023-10-17 11:19:33 UTC) multi-call binary.

Usage: wget [-cqS] [--spider] [-O FILE] [-o LOGFILE] [--header STR]
        [--post-data STR | --post-file FILE] [-Y on/off]
        [--no-check-certificate] [-P DIR] [-U AGENT] [-T SEC] URL...

Retrieve files via HTTP or FTP

        --spider        Only check URL existence: $? is 0 if exists
        --header STR    Add STR (of form 'header: value') to headers
        --post-data STR Send STR using POST method
        --post-file FILE        Send FILE using POST method
        --no-check-certificate  Don't validate the server's certificate
        -c              Continue retrieval of aborted transfer
        -q              Quiet
        -P DIR          Save to DIR (default .)
        -S              Show server response
        -T SEC          Network read timeout is SEC seconds
        -O FILE         Save to FILE ('-' for stdout)
        -o LOGFILE      Log messages to FILE
        -U STR          Use STR for User-Agent header
        -Y on/off       Use proxy

As a quick fix if you add GNU wget to the dependencies of the conda environment it would probably fix the issue (it's probably a good idea to add all external dependencies like wget to the environment anyway). Using the rust library to circumvent wget would also fix the issue I'm sure requires more effort.

rob-p commented

Good to know; thanks for testing it out! Actually, because rust is so awesome, replacing the 2 uses of wget with the rust https library we are currently using — minreq — was actually kind of trivial. @DongzeHE is testing it now and if that looks good I'll push 0.16.2 with the --seed parameter and the replacement of wget.

rob-p commented

Hi @rpolicastro,

The 0.16.2 release is cut and should be available from bioconda now (or, I'm assuming by the time you read this). If you get a chance, could you let us know if the download using the builtin library works for you in your snakemake environment? The use of external wget has now been removed.

--Rob

Automatic downloading of the barcode list now works with 0.16.2!

rob-p commented

Awesome — I think we can officially close this issue then :).