pipeline fails trying to connect to auth.docker.io
GodloveD opened this issue · 8 comments
Description of bug
I've installed caper and the pipeline in a conda environment and I'm trying to run it with example config/data from the repo thusly:
$ git clone https://github.com/encode-dcc/chip-seq-pipeline2
$ cd chip-seq-pipeline2
$ git checkout v2.0.1
$ cd ..
$ rm -rf ~/.caper/
$ caper init local
$ caper run ./chip-seq-pipeline2/chip.wdl -i ./chip-seq-pipeline2/dev/example_input_json/caper/ENCSR000DYI_subsampled_chr19_only_caper.json --conda
$ less /lscratch/29930830/cromwell.out
The pipeline does stuff and produces informational messages for about 15 minutes and then produces an error:
2022-01-10 13:57:25,383|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=-7
2022-01-10 13:57:25,384|caper.cli|ERROR| Check stdout in /lscratch/123456/cromwell.out
Inspecting cromwell.out turns up the following error, which appears to be the relevant problem:
2022-01-10 13:25:17,159 INFO - Request threw an exception on attempt #1. Retrying after 883 milliseconds
org.http4s.client.ConnectionFailure: Error connecting to https://auth.docker.io using address auth.docker.io:443 (unresolved: true)
along with a Java stack trace, yadda yadda.
Why is this trying to contact docker.io when I'm using the --conda directive? Is there a workaround? Thanks!
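When cromwell.out is long, a small script can list every host Cromwell failed to reach, so you don't have to page through stack traces. This is a minimal sketch and not part of Caper. It assumes the failure lines keep the http4s message format shown above (`Error connecting to ... using address host:port (unresolved: ...)`), and the helper name `find_connection_failures` is hypothetical.

```python
import re
import sys
from typing import Iterable, List, NamedTuple

# Matches the http4s ConnectionFailure message as it appears in cromwell.out, e.g.
# "Error connecting to https://auth.docker.io using address auth.docker.io:443 (unresolved: true)"
CONN_FAIL = re.compile(
    r"Error connecting to (?P<url>\S+) using address "
    r"(?P<host>[^\s:]+):(?P<port>\d+) \(unresolved: (?P<unresolved>true|false)\)"
)


class ConnFailure(NamedTuple):
    url: str
    host: str
    port: int
    unresolved: bool


def find_connection_failures(lines: Iterable[str]) -> List[ConnFailure]:
    """Return each distinct connection failure once, in the order first seen."""
    seen = {}  # dicts preserve insertion order, so this de-duplicates stably
    for line in lines:
        match = CONN_FAIL.search(line)
        if match:
            failure = ConnFailure(
                url=match["url"],
                host=match["host"],
                port=int(match["port"]),
                unresolved=match["unresolved"] == "true",
            )
            seen.setdefault(failure, None)
    return list(seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_cromwell_log.py /path/to/cromwell.out
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for f in find_connection_failures(fh):
            print(f"{f.host}:{f.port} unresolved={f.unresolved} url={f.url}")
```

The script name is arbitrary. Run on just the excerpt above, it would report a single entry, auth.docker.io:443 with unresolved=True. Repeated retries of the same request are collapsed into one line.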
OS/Platform
- OS/Platform: CentOS-7 (NIH HPC Biowulf cluster)
- Conda version: 4.10.3
- Pipeline version: 2.0.1
- Caper version: 2.1.2
Caper configuration file
backend=local
# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs/lsf) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
# file: use md5sum hash (slow).
# path: use path.
# path+modtime: use path and modification time.
local-hash-strat=path+modtime
# Metadata DB for call-caching (reusing previous outputs):
# Cromwell supports restarting workflows based on a metadata DB
# DB is in-memory by default
#db=in-memory
# If you use 'caper server' then you can use one unified '--file-db'
# for all submitted workflows. In such case, uncomment the following two lines
# and defined file-db as an absolute path to store metadata of all workflows
#db=file
#file-db=
# If you use 'caper run' and want to use call-caching:
# Make sure to define different 'caper run ... --db file --file-db DB_PATH'
# for each pipeline run.
# But if you want to restart then define the same '--db file --file-db DB_PATH'
# then Caper will collect/re-use previous outputs without running the same task again
# Previous outputs will be simply hard/soft-linked.
# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=
cromwell=/home/user/.caper/cromwell_jar/cromwell-65.jar
womtool=/home/user/.caper/womtool_jar/womtool-65.jar
Input JSON file
{
"chip.pipeline_type" : "tf",
"chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v1/hg38_chr19_chrM_caper.tsv",
"chip.fastqs_rep1_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep1.subsampled.25.fastq.gz"
],
"chip.fastqs_rep2_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep2.subsampled.20.fastq.gz"
],
"chip.ctl_fastqs_rep1_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/ctl1.subsampled.25.fastq.gz"
],
"chip.ctl_fastqs_rep2_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/ctl2.subsampled.25.fastq.gz"
],
"chip.paired_end" : false,
"chip.title" : "ENCSR000DYI (subsampled 1/25, chr19_chrM only)",
"chip.description" : "CEBPB ChIP-seq on human A549 produced by the Snyder lab"
}
Troubleshooting result
If you ran caper run without Caper server, then Caper automatically runs a troubleshooter for failed workflows. Find the troubleshooting result at the bottom of Caper's screen log.
If you ran caper submit with a running Caper server, then first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].
Paste the troubleshooting result.
2022-01-10 13:57:25,383|caper.nb_subproc_thread|ERROR| Cromwell failed. returncode=-7
2022-01-10 13:57:25,384|caper.cli|ERROR| Check stdout in /lscratch/123456/cromwell.out.1
Does your system have an internet connection?
Please try this on your login node:
$ docker pull encodedcc/chip-seq-pipeline:v2.1.3
Thank you for your reply.
The docker command is not going to work on the login node since we don't have Docker installed. The Singularity equivalent will work on both the login nodes and compute nodes. The compute nodes are behind a firewall, and some types of traffic (like HTTPS) are allowed through a proxy. For Java to respect the proxy, we need to set the following:
export JAVA_TOOL_OPTIONS="-Djava.net.useSystemProxies=true"
But in this case the env var does not seem to help.
I'm trying to run this with a locally installed conda environment and I've set the --conda option, so I don't know why the pipeline is trying to contact Docker Hub anyway. Thank you.
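The `(unresolved: true)` part of the error suggests that auth.docker.io could not be resolved to an address on the node where Cromwell ran. That would fit a compute node that only reaches the outside world through a proxy. The sketch below compares what the node can resolve on its own with the proxy settings that are visible to it. It is a hypothetical diagnostic, not something Caper or Cromwell provides, and the helper names are made up for illustration.

```python
import os
import socket
import urllib.request
from typing import List, Optional


def resolves(host: str, port: int = 443) -> bool:
    """True if the node's own resolver can map `host` to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)) > 0
    except OSError:  # socket.gaierror (a subclass of OSError) signals a lookup failure
        return False


def https_proxy() -> Optional[str]:
    """HTTPS proxy from the environment (on Linux: https_proxy / HTTPS_PROXY), if set."""
    return urllib.request.getproxies().get("https")


def java_proxy_flags() -> List[str]:
    """Proxy-related -D system properties found in JAVA_TOOL_OPTIONS."""
    opts = os.environ.get("JAVA_TOOL_OPTIONS", "").split()
    # "prox" catches both -Dhttps.proxyHost=... and -Djava.net.useSystemProxies=...
    return [o for o in opts if o.startswith("-D") and "prox" in o.lower()]
```

On a compute node, `resolves("auth.docker.io")` returning False while `https_proxy()` returns a proxy URL would be consistent with the error above. The node cannot look up the name itself, so only a client that actually sends its requests through the proxy can get out.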
Okay, then let's localize all data on the login node and then run with a localized file.
Localize the input JSON.
$ cd YOUR_WORK_DIR
# recursively localize the input JSON (pip install autouri if it doesn't exist)
$ autouri loc https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json . --recursive
$ find . -name '*.json'
./15b4ff439de58d859f4c3ee4482c3bd2/ENCSR000DYI_subsampled_chr19_only.local.json
Use that .local.json input JSON file to run the pipeline.
The docker error looks very weird. Please let me know if that occurs again.
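Before you move the run to a compute node, you may want to confirm that the .local.json no longer points at remote files. The sketch below assumes that strings starting with `http://`, `https://`, `gs://` or `s3://` are the ones Caper would have to fetch over the network. That prefix list is an assumption, so adjust it as needed. `remote_uris` and `check_input_json` are hypothetical helpers.

```python
import json
from typing import Any, Iterator, List, Tuple

# Assumed set of schemes that would require network access to fetch.
REMOTE_PREFIXES = ("http://", "https://", "gs://", "s3://")


def remote_uris(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (location, value) for every string in a parsed JSON document that looks remote."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from remote_uris(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from remote_uris(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith(REMOTE_PREFIXES):
        yield path, node


def check_input_json(filename: str) -> List[Tuple[str, str]]:
    """Load an input JSON file and list any remote URIs it still contains."""
    with open(filename, encoding="utf-8") as fh:
        return list(remote_uris(json.load(fh)))
```

An empty list from `check_input_json(...)` means the JSON itself is local. It does not look inside files the JSON refers to, such as the genome TSV, which can also contain URLs. Handling those is presumably what the `--recursive` flag in the autouri command above is for.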
Thanks again for your response. I've localized that data as you suggested and tried running it again. I've got a lot of errors in the standard out, and the cromwell.out file also seems to be trying to access Docker. I've attached a copy-and-paste of the stdout and the cromwell.out file.
Are you sure that your Caper is 2.1.2 (the latest)?
Please check Caper versions both inside and outside the Conda environment (base or the pipeline's env).
$ caper -v
2.1.2
Also, is this scratch directory /lscratch/29930830 dynamically allocated (when a job is submitted)?
If so, please don't run pipelines on such dynamic scratch directories.
Caper/Cromwell will lose track of important intermediate files.
Found this in your cromwell.out:
2022-01-14 10:01:15,867 INFO - Request method=GET uri=https://auth.docker.io/token?service=registry.docker.io&scope=repository%3Aencodedcc/chip-seq-pipeline%3Apull headers= threw an exception on attempt #4. Giving up.
org.http4s.client.ConnectionFailure: Error connecting to https://auth.docker.io using address auth.docker.io:443 (unresolved: true)
So Caper is still trying to download the chip-seq-pipeline's Docker image from Docker Hub.
It's weird. I still don't know why this happens.
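You can decode the failing request to see exactly what Cromwell was asking Docker Hub for. This sketch uses only the standard library to split the query string of the logged URI. `docker_token_scope` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def docker_token_scope(uri: str) -> dict:
    """Split a Docker registry token-request URI into service, resource type, repository and actions."""
    query = parse_qs(urlsplit(uri).query)  # also percent-decodes %3A back to ':'
    service = query.get("service", [""])[0]
    scope = query.get("scope", [""])[0]  # e.g. "repository:<name>:<actions>"
    kind, _, rest = scope.partition(":")
    repository, _, actions = rest.rpartition(":")  # the last ':' separates the actions
    return {"service": service, "type": kind, "repository": repository, "actions": actions}
```

Calling it on the URI from the log above returns `{'service': 'registry.docker.io', 'type': 'repository', 'repository': 'encodedcc/chip-seq-pipeline', 'actions': 'pull'}`. That is a pull request for the pipeline's image repository, which matches the observation that Caper is still trying to reach the pipeline's Docker image.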
Thank you for your response, and sorry for the delay. I can confirm I'm running Caper version 2.1.2. I reran the pipeline from a network storage location as you suggested and received the same Docker error. I don't know if it matters, but my Caper config uses the local backend, so I don't think using local vs. network storage should make a difference. It seems to me that the workflow is trying to access Docker and Docker Hub even though it is being instructed to run through Conda. Maybe this error was not caught because testing is carried out on a system that has access to Docker and Docker Hub? Unsure.
Got the same problem when running atac-seq-pipeline locally. @GodloveD Have you solved this problem?
2024-03-16 12:04:24,823 cromwell-system-akka.dispatchers.engine-dispatcher-132 INFO - Not triggering log of restart checking token queue status. Effective log interval = None
2024-03-16 12:04:24,862 cromwell-system-akka.dispatchers.engine-dispatcher-117 INFO - Not triggering log of execution token queue status. Effective log interval = None
2024-03-16 12:04:27,256 cromwell-system-akka.dispatchers.engine-dispatcher-117 INFO - WorkflowExecutionActor-f20624b6-5892-494f-af65-616cd41fd1ff [UUID(f20624b6)]: Starting atac.read_genome_tsv
2024-03-16 12:04:27,877 cromwell-system-akka.dispatchers.engine-dispatcher-117 INFO - Assigned new job execution tokens to the following groups: f20624b6: 1
2024-03-16 12:04:48,936 INFO - Request threw an exception on attempt #1. Retrying after 649 milliseconds
org.http4s.client.ConnectionFailure: Error connecting to https://auth.docker.io using address auth.docker.io:443 (unresolved: true)
I had a similar issue, except with the test config for the RNA-seq pipeline. I ended up using gsutil to download the files locally (which also showed that the files are public, so there shouldn't be an authentication issue) and changing the configuration to use local paths. Perhaps it's an issue with Caper itself interacting with some systems? I was using Caper version 2.3.2, by the way.