polyactis/Accucopy

Exception in thread TaskFileWriter-Thread:

zhujack opened this issue · 7 comments

Dear author,

Thanks for developing this software. I re-aligned my BAM files against the provided reference genome, then ran the Accucopy Docker image with Singularity following the instructions, but I always get the following errors:

Exception in thread TaskFileWriter-Thread:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1650, in run
self._writeIfSet()
File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1660, in _writeIfSet
self.writeFunc()
File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 537, in wrapped
return f(self, *args, **kw)
File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 2717, in writeTaskInfo
fp = open(self.taskInfoFile, "a")
IOError: [Errno 2] No such file or directory: 'output/pyflow.data/state/pyflow_tasks_info.txt'
Exception in thread TaskFileWriter-Thread:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1650, in run
self._writeIfSet()
File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1660, in _writeIfSet
self.writeFunc()
File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 537, in wrapped
return f(self, *args, **kw)
File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 2618, in writeTaskStatus
tmpFp = open(tmpFile, "w")
IOError: [Errno 2] No such file or directory: 'output/pyflow.data/state/pyflow_tasks_runstate.txt.update.incomplete'

I might have missed something. Any suggestions? Thanks.

Jack

Hello Yu,

Thanks for the work put into this tool; I was very happy to come across a somatic CNV tool that works well with low-coverage samples.
I have recently started using Accucopy, and while my initial tests were successful (i.e. complete output files and a clean conclusion in the log), after integrating the tool into a pipeline I received the same error as @zhujack:


Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1650, in run
    self._writeIfSet()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1660, in _writeIfSet
    self.writeFunc()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 537, in wrapped
    return f(self, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 2717, in writeTaskInfo
    fp = open(self.taskInfoFile, "a")
IOError: [Errno 2] No such file or directory: '/gpfs/scratch/blanep01/productionDir/somatic/work/7f/d771b3523042fca16365dbfb4b8894/MMRF_1049_1_BM_CD138pos_T2_KHWGL_L10520_vs_MMRF_1049_1_PB_Whole_C1_KHWGL_L10519_results/pyflow.data/state/pyflow_tasks_info.txt'

I've done my best to trace the bug, and the issue seems to be that pyflow never creates the {OUTPUT_DIR}/pyflow.data/state directory or any of the files it should contain, e.g. the pyflow_tasks_info.txt referenced in the error message. There are further errors about other files in this directory not existing.
What makes the error confusing is that the same block of code in the pyflow.py script that creates the state directory also creates the {OUTPUT_DIR}/pyflow.data/logs directory, which does exist and contains the correct files (see the quick check sketched after the code below).

# This is from https://github.com/Illumina/pyflow/blob/master/pyflow/src/pyflow.py
def setupNewRun(self, param) :
    self.param = param

    # setup log file-handle first, then run the rest of parameter validation:
    # (hold this file open so that we can still log if pyflow runs out of filehandles)
    self.param.dataDir = os.path.abspath(self.param.dataDir)
    self.param.dataDir = os.path.join(self.param.dataDir, "pyflow.data")
    logDir = os.path.join(self.param.dataDir, "logs")
    ensureDir(logDir)
    self.flowLogFile = os.path.join(logDir, "pyflow_log.txt")
    self.flowLogFp = open(self.flowLogFile, "a")

    # run remaining validation
    self._validateFixParam(self.param)

    #  initial per-run data
    self.taskErrors = set()  # this set actually contains every task that failed -- tasks contain all of their own error info
    self.isTaskManagerException = False

    # create data directory if it does not exist
    ensureDir(self.param.dataDir)

    # check whether a process already exists:
    self.markFile = os.path.join(self.param.dataDir, "active_pyflow_process.txt")
    if os.path.exists(self.markFile) :
        # Non-conventional logging situation -- another pyflow process is possibly using this same data directory, so we want
        # to log to stderr (even if the user has set isQuiet) and not interfere with the other process's log
        self.flowLogFp = None
        self.param.isQuiet = False
        msg = [ "Can't initialize pyflow run because the data directory appears to be in use by another process.",
                "\tData directory: '%s'" % (self.param.dataDir),
                "\tIt is possible that a previous process was abruptly interrupted and did not clean up properly. To determine if this is",
                "\tthe case, please refer to the file '%s'" % (self.markFile),
                "\tIf this file refers to a non-running process, delete the file and relaunch pyflow,",
                "\totherwise, specify a new data directory. At the API-level this can be done with the dataDirRoot option." ]
        self.markFile = None  # this keeps pyflow from deleting this file, as it normally would on exit
        raise DataDirException(msg)
    else :
        mfp = open(self.markFile, "w")
        msg = """
This file provides details of the pyflow instance currently using this data directory.
During normal pyflow run termination (due to job completion, error, SIGINT, etc...),
this file should be deleted. If this file is present it should mean either:
(1) the data directory is still in use by a running workflow
(2) a sudden job failure occurred that prevented normal run termination
The associated pyflow job details are as follows:
"""
        mfp.write(msg + "\n")
        for line in self.getInfoMsg() :
            mfp.write(line + "\n")
        mfp.write("\n")
        mfp.close()

    stateDir = os.path.join(self.param.dataDir, "state")
    ensureDir(stateDir)
Additionally, the issue does not seem to present itself in Strelka2's output; I found that Strelka2 uses an alternative pyflow installation.
Perhaps we could point Accucopy at that installation of pyflow?
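
If that seems worth trying, here is an untested sketch of the idea; the Strelka2 pyflow location below is a placeholder, not a path I have verified:

# Hypothetical: put Strelka2's pyflow ahead of the copy bundled with Accucopy
export PYTHONPATH=/opt/strelka2/lib/python:$PYTHONPATH
python /usr/local/Accucopy/main.py -c <configure_file> -t tumor.bam -n normal.bam -o output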

I can expand on the error with more log and output files; just let me know which you might need.
I will also follow up on this issue with an email to you.

Best,
Patrick

@pblaney
Hello,
Can you pack and upload the directory {OUTPUT_DIR}/pyflow.data, or send it to my email (xinpingfan@gmail.com)?

Which container did you use, Docker or Singularity?

How did you integrate Accucopy into your pipeline? Did you wrap it in a Makefile or shell script and submit it to SGE?

Best,
Xinping

Hi @fanxinping,

Thank you for getting back to me on this. I have sent you a zipped copy of {OUTPUT_DIR}/pyflow.data.
I am using a Singularity container. It was created by converting the Docker image polyactis/accucopy:latest to a Singularity SIF file.
I have integrated Accucopy into a pipeline that executes it via a shell script and uses SLURM as the job scheduler.
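
For reference, the conversion was along these lines (reconstructed from memory, so the exact invocation may have differed slightly):

# Pull the Docker image and convert it to a Singularity SIF file
singularity build accucopy_latest.sif docker://polyactis/accucopy:latest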

Hope this helps shed light,
Patrick

Hi @pblaney

I have checked the zipped file of {OUTPUT_DIR}/pyflow.data. The file pyflow.data/logs/pyflow_tasks_stderr_log.txt doesn't contain any error message and shows that Accucopy ran successfully. The file pyflow.data/logs/pyflow_tasks_stdout_log.txt contains the complete output of Accucopy, which indicates that Accucopy produced its results.

Maybe you have sent the wrong file to me. Can you send the log file which contains the following message?

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1650, in run
    self._writeIfSet()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1660, in _writeIfSet
    self.writeFunc()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 537, in wrapped
    return f(self, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 2717, in writeTaskInfo
    fp = open(self.taskInfoFile, "a")
IOError: [Errno 2] No such file or directory: '/gpfs/scratch/blanep01/productionDir/somatic/work/7f/d771b3523042fca16365dbfb4b8894/MMRF_1049_1_BM_CD138pos_T2_KHWGL_L10520_vs_MMRF_1049_1_PB_Whole_C1_KHWGL_L10519_results/pyflow.data/state/pyflow_tasks_info.txt'

@fanxinping,

I just sent you the log file that captured those errors.
As you note, the Accucopy output is intact and nothing appears to be wrong with its execution. However, this error causes the job to be reported as failed, so the output is not propagated to the next process in the pipeline.
And as you can see in that zipped directory, there is no state directory, which is the cause of the errors.

@pblaney

It's very strange. I have tested Accucopy in my SLURM env, and it worked well.
The shell script is test.sh:

#!/bin/sh

singularity exec --bind /y accucopy_latest.sif /usr/local/Accucopy/main.py -c configure_hg38 -t tumor.bam -n normal.bam -o result --nCores 15

and I used this command to submit it:

sbatch --cpus-per-task 15 -o test.output test.sh

I think you should check your SLURM environment or the subsequent processes in your shell script. Maybe some operation deletes the state directory accidentally; one way to narrow this down is sketched below.
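
For example, you could snapshot the output directory immediately after Accucopy exits, before any later pipeline step runs. A minimal sketch, reusing my test command above (adjust the paths to your setup):

singularity exec --bind /y accucopy_latest.sif /usr/local/Accucopy/main.py -c configure_hg38 -t tumor.bam -n normal.bam -o result --nCores 15
rc=$?
# Record the exit code and the state directory contents right away,
# before any subsequent pipeline step can touch them
echo "accucopy exit code: $rc" > state_snapshot.txt
ls -l result/pyflow.data/state >> state_snapshot.txt 2>&1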

You can share your shell script and submit command if possible.