data from taginfo-fl-r02/NT_* files are collapsed into one line
I'm trying to get tailseeker installed for one of our users to run on our (CentOS 6) cluster, but he's having trouble getting a run to complete and I'm not sure what is going wrong. This is an excerpt from the error output he's getting:
Finished job 73.
132 of 146 steps (90%) done
rule merge_and_deduplicate_taginfo:
input: scratch/taginfo-fl-r02/NT_a1104.txt.gz, scratch/taginfo-fl-r02/NT_a2104.txt.gz, scratch/taginfo-fl-r02/NT_a2113.txt.gz, scratch/taginfo-fl-r02/NT_a1107.txt.gz, scratch/taginfo-fl-r02/NT_a2105.txt.gz, scratch/taginfo-fl-r02/NT_a2110.txt.gz, scratch/taginfo-fl-r02/NT_a1106.txt.gz, scratch/taginfo-fl-r02/NT_a2111.txt.gz, scratch/taginfo-fl-r02/NT_a2108.txt.gz, scratch/taginfo-fl-r02/NT_a2103.txt.gz, scratch/taginfo-fl-r02/NT_a1101.txt.gz, scratch/taginfo-fl-r02/NT_a1108.txt.gz, scratch/taginfo-fl-r02/NT_a2116.txt.gz, scratch/taginfo-fl-r02/NT_a1116.txt.gz, scratch/taginfo-fl-r02/NT_a2114.txt.gz, scratch/taginfo-fl-r02/NT_a1105.txt.gz, scratch/taginfo-fl-r02/NT_a1111.txt.gz, scratch/taginfo-fl-r02/NT_a2101.txt.gz, scratch/taginfo-fl-r02/NT_a2102.txt.gz, scratch/taginfo-fl-r02/NT_a2109.txt.gz, scratch/taginfo-fl-r02/NT_a1115.txt.gz, scratch/taginfo-fl-r02/NT_a1112.txt.gz, scratch/taginfo-fl-r02/NT_a2115.txt.gz, scratch/taginfo-fl-r02/NT_a2107.txt.gz, scratch/taginfo-fl-r02/NT_a1113.txt.gz, scratch/taginfo-fl-r02/NT_a1103.txt.gz, scratch/taginfo-fl-r02/NT_a1102.txt.gz, scratch/taginfo-fl-r02/NT_a1114.txt.gz, scratch/taginfo-fl-r02/NT_a1109.txt.gz, scratch/taginfo-fl-r02/NT_a2112.txt.gz, scratch/taginfo-fl-r02/NT_a2106.txt.gz, scratch/taginfo-fl-r02/NT_a1110.txt.gz
output: taginfo/NT.txt.gz, scratch/stats/perfdup-traces-NT.txt.gz
jobid: 3
wildcards: sample=NT
threads: 12
Could line parse line 0: a1101 6177 514 -1 Error in rule merge_and_deduplicate_taginfo:
jobid: 0
output: scratch/stats/perfdup-traces-NT.txt.gz, taginfo/NT.txt.gz
RuleException:
CalledProcessError in line 278 of /home/user/opt/tailseeker/3.1.7/tailseeker/main.py:
Command ' set -e; set -o pipefail; export PYTHONPATH="/home/user/opt/tailseeker/3.1.7" LC_ALL=C BGZIP_CMD="/usr/local/apps/samtools/1.6/bin/bgzip" TABIX_CMD="/usr/local/apps/samtools/1.6/bin/tabix" TAILSEQ_SCRATCH_DIR="/spin1/home/linux/user/scratch" PATH="/home/user/opt/tailseeker/3.1.7/bin:/home/user/opt/AYB2/2.11/bin:/home/user/opt/tailseeker/3.1.7/bin:/usr/local/apps/gmap-gsnap/2017-09-05/bin:/usr/local/apps/seqtk/1.2-r94:/usr/local/apps/bedtools/2.27.1/bin:/usr/local/apps/samtools/1.6/bin:/usr/local/apps/STAR/2.5.3a/bin:/usr/local/apps/snakemake/4.5.1/bin:/usr/local/apps/parallel/20170422/bin:/usr/local/Anaconda/envs/py3.5/bin:/usr/local/apps/coreutils/8.27/bin:/usr/local/slurm/bin:/usr/local/slurm/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/local/mysql/bin:/usr/X11R6/bin:/usr/local/jdk/bin:/usr/local/bin:/bin:/usr/bin:/home/user/bin:/opt/ibutils/bin" LD_LIBRARY_PATH="/home/user/opt/tailseeker/3.1.7/lib:/usr/local/LAPACK/3.7.1-gcc-4.4.7/lib64:/usr/local/zlib/1.2.8/lib:/usr/local/apps/gmap-gsnap/2017-09-05/lib:/usr/local/apps/samtools/1.5/lib" ; zcat scratch/taginfo-fl-r02/NT_a1101.txt.gz scratch/taginfo-fl-r02/NT_a1102.txt.gz scratch/taginfo-fl-r02/NT_a1103.txt.gz scratch/taginfo-fl-r02/NT_a1104.txt.gz scratch/taginfo-fl-r02/NT_a1105.txt.gz scratch/taginfo-fl-r02/NT_a1106.txt.gz scratch/taginfo-fl-r02/NT_a1107.txt.gz scratch/taginfo-fl-r02/NT_a1108.txt.gz scratch/taginfo-fl-r02/NT_a1109.txt.gz scratch/taginfo-fl-r02/NT_a1110.txt.gz scratch/taginfo-fl-r02/NT_a1111.txt.gz scratch/taginfo-fl-r02/NT_a1112.txt.gz scratch/taginfo-fl-r02/NT_a1113.txt.gz scratch/taginfo-fl-r02/NT_a1114.txt.gz scratch/taginfo-fl-r02/NT_a1115.txt.gz scratch/taginfo-fl-r02/NT_a1116.txt.gz scratch/taginfo-fl-r02/NT_a2101.txt.gz scratch/taginfo-fl-r02/NT_a2102.txt.gz scratch/taginfo-fl-r02/NT_a2103.txt.gz scratch/taginfo-fl-r02/NT_a2104.txt.gz scratch/taginfo-fl-r02/NT_a2105.txt.gz scratch/taginfo-fl-r02/NT_a2106.txt.gz scratch/taginfo-fl-r02/NT_a2107.txt.gz 
scratch/taginfo-fl-r02/NT_a2108.txt.gz scratch/taginfo-fl-r02/NT_a2109.txt.gz scratch/taginfo-fl-r02/NT_a2110.txt.gz scratch/taginfo-fl-r02/NT_a2111.txt.gz scratch/taginfo-fl-r02/NT_a2112.txt.gz scratch/taginfo-fl-r02/NT_a2113.txt.gz scratch/taginfo-fl-r02/NT_a2114.txt.gz scratch/taginfo-fl-r02/NT_a2115.txt.gz scratch/taginfo-fl-r02/NT_a2116.txt.gz | env BGZIP_OPT="-@ 12" sort -t " " -k6,6 -k1,1 -k2,2n --compress-program=/home/user/opt/tailseeker/3.1.7/bin/bgzip-wrap --parallel=12 | /home/user/opt/tailseeker/3.1.7/bin/tailseq-dedup-perfect scratch/stats/perfdup-traces-NT.txt.gz | env BGZIP_OPT="-@ 12" sort -t " " -k1,1 -k2,2n --compress-program=/home/user/opt/tailseeker/3.1.7/bin/bgzip-wrap --parallel=12 | /usr/local/apps/samtools/1.6/bin/bgzip -@ 12 -c > taginfo/NT.txt.gz ' returned non-zero exit status 141
File "/home/user/opt/tailseeker/3.1.7/tailseeker/main.py", line 278, in __rule_merge_and_deduplicate_taginfo
File "/usr/local/Anaconda/envs/py3.5/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
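A side note on the exit status 141 in the log above: shells conventionally report a process killed by signal N as 128 + N, and 141 - 128 = 13, which is SIGPIPE on Linux. With `set -o pipefail` this would be consistent with an upstream stage (for example `zcat` or `sort`) being killed when a later stage stopped reading, though that is an inference from the status code rather than something the log states. A minimal sketch of the decoding, where `decode_exit_status` is a hypothetical helper and not part of tailseeker or snakemake:

```python
import signal


def decode_exit_status(status):
    """Return the signal name if status follows the shell's 128+N convention, else None.

    Hypothetical helper for illustration only; not part of tailseeker or snakemake.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # 128+N does not correspond to a signal known on this platform.
            return None
    return None


# Signal numbering is platform-dependent; on Linux, 13 is SIGPIPE.
print(decode_exit_status(141))
```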
I looked into where the message Could line parse line 0: a1101 6177 514 -1 is coming from, and it turns out that each of the scratch/taginfo-fl-r02/NT_* files has its contents collapsed into a single line. There are no trailing newlines either, so their concatenation is also a single line. The error therefore seems to originate in an earlier step. Do you have any idea what could be going on?
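For anyone wanting to check their own intermediate files for the same symptom, here is a rough sketch that counts newline characters in each gzipped taginfo file and reports whether the file ends with one. The glob pattern is taken from the paths in the log above; `count_lines` is a hypothetical helper written for this check, not something tailseeker provides.

```python
import glob
import gzip


def count_lines(path):
    """Return (number of newline bytes, whether the last byte is a newline).

    Hypothetical diagnostic helper; reads the gzip stream in 1 MiB chunks.
    """
    newlines = 0
    last_byte = b""
    with gzip.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    return newlines, last_byte == b"\n"


if __name__ == "__main__":
    # Pattern assumed from the log; adjust to your own scratch directory.
    for path in sorted(glob.glob("scratch/taginfo-fl-r02/NT_*.txt.gz")):
        newlines, ends_with_newline = count_lines(path)
        print(path, newlines, ends_with_newline)
```

A file that reports 0 newlines and `False` while clearly containing many records would match what I describe above.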
These are the software versions:
tailseeker/3.1.7
python/3.5
parallel/20170422
snakemake/4.1.0
STAR/2.5.3a
samtools/1.5
bedtools/2.25.0
seqtk/1.2
htslib/1.5
gmap-gsnap/2017-09-05
zlib/1.2.8
and AYB is from your fork. I set all this up a few months ago, around November, but the user just recently got the right data to use here.
Ok, it looks like I get past this error with the latest master. Thanks!