IndexError occurred when processing directory /app/output: index out of range
I am running the code in Docker, using the python:3.10 image:
```python
import os
import gzip
import shutil
from SigProfilerExtractor import sigpro as sig
from debug import list_files  # import the helper function from debug.py
import cyvcf2


def check_vcf_file(file_path):
    print(f"Checking file: {file_path}")  # print the name of the file being checked
    try:
        vcf_file = cyvcf2.VCF(file_path)
        # the file parsed as a VCF, so print a confirmation
        print(f"File {file_path} is a valid VCF file.")
        return True
    except ValueError as e:
        # an error occurred while checking the file, so print it
        print(f"Error occurred when checking file {file_path}: {e}")
        return False


def decompress_vcf_gz_files(input_dir, output_dir):
    os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists
    for root, dirs, files in os.walk(input_dir):
        vcf_gz_files = [file for file in files if file.endswith(".vcf.gz")]
        for file in vcf_gz_files:
            full_file_path = os.path.join(root, file)
            print(f"Decompressing file: {full_file_path}")
            with gzip.open(full_file_path, 'rb') as f_in:
                # file[:-3] strips the ".gz" suffix
                with open(os.path.join(output_dir, file[:-3]), 'wb') as f_out:
                    shutil.copyfileobj(f_in, f_out)
            os.remove(full_file_path)
            print(f"File decompressed and original deleted: {full_file_path}")
    # After decompression, call list_files to inspect the /app/output folder
    print("Files and directories in /app/output:")
    list_files("/app/output")


def extract_signals(vcf_dir, output_dir):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    for root, dirs, files in os.walk(vcf_dir):
        vcf_files = [file for file in files if file.endswith(".vcf")]
        for file in vcf_files:
            full_file_path = os.path.join(root, file)
            if check_vcf_file(full_file_path):
                print(f"Extracting mutations from directory: {root}")
                try:
                    # Note: each call processes the whole directory `root`,
                    # so this runs once per valid VCF file in that directory
                    print(f"Calling sig.sigProfilerExtractor with arguments: 'vcf', '{output_dir}', '{root}'")
                    sig.sigProfilerExtractor("vcf", output_dir, root)
                    print(f"Finished extracting mutations from directory: {root}")
                except IndexError as e:
                    print(f"IndexError occurred when processing directory {root}: {e}")
                    print(f"Error occurred at file: {full_file_path}")
                except Exception as e:
                    print(f"An unexpected error occurred when processing directory {root}: {e}")
                    print(f"Error occurred at file: {full_file_path}")
            else:
                print(f"File {full_file_path} is not a valid .vcf file")
    print("Extraction finished for all files")


# Example usage
decompress_vcf_gz_files("/app/data", "/app/output")
# Print the files and subdirectories in /app/data, with counts
print("Files and directories in /app/data:")
list_files("/app/data")
extract_signals("/app/output", "/app/extract_signals")
```
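In the loop above, `sig.sigProfilerExtractor` is called once per valid VCF file, but every call processes the entire directory `root`, so the same directory is analysed (and fails) once per file. A minimal sketch, assuming one call per directory is what is intended: first collect the directories that actually contain `.vcf` files, then call the extractor once for each. `find_vcf_dirs` is a hypothetical helper, not part of SigProfilerExtractor.

```python
import os


def find_vcf_dirs(vcf_dir):
    """Map each directory under vcf_dir to the sorted .vcf file names it contains.

    Directories without any .vcf file are left out, so every key is a
    directory that is worth passing to the extractor exactly once.
    """
    vcf_dirs = {}
    for root, _dirs, files in os.walk(vcf_dir):
        vcf_files = sorted(f for f in files if f.endswith(".vcf"))
        if vcf_files:
            vcf_dirs[root] = vcf_files
    return vcf_dirs


# Hypothetical usage with check_vcf_file and sig from the script above:
# for root, files in find_vcf_dirs("/app/output").items():
#     if all(check_vcf_file(os.path.join(root, f)) for f in files):
#         sig.sigProfilerExtractor("vcf", "/app/extract_signals", root)
```

Grouping first also makes it easy to print which directories (if any) will be handed to the extractor before the long-running analysis starts.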
It prints "Starting matrix generation for SNVs and DINUCs...IndexError occurred when processing directory /app/output: index out of range" for every VCF file.
How can I fix it?
We do not currently support SigProfilerExtractor as a Docker image. This looks like an issue with SigProfilerMatrixGenerator.
Could you please provide more context in the message? The line number where the error occurs would be a great help, as would the versions of all installed tools.
Thanks!
Hello Professor @mdbarnesUCSD, I am still waiting for your help; I cannot solve this problem on my own.
I created a Docker image based on python:3.11 and installed the required packages:

```dockerfile
# Use the official Python 3.11 image as the base image
FROM python:3.11
WORKDIR /app
# Switch the pip index to the Aliyun mirror
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install Python packages
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy the project into the Docker image
COPY . .
```
The package requirements in requirements.txt are as follows:

```text
pip>=24.0
SigProfilerExtractor>=1.1.0
SigProfilerAssignment>=0.1.4
SigProfilerMatrixGenerator>=1.2.0
sigProfilerPlotting>=1.3.0
matplotlib>=3.4.0
statsmodels>=0.12.2
Bio>=0.1.0
gprofiler-official>=1.0.0
cyvcf2>=0.30.20
SigProfilerSimulator>=1.1.0
```
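requirements.txt only states minimum versions, whereas the question asked for the versions actually installed. A minimal sketch that reports them from inside the container using the standard library's `importlib.metadata`; the package list is copied from requirements.txt above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from requirements.txt above
PACKAGES = [
    "SigProfilerExtractor",
    "SigProfilerAssignment",
    "SigProfilerMatrixGenerator",
    "sigProfilerPlotting",
    "SigProfilerSimulator",
    "cyvcf2",
    "matplotlib",
    "statsmodels",
]


def installed_versions(packages):
    """Return {package name: installed version string, or 'not installed'}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver}")
```

Running this with the same interpreter that runs the analysis script makes sure the reported versions are the ones that script actually imports.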
Then I ran the Python script. When it reached the call

```python
sig.sigProfilerExtractor("vcf", "/app/example_output", vcf_files, minimum_signatures=1, maximum_signatures=3)
```

the program crashed, and the log file shows:
```text
Traceback (most recent call last):
  File "/opt/project/完整运行.py", line 125, in <module>
    main_function()
  File "/opt/project/完整运行.py", line 91, in main_function
    sig.sigProfilerExtractor("vcf", "example_output", vcf_files, minimum_signatures=None, maximum_signatures=None)
  File "/usr/local/lib/python3.11/site-packages/SigProfilerExtractor/sigpro.py", line 544, in sigProfilerExtractor
    data = datadump.SigProfilerMatrixGeneratorFunc(project_name, refgen, project, exome=exome, bed_file=None, chrom_based=False, plot=False, gs=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/SigProfilerMatrixGenerator/scripts/SigProfilerMatrixGeneratorFunc.py", line 1318, in SigProfilerMatrixGeneratorFunc
    file_name = vcf_files[0].split(".")
                ~~~~~~~~~^^^
IndexError: list index out of range
```
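The last frame of the traceback reads the first element of a list named `vcf_files`. A quick illustration that this exact message appears whenever that list is empty; `split_first_name` is an illustrative wrapper around the same expression, not a function from SigProfilerMatrixGenerator.

```python
def split_first_name(vcf_files):
    # Same expression as the failing line in SigProfilerMatrixGeneratorFunc.py
    return vcf_files[0].split(".")


try:
    split_first_name([])  # as if no VCF files had been collected
except IndexError as e:
    message = str(e)

print(message)  # list index out of range
```

With a non-empty list the expression succeeds, so the error points at an empty file list rather than at the contents of any VCF file.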
Looking forward to hearing from you.
Sincerely,
In addition, my input files are named like this: "5e8f048c-5f9a-48f3-9d3e-1bbb6094606f.wxs.pindel.raw_somatic_mutation.vcf"
This error comes from SigProfilerMatrixGenerator and shows up when the input directory contains no VCF files. I suspect this is specific to your Docker container not having access to the files in the input directory.
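Following that diagnosis, one way to fail early with a clearer message is to check the input directory yourself before calling the extractor. A minimal sketch, assuming the extractor reads `.vcf` files from the top level of the directory it is given; `check_vcf_input_dir` is a hypothetical helper name. It resolves relative paths against the current working directory, which matters here because the traceback shows the script running from `/opt/project` with a relative `"example_output"` path.

```python
import os


def check_vcf_input_dir(input_dir):
    """Raise FileNotFoundError unless input_dir holds at least one readable .vcf file.

    Returns the absolute directory path and the sorted .vcf file names.
    Assumption: only top-level files are checked, not subdirectories.
    """
    abs_dir = os.path.abspath(input_dir)  # relative paths resolve against os.getcwd()
    if not os.path.isdir(abs_dir):
        raise FileNotFoundError(
            f"{abs_dir} is not a directory visible inside the container "
            f"(current working directory: {os.getcwd()})"
        )
    vcf_files = sorted(
        name
        for name in os.listdir(abs_dir)
        if name.endswith(".vcf")
        and os.path.isfile(os.path.join(abs_dir, name))
        and os.access(os.path.join(abs_dir, name), os.R_OK)
    )
    if not vcf_files:
        raise FileNotFoundError(f"No readable .vcf files found in {abs_dir}")
    return abs_dir, vcf_files
```

For example, call it on the same path you pass as the third argument of `sig.sigProfilerExtractor`; if it raises, the container is not seeing the data (for instance, it was never copied into the image or mounted into the container).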
I am going to close this issue because SigProfilerExtractor does not yet support Docker.