AlexandrovLab/SigProfilerExtractor

IndexError occurred when processing directory /app/output: index out of range

Closed this issue · 4 comments

I ran the following code in a Docker container based on python:3.10:

import os
import gzip
import shutil
from SigProfilerExtractor import sigpro as sig
from debug import list_files  # import the helper function from debug.py

import cyvcf2

def check_vcf_file(file_path):
    print(f"Checking file: {file_path}")  # print the name of the file being checked
    try:
        vcf_file = cyvcf2.VCF(file_path)
        print(f"File {file_path} is a valid VCF file.")  # confirm that the file is a valid VCF file
        return True
    except ValueError as e:
        print(f"Error occurred when checking file {file_path}: {e}")  # report any error raised while checking the file
        return False




def decompress_vcf_gz_files(input_dir, output_dir):
    for root, dirs, files in os.walk(input_dir):
        vcf_gz_files = [file for file in files if file.endswith(".vcf.gz")]
        for file in vcf_gz_files:
            full_file_path = os.path.join(root, file)
            print(f"Starting to decompress file: {full_file_path}")
            with gzip.open(full_file_path, 'rb') as f_in:
                with open(os.path.join(output_dir, file[:-3]), 'wb') as f_out:
                    shutil.copyfileobj(f_in, f_out)
            os.remove(full_file_path)
            print(f"Finished decompressing and deleted the original file: {full_file_path}")

    # After decompression finishes, call list_files to inspect the /app/output folder
    print("Files and directories in /app/output:")
    list_files("/app/output")


def extract_signals(vcf_dir, output_dir):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    for root, dirs, files in os.walk(vcf_dir):
        vcf_files = [file for file in files if file.endswith(".vcf")]
        for file in vcf_files:
            full_file_path = os.path.join(root, file)
            if check_vcf_file(full_file_path):
                print(f"Starting extraction of mutation data from directory: {root}")
                try:
                    print(f"Calling sig.sigProfilerExtractor with arguments: 'vcf', '{output_dir}', '{root}'")
                    sig.sigProfilerExtractor("vcf", output_dir, root)
                    print(f"Finished extracting mutation data from directory: {root}")
                except IndexError as e:
                    print(f"IndexError occurred when processing directory {root}: {e}")
                    print(f"Error occurred at file: {full_file_path}")
                except Exception as e:
                    print(f"An unexpected error occurred when processing directory {root}: {e}")
                    print(f"Error occurred at file: {full_file_path}")
            else:
                print(f"File {full_file_path} is not a valid .vcf file")
    print("Finished extracting all files")


# Usage example
decompress_vcf_gz_files("/app/data", "/app/output")

# List the files and subdirectories in /app/data and count them
print("Files and directories in /app/data:")
list_files("/app/data")

extract_signals("/app/output", "/app/extract_signals")

It prints "Starting matrix generation for SNVs and DINUCs...IndexError occurred when processing directory /app/output: index out of range" for every VCF file.
How can I fix it?

Hi @xiaoyaojianghuzai,

We do not support SigProfilerExtractor as a docker image currently. This looks like it is an issue with SigProfilerMatrixGenerator.

Could you please provide more context in the message? The line number would be a great help as well as all the tool versions that are installed.

Thanks!
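
One way to collect the requested version information is sketched below. It uses only the standard library `importlib.metadata` module; the package list and the `installed_versions` helper are illustrative assumptions, not part of any SigProfiler tool, and the list should be adjusted to whatever is actually installed.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Illustrative list of distribution names (an assumption); edit as needed.
PACKAGES = [
    "SigProfilerExtractor",
    "SigProfilerMatrixGenerator",
    "SigProfilerAssignment",
    "sigProfilerPlotting",
    "cyvcf2",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print the interpreter version first, then one line per package.
    print(f"Python {sys.version.split()[0]}")
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the same container that runs the analysis reports the versions that are actually in use there, which may differ from the minimum versions in a requirements file.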

Hello Professor @mdbarnesUCSD, I am waiting for your help. I cannot solve this problem myself.
I created a Docker image based on python:3.11 and installed the required packages:

# Use the official Python 3.11 image as the base image
FROM python:3.11
WORKDIR /app

# Change pip source to Aliyun
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

# Install Python packages
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the project into the Docker image
COPY . .

The package version requirements are as follows:

pip>=24.0
SigProfilerExtractor>=1.1.0
SigProfilerAssignment>=0.1.4
SigProfilerMatrixGenerator>=1.2.0
sigProfilerPlotting>=1.3.0
matplotlib>=3.4.0
statsmodels>=0.12.2
Bio>=0.1.0
gprofiler-official>=1.0.0
cyvcf2>=0.30.20
SigProfilerSimulator>=1.1.0

Then I run the Python code.
When it reaches this call:

sig.sigProfilerExtractor("vcf", "/app/example_output", vcf_files, minimum_signatures=1, maximum_signatures=3)

the program stops unexpectedly.
The log file prints:

Traceback (most recent call last):
  File "/opt/project/完整运行.py", line 125, in <module>
    main_function()
  File "/opt/project/完整运行.py", line 91, in main_function
    sig.sigProfilerExtractor("vcf", "example_output", vcf_files, minimum_signatures=None, maximum_signatures=None)
  File "/usr/local/lib/python3.11/site-packages/SigProfilerExtractor/sigpro.py", line 544, in sigProfilerExtractor
    data = datadump.SigProfilerMatrixGeneratorFunc(project_name, refgen, project, exome=exome,  bed_file=None, chrom_based=False, plot=False, gs=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/SigProfilerMatrixGenerator/scripts/SigProfilerMatrixGeneratorFunc.py", line 1318, in SigProfilerMatrixGeneratorFunc
    file_name = vcf_files[0].split(".")
                ~~~~~~~~~^^^
IndexError: list index out of range

I look forward to hearing from you.
Sincerely!

Also, my input files have names like "5e8f048c-5f9a-48f3-9d3e-1bbb6094606f.wxs.pindel.raw_somatic_mutation.vcf".

Hi @xiaoyaojianghuzai,

This error comes from SigProfilerMatrixGenerator and shows up when the input directory contains no VCF files. I suspect the issue is specific to your Docker container not having access to the files in the input directory.
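
A quick way to test this diagnosis is to check, from inside the container, that the input directory exists and contains `.vcf` files before calling the extractor. The sketch below is only an illustration: `find_vcf_files` and `run_extraction` are hypothetical helper names, the check looks only at files directly inside the directory (an assumption about where the files need to be), and the `sig.sigProfilerExtractor("vcf", output_dir, input_dir)` call is copied from the script posted above.

```python
import os


def find_vcf_files(input_dir):
    """Return the sorted names of non-empty .vcf files directly inside input_dir."""
    if not os.path.isdir(input_dir):
        return []
    names = []
    for name in os.listdir(input_dir):
        path = os.path.join(input_dir, name)
        # Skip subdirectories, other file types and zero-byte files.
        if name.endswith(".vcf") and os.path.isfile(path) and os.path.getsize(path) > 0:
            names.append(name)
    return sorted(names)


def run_extraction(input_dir, output_dir):
    """Call the extractor once for the whole directory, but only if VCF files are visible."""
    vcf_files = find_vcf_files(input_dir)
    if not vcf_files:
        raise FileNotFoundError(
            f"No .vcf files found in {input_dir!r}; "
            "check that the directory exists and is mounted into the container"
        )
    print(f"Found {len(vcf_files)} VCF file(s) in {input_dir}")
    # Imported here so the directory check works even without the package installed.
    from SigProfilerExtractor import sigpro as sig
    sig.sigProfilerExtractor("vcf", output_dir, input_dir)
```

With the paths from the original post, this would be called as `run_extraction("/app/output", "/app/extract_signals")`. If it raises `FileNotFoundError`, the container cannot see any VCF files at that path, which would match the explanation above.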

I am going to close this issue because SigProfilerExtractor does not yet support Docker.