yfukasawa/LongQC

pandas.errors.EmptyDataError: No columns to parse from file

Opened this issue · 16 comments

Hi,

Thanks for the great work.

I experience a similar issue as described here #28 and here #34.

longQC:2021-10-27 08:06:14,443:598:INFO:Generating coverage related plots...
Traceback (most recent call last):
  File "/storage/home/hcoda1/3/apfennig3/LongQC/longQC.py", line 956, in <module>
    main(args)
  File "/storage/home/hcoda1/3/apfennig3/LongQC/longQC.py", line 62, in main
    args.handler(args)
  File "/storage/home/hcoda1/3/apfennig3/LongQC/longQC.py", line 602, in command_sample
    lc = LqCoverage(cov_path, isTranscript=args.transcript, control_filtering=pb_control)
  File "/storage/home/hcoda1/3/apfennig3/LongQC/lq_coverage.py", line 88, in __init__
    self.df = pd.read_table(table_path, sep='\t', header=None, dtype={3: str, 4: str})
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 683, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

However, I don't think it's a memory issue. I already reduced the index size to 100M. The peak RSS is 6.7G and 22.7G during the spiked-in control, which seems to run through normal. I requested 64G of Ram, which is why I don't think memory is the issue here. This is the command I used to execute the pipeline:

python ${home_dir}LongQC/longQC.py sampleqc -o ${home_dir}scratch/QC/ -i 100M -x pb-sequel --sample_name gbl -m 1 -p 64 ${home_dir}scratch/gbl.subreads.bam

The coverage_out.txt file is empty, causing the error. I attached the coverage_err.txt file, the log file, and the files corresponding to the spiked-in control:

coverage_err_gbl.txt
qc.log
spiked_in_control_gbl.txt
spiked_in_control_gbl_stderr.txt

Any thoughts on this?

Thanks,
Aaron

Hi @AaronRuben,

Thank you for your interest in our tool and also for sharing the log files. They are indeed helpful for understanding the issue.
Regarding the issue, I agree with you: it doesn't look like a RAM issue.

I think the program crashed for some unexpected reason while it was writing coverage_out.txt.
Can I ask which version/commit of LongQC was used?
Also, what happens if you reduce -p, say to 24?

Thank you,
Yoshinori

Hi @yfukasawa,

Thanks for the quick response. I use version 1.2.0c. I will try submitting the job with fewer CPUs.

Thanks,
Aaron

Update: I get the exact same error when running it with 16 CPUs. The coverage_out.txt is empty...
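For anyone debugging this, here is a minimal sketch of a check that could run before the table is handed to pd.read_table, so that an empty output file is reported as such instead of surfacing as EmptyDataError. The helper name table_status is hypothetical and not part of LongQC; it only inspects the file on disk.

```python
import os


def table_status(table_path):
    """Return 'missing', 'empty', or 'ok' for a table file on disk.

    Hypothetical helper, not part of LongQC.
    """
    if not os.path.isfile(table_path):
        return "missing"
    # A zero-byte file gives the parser no columns to work with.
    if os.path.getsize(table_path) == 0:
        return "empty"
    return "ok"
```

If this returns "empty" for coverage_out.txt, the program that was supposed to write the file most likely failed, and its stderr file (coverage_err.txt in this run) is the next place to look.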

Same issue here, using a 64-core server with 700 GB of RAM.
Edit: same error with both version 1.2.0c and the git clone version.

Hi @yfukasawa --

It seems I probably ran into the same issue.

longQC.py sampleqc \
  --output qc_out_f \
  --preset ont-rapid \
  --sample_name 0038 \
  --ncpu 40 \
  --mem 2 \
  FC-ID_FLO-MIN106-SQK-ULK001_0038.pass.fq

results in:

longQC:2022-03-10 16:19:45,212:340:INFO:Adapter search has done for a chunk 1.
longQC:2022-03-10 16:19:45,494:344:INFO:subsample finished for chunk 1.
longQC:2022-03-10 16:19:45,495:364:INFO:Input file parsing was finished. #seqs:44557, #bases: 1944515509
lq_mask:2022-03-10 16:19:45,495:114:INFO:Waiting completion of all of jobs...
lq_mask:2022-03-10 16:19:45,634:117:INFO:sdust jobs finished.
lq_mask:2022-03-10 16:19:45,643:87:INFO:sdust output file qc_out_f/longqc_sdust__0038.txt was made.
lq_mask:2022-03-10 16:19:45,976:93:INFO:tmp file qc_out_f/analysis/tmp_0.fastq was removed.
lq_mask:2022-03-10 16:19:46,167:93:INFO:tmp file qc_out_f/analysis/tmp_1.fastq was removed.
lq_mask:2022-03-10 16:19:46,176:93:INFO:tmp file qc_out_f/analysis/tmp_0__0038.txt was removed.
lq_mask:2022-03-10 16:19:46,184:93:INFO:tmp file qc_out_f/analysis/tmp_1__0038.txt was removed.
longQC:2022-03-10 16:19:46,184:368:INFO:Summary table qc_out_f/longqc_sdust__0038.txt was made.
Traceback (most recent call last):
  File "/path/to/bin/longQC.py", line 958, in <module>
    main(args)
  File "/path/to/bin/longQC.py", line 64, in main
    args.handler(args)
  File "/path/to/bin/longQC.py", line 371, in command_sample
    df_mask      = pd.read_table(lm.get_outfile_path(), sep='\t', header=None)
  File "/path/to/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/path/to/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 779, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/path/to/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/path/to/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/path/to/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1231, in _make_engine
    return mapping[engine](f, **self.options)
  File "/path/to/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 75, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 551, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

System: 1TB RAM, 80 cores, plenty of fastq disk space, LongQC 1.2.0c ..

Any ideas?

I tried running both a conda installation and docker installation of LongQC. I'm not quite sure if this applies to everyone, but only the conda installation terminates with this error (no content in output directory .txt files to speak of - either the files aren't generated or they're removed).

Docker installation of LongQC completes analysis of the same dataset using the same arguments without any issues whatsoever.

Hope this is helpful to someone!

Hi,

I am also getting this bug with both the docker container and the conda install, but only when using the -ont-ligation flag. With the -pb-hifi flag it works fine.

I'm using the latest docker version and get this error with -ont-ligation as well. Interestingly, it happened with my large dataset, but my small test dataset (about 500 reads) runs fine.

Hello @yfukasawa,
I have the same error message as @sklages. I have the conda version, 32GB of RAM, 12 cpus, so I tried with a subsample of 4000 sequences of cDNA transcripts (representing <6MB). I also tried the option -i 0.1 to reduce the index size.

Nothing works, I get the same error message just after the creation of the summary table 'longqc_sdust.txt'.

Did you find out what leads to this error?

Thanks,

Aline
longqc_error_message.txt

Hello @yfukasawa

I have the same issue as @sklages and @Aline-Git. Is there any update on the issue? Any idea how to solve this?

Best regards,
Nikolaj

Hi,

I am also having the same issue using the conda-installed version.

Cheers,
Paul

Hi all, sorry for not coming back to this point (for such a long time).

Does it work fine under the Docker environment?
I personally use the conda version more often, but haven't hit this issue in years. Hmm...

BTW, I happened to notice that someone made a conda package of LongQC. Does "installing with conda" perhaps mean using this package?

Y.

I can also confirm that the conda installation does not appear to work. I fixed this same issue by running in docker, using a slightly modified version of the Dockerfile on github (mamba rather than conda).

I believe it was related to minimap2-coverage not being installed properly!

I got the same error.
Traceback (most recent call last):
  File "/home/zhu/miniconda3/envs/py37/bin/longQC.py", line 957, in <module>
    main(args)
  File "/home/zhu/miniconda3/envs/py37/bin/longQC.py", line 63, in main
    args.handler(args)
  File "/home/zhu/miniconda3/envs/py37/bin/longQC.py", line 370, in command_sample
    df_mask = pd.read_table(lm.get_outfile_path(), sep='\t', header=None)
  File "/home/zhu/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/zhu/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 683, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/home/zhu/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/zhu/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/zhu/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/zhu/miniconda3/envs/py37/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

I installed LongQC via conda. I think this is caused by line 283 of longQC.py:

lm = LqMask(os.path.join(path_minimap2, "sdust"), args.out, suffix=suffix, max_n_proc=10 if ncpu > 10 else ncpu)

where path_minimap2 is defined in line 101:

path_minimap2 = os.path.join(os.path.dirname(os.path.abspath(__file__)), "minimap2-coverage")

In the conda install this is the path of the executable itself rather than a directory, so the sdust path built in line 283 is wrong.
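To illustrate the point, here is a small sketch that rebuilds the same two os.path.join expressions against two throwaway directory layouts. The layouts are assumptions made for illustration: in one, minimap2-coverage is a directory that contains sdust; in the other, minimap2-coverage is a plain file sitting next to the script. The function name sdust_path and the directory names repo and flat are made up for this example.

```python
import os
import tempfile


def sdust_path(script_dir):
    # Mirrors the two expressions quoted above: the script's directory joined
    # with "minimap2-coverage", then joined with "sdust".
    path_minimap2 = os.path.join(script_dir, "minimap2-coverage")
    return os.path.join(path_minimap2, "sdust")


with tempfile.TemporaryDirectory() as root:
    # Assumed layout 1: minimap2-coverage is a directory holding sdust.
    repo = os.path.join(root, "repo")
    os.makedirs(os.path.join(repo, "minimap2-coverage"))
    open(os.path.join(repo, "minimap2-coverage", "sdust"), "w").close()

    # Assumed layout 2: minimap2-coverage is a plain file next to the script.
    flat = os.path.join(root, "flat")
    os.makedirs(flat)
    open(os.path.join(flat, "minimap2-coverage"), "w").close()

    repo_ok = os.path.isfile(sdust_path(repo))
    flat_ok = os.path.isfile(sdust_path(flat))
    print(repo_ok, flat_ok)
```

In the second layout the joined path points underneath a regular file, so it cannot exist, which would be consistent with sdust never running and its output table staying empty.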

I installed LongQC from bioconda as well and got similar error as this.

As mentioned by @mbeavitt, I also found this is caused by minimap2-coverage being located in the incorrect directory.

longQC.py calls the minimap2-coverage and sdust files in the minimap2-coverage subfolder, so the file hierarchy should be as follows:

installed folder
│
└───minimap2-coverage (the folder)
│   │
│   │   minimap2-coverage (the file)
│   │   sdust
│   │   ...
│   longQC.py
│   ...

but the files in the bioconda package are structured like this:

installed folder
│   longQC.py
│   minimap2-coverage (the file)
│   ...

which is causing the error. Also this package is missing the sdust file.

So I manually installed minimap2-coverage and sdust files in the correct folder hierarchy and I find this fixes the problem.
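In case it helps others check their own installation, here is a minimal sketch that tests for the hierarchy described above. The function name missing_layout_items is hypothetical and not part of LongQC; install_dir is assumed to be the folder that contains longQC.py.

```python
import os


def missing_layout_items(install_dir):
    """List problems with the layout described in this thread.

    Hypothetical helper, not part of LongQC. An empty list means the
    minimap2-coverage folder and both files inside it were found.
    """
    subdir = os.path.join(install_dir, "minimap2-coverage")
    if not os.path.isdir(subdir):
        # Covers both a missing entry and a plain file with that name.
        return ["minimap2-coverage is not a directory"]

    problems = []
    for name in ("minimap2-coverage", "sdust"):
        path = os.path.join(subdir, name)
        if not os.path.isfile(path):
            problems.append(f"{name} not found in minimap2-coverage/")
        elif not os.access(path, os.X_OK):
            problems.append(f"{name} is not executable")
    return problems
```

Calling missing_layout_items on the directory that holds longQC.py and printing the result shows at a glance whether the folder or either of the two files is absent.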