pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file

Question


Closed this issue 2 years ago · 2 comments

I ran cDNA_MINES on the sample data you provided:
1) HEK293.fraction_modified_reads.plus_chrX.wig.bed (for the --fraction_modified parameter),
2) HEK293.coverage.plus_chrX.bedgraph (for the --coverage parameter),
3) GGACT_9_random_forest_model.pickle (for the --kmer_models parameter; I just picked one from Final_Models).

File 1), the wig.bed file, was generated with: wig2bed --multisplit bar < HEK293.fraction_modified_reads.plus_chrX.wig > HEK293.fraction_modified_reads.plus_chrX.wig.bed

$ python cDNA_MINES.py --fraction_modified HEK293.fraction_modified_reads.plus_chrX.wig.bed --coverage HEK293.coverage.plus_chrX.bedgraph --output m6A_try.bed --ref /project/umw_chan_zhou/Data/GRCh37_hg19/hg19.fa --kmer_models /home/ek81w/Euijin/MINES-master/Final_Models/GGACT_9_random_forest_model.pickle
Traceback (most recent call last):
  File "cDNA_MINES.py", line 143, in <module>
    model_list = pd.read_csv(args.kmer_models,  header=None, names=['file'])
  File "/nl/umw_chan_zhou/Euijin/shared_env/MINES_20210812/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/nl/umw_chan_zhou/Euijin/shared_env/MINES_20210812/lib/python3.7/site-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/nl/umw_chan_zhou/Euijin/shared_env/MINES_20210812/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/nl/umw_chan_zhou/Euijin/shared_env/MINES_20210812/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
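According to the traceback, line 143 of cDNA_MINES.py reads the --kmer_models argument with pd.read_csv(args.kmer_models, header=None, names=['file']), which treats it as a one-column text table. That suggests, though nobody in this thread confirms it, that the option expects a plain-text file listing model paths, one per line, and that the C tokenizer fails because it receives the binary .pickle directly. The sketch below works under that assumption. It checks whether a file looks like a pickle (pickles written with protocol 2 or later begin with the byte 0x80) and writes a list file that parses with the same read_csv call. The helper names and the list file name kmer_models.txt are hypothetical, not part of MINES.

```python
import os
import pickle
import tempfile

import pandas as pd


def looks_like_pickle(path):
    """Heuristic: pickles written with protocol >= 2 start with the PROTO opcode byte 0x80."""
    with open(path, "rb") as fh:
        return fh.read(1) == b"\x80"


def write_model_list(model_paths, list_path):
    """Hypothetical helper: write one model path per line to a plain-text file."""
    with open(list_path, "w") as fh:
        for p in model_paths:
            fh.write(p + "\n")
    return list_path


def read_model_list(list_path):
    # Same call the traceback shows at cDNA_MINES.py line 143.
    return pd.read_csv(list_path, header=None, names=["file"])


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    # Dummy stand-in for GGACT_9_random_forest_model.pickle (not a real model).
    model_path = os.path.join(tmp, "GGACT_9_random_forest_model.pickle")
    with open(model_path, "wb") as fh:
        pickle.dump({"dummy": 1}, fh, protocol=2)

    print(looks_like_pickle(model_path))  # True

    # Assumed format: a text file whose lines are paths to model pickles.
    list_path = write_model_list([model_path], os.path.join(tmp, "kmer_models.txt"))
    models = read_model_list(list_path)
    print(models["file"].tolist())
```

If the assumption holds, the command would pass the list file (for example --kmer_models kmer_models.txt) rather than the .pickle itself.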

I got this error. Could you help me fix this problem?
Thank you!

Answer 1 · 2023-03-21T15:35:28.000Z

Hi, I have the same problem. How did you solve it? Thank you very much.

Answer 2 · 2023-03-21T17:08:23.000Z

Hi @Akihiqq

I barely remember how I solved this issue. I think I removed the conda environment for MINES and the MINES scripts, then reinstalled both. You may want to try this first.
Hopefully, it works for you!