Report not generated: error tokenizing data

Question

Closed this issue 6 months ago · 4 comments

Hello, I am running NanoPlot on ONT data aligned with minimap2. I get HTML files for the plots, but the report fails to be generated. The command and the log are below.

NanoPlot \
    -t 112 \
    -o genomic_nanoplot \
    -p ${SAMPLENAME}_nanoplot \
    --bam $INPUT

2024-07-02 12:18:15,967 NanoPlot 1.43.0 started with arguments Namespace(threads=112, verbose=False, store=False, raw=False, huge=False, outdir='genomic_nanoplot', no_static=False, prefix='sample', tsv_stats=False, only_report=False, info_in_report=False, maxlength=None, minlength=None, drop_outliers=False, downsample=None, loglength=False, percentqual=False, alength=False, minqual=None, runtime_until=None, readtype='1D', barcoded=False, no_supplementary=False, color='#4CB391', colormap='Greens', format=['png'], plots=['kde', 'dot'], legacy=None, listcolors=False, listcolormaps=False, no_N50=False, N50=False, title=None, font_scale=1, dpi=100, hide_stats=False, fastq=None, fasta=None, fastq_rich=None, fastq_minimal=None, summary=None, bam=['/sample.bam'], ubam=None, cram=None, pickle=None, feather=None, path='sample')
2024-07-02 12:18:15,967 Python version is: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:23:07) [GCC 12.3.0]
2024-07-02 12:18:15,985 Nanoget: Starting to collect statistics from bam file sample.bam.
2024-07-02 12:18:16,092 Nanoget: Bam file sample.bam contains 20481891 mapped and 0 unmapped reads.
2024-07-02 12:18:16,092 Nanoget: lots of contigs (>200) or --huge, not running in separate processes
2024-07-02 12:31:37,224 Nanoget: bam sample.bam contains 20481891 primary alignments.
2024-07-02 12:31:44,292 Reduced DataFrame memory usage from 2657.968638420105Mb to 2657.968638420105Mb
2024-07-02 12:31:52,691 Nanoget: Gathered all metrics of 20481891 reads
2024-07-02 12:32:19,098 Calculated statistics
2024-07-02 12:32:19,101 Using sequenced read lengths for plotting.
2024-07-02 12:32:20,667 NanoPlot: Valid color #4CB391.
2024-07-02 12:32:20,667 NanoPlot: Valid colormap Greens.
2024-07-02 12:32:21,748 NanoPlot: Creating length plots for Read length.
2024-07-02 12:32:21,757 NanoPlot: Using 20481891 reads maximum of 8346bp.
2024-07-02 12:32:56,238 No static plots are saved due to some kaleido problem:
2024-07-02 12:32:56,238 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:33:25,709 No static plots are saved due to some kaleido problem:
2024-07-02 12:33:25,709 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:33:51,488 No static plots are saved due to some kaleido problem:
2024-07-02 12:33:51,488 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:34:19,240 No static plots are saved due to some kaleido problem:
2024-07-02 12:34:19,240 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:34:45,907 No static plots are saved due to some kaleido problem:
2024-07-02 12:34:45,907 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:34:45,908 Created length plots
2024-07-02 12:34:47,220 NanoPlot: Creating Read lengths vs Average read quality plots using 20481891 reads.
2024-07-02 12:35:13,101 No static plots are saved due to some kaleido problem:
2024-07-02 12:35:13,104 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:35:37,430 No static plots are saved due to some kaleido problem:
2024-07-02 12:35:37,430 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:35:37,434 Created LengthvsQual plot
2024-07-02 12:35:38,737 NanoPlot: Creating Aligned read lengths vs Sequenced read length plots using 20481891 reads.
2024-07-02 12:36:03,672 No static plots are saved due to some kaleido problem:
2024-07-02 12:36:03,673 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:36:27,415 No static plots are saved due to some kaleido problem:
2024-07-02 12:36:27,415 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:36:27,418 Created AlignedLength vs Length plot.
2024-07-02 12:36:27,418 NanoPlot: Creating Read mapping quality vs Average basecall quality plots using 20481891 reads.
2024-07-02 12:36:52,200 No static plots are saved due to some kaleido problem:
2024-07-02 12:36:52,200 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:37:15,871 No static plots are saved due to some kaleido problem:
2024-07-02 12:37:15,871 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:37:15,871 Created MapQvsBaseQ plot.
2024-07-02 12:37:17,180 NanoPlot: Creating Read length vs Read mapping quality plots using 20481891 reads.
2024-07-02 12:37:41,901 No static plots are saved due to some kaleido problem:
2024-07-02 12:37:41,901 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:38:04,791 No static plots are saved due to some kaleido problem:
2024-07-02 12:38:04,791 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:38:04,793 Created Mapping quality vs read length plot.
2024-07-02 12:38:05,083 NanoPlot: Creating Percent identity vs Average Base Quality plots using 20481891 reads.
2024-07-02 12:38:29,173 No static plots are saved due to some kaleido problem:
2024-07-02 12:38:29,174 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:38:51,008 No static plots are saved due to some kaleido problem:
2024-07-02 12:38:51,008 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:38:51,008 Created Percent ID vs Base quality plot.
2024-07-02 12:38:52,316 NanoPlot: Creating Aligned read length vs Percent identity plots using 20481891 reads.
2024-07-02 12:39:11,933 No static plots are saved due to some kaleido problem:
2024-07-02 12:39:11,934 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:39:28,099 No static plots are saved due to some kaleido problem:
2024-07-02 12:39:28,099 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:39:45,990 No static plots are saved due to some kaleido problem:
2024-07-02 12:39:45,991 Transform failed with error code 1: Failed to serialize document: Uncaught
2024-07-02 12:39:45,991 Created Percent ID vs Length plot
2024-07-02 12:39:45,991 Writing html report.
2024-07-02 12:39:45,993 Error tokenizing data. C error: Expected 2 fields in line 21, saw 3
Traceback (most recent call last):
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/nanoplot/NanoPlot.py", line 111, in main
    make_report(plots, settings)
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/nanoplot/NanoPlot.py", line 388, in make_report
    report.html_stats(settings),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/nanoplot/report.py", line 45, in html_stats
    stats_html.append(stats2html(statsfile[0]))
                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/nanoplot/report.py", line 50, in stats2html
    df = pd.read_csv(statsf, sep=':', header=None, names=['feature', 'value'])
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read( # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "utils/conda_envs/nanoplot/lib/python3.12/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 21, saw 3
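
For readers hitting the same traceback, the failure can probably be reproduced in isolation. The sketch below reuses the `pd.read_csv(..., sep=':', header=None, names=['feature', 'value'])` call shown in the traceback, but the stats text is hypothetical (not taken from a real NanoPlot stats file) and `try_parse` is a helper invented for illustration. It assumes the failing line simply contains a second colon inside its value, which is what "Expected 2 fields ... saw 3" suggests.

```python
import io

import pandas as pd

# Hypothetical stats text, for illustration only. The last line carries an
# extra colon inside its value, so splitting on ':' yields three fields.
stats_text = (
    "Mean read length:1,234.5\n"
    "Number of reads:20,481,891\n"
    "1:8346 (12.3; some:read_id)\n"
)


def try_parse(text):
    """Parse colon-separated stats the way the traceback does.

    Returns the DataFrame on success, or the ParserError on failure.
    """
    try:
        return pd.read_csv(
            io.StringIO(text), sep=":", header=None, names=["feature", "value"]
        )
    except pd.errors.ParserError as err:
        return err


result = try_parse(stats_text)
print(type(result).__name__, result)
```

If this assumption holds, any read identifier or value containing a colon in the stats file would be enough to break the colon-separated parse.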

Answer 1 · 2024-07-02T12:58:29.000Z

This seems to be the same issue as in wdecoster/nanocomp#76.
Was your data run through pychopper? Or are those Duplex reads?

Could you check if --tsv_stats solves this?

Answer 2 · 2024-07-02T13:21:48.000Z

These reads have been basecalled in duplex and have been processed with porechop (not pychopper in this case). I'll try with --tsv_stats and let you know. Thanks for the quick answer!

Answer 3 · 2024-07-02T13:39:59.000Z

Enabling --tsv_stats did work.
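
A plausible reason the flag helps is that a tab-separated stats file no longer relies on a colon as the field delimiter, so colons inside values become harmless. The sketch below illustrates that idea only: the header, metric names, and values are hypothetical and are not claimed to match NanoPlot's actual --tsv_stats output.

```python
import io

import pandas as pd

# Hypothetical tab-separated stats, for illustration only. The second data
# row has a colon in both the metric name and the value.
tsv_text = (
    "Metrics\tdataset\n"
    "number_of_reads\t20481891\n"
    "longest_read:1\t8346 (12.3; some:read_id)\n"
)

# With a tab delimiter, colons are ordinary characters and every row
# still splits into exactly two fields.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df)
```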

Answer 4 · 2024-07-02T16:28:55.000Z

Thanks for the feedback!