runtime stats

Question

Closed this issue 3 years ago · 9 comments

It would be great if the summary included runtime stats as well.

The problem is that the summary text file stores start_time as an offset, whereas the fastq file provides the real date/time value. Hence, the total runtime computed from the values in the summary file is not correct, because the offset starts at 0 again for each run_id (mux scan, restarts, etc.).

Unfortunately, NanoStat does not accept both the summary and the fastq file at the same time.

Answer 1 · 2018-08-31T06:58:16.000Z

I'm not sure which runtime stats you would like.

The summary files I have here, generated by basecalling with guppy, are correct.

Answer 2 · 2018-08-31T07:19:49.000Z

> I'm not sure which runtime stats you would like.

Start and stop time, total runtime (stop - start), and usage time (sum of runtime per run_id).

> The summary files I have here, generated by basecalling with guppy, are correct.

I have summary files produced by albacore v2.3.1. They contain real numbers in the start_time column. In the fastq header the start_time field contains timestamps.

Based on the real numbers at least the usage time (runtime of all run_ids) can be computed.
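A minimal sketch of that usage-time calculation, using only the Python standard library. It assumes a tab-separated summary with the columns run_id, start_time and duration, where start_time is an offset in seconds that restarts for each run_id; the function name usage_time_per_run and the example rows are hypothetical and not part of NanoStat.

```python
import csv
import io


def usage_time_per_run(summary_handle):
    """Return {run_id: seconds between first read start and last read end}.

    Assumes tab-separated input with run_id, start_time and duration
    columns, where start_time is an offset in seconds within each run_id.
    """
    first_start = {}
    last_end = {}
    for row in csv.DictReader(summary_handle, delimiter="\t"):
        run_id = row["run_id"]
        start = float(row["start_time"])
        end = start + float(row["duration"])
        # Track the earliest start and the latest end seen for this run_id.
        first_start[run_id] = min(start, first_start.get(run_id, start))
        last_end[run_id] = max(end, last_end.get(run_id, end))
    return {rid: last_end[rid] - first_start[rid] for rid in first_start}


# Made-up rows standing in for a real sequencing_summary.txt.
example = io.StringIO(
    "read_id\trun_id\tstart_time\tduration\n"
    "r1\trunA\t0.0\t10.0\n"
    "r2\trunA\t50.0\t5.0\n"
    "r3\trunB\t2.0\t8.0\n"
)
per_run = usage_time_per_run(example)
total_usage = sum(per_run.values())
print(per_run, total_usage)
```

Because each run_id is handled separately, the offsets restarting at 0 do not distort the sum. The wall-clock start and stop of the whole experiment still cannot be recovered from these offsets alone.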

Answer 3 · 2018-08-31T07:31:11.000Z

You are already handling the different data types of the start_time field in NanoPlot. But the problem exists there too, as the following two graphs show:

(Graph 1: produced with NanoPlot --summary sequencing_summary.txt ...)

(Graph 2: produced with NanoPlot --fastq_minimal ...)

I know these examples are not related to NanoStat specifically, but to your base functions in NanoPack in general.

Answer 4 · 2018-08-31T07:39:38.000Z

Hence I suggest adding at least --fastq_minimal to NanoStat as an additional parameter that is not mutually exclusive with --summary (for NanoPlot as well), and parsing the timestamps from the fastq file.
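To illustrate what parsing the timestamps could look like, here is a rough sketch. It assumes the fastq header carries a start_time=YYYY-MM-DDTHH:MM:SSZ field, as in the albacore output described above; the helper names parse_start_time and run_window and the example headers are hypothetical, and this is not how NanoStat or NanoPlot read the files.

```python
import re
from datetime import datetime, timezone

# Assumed header field: start_time=2018-08-30T10:00:00Z
START_TIME_RE = re.compile(r"start_time=(\S+)")


def parse_start_time(header):
    """Return the read's start time as an aware UTC datetime, or None."""
    match = START_TIME_RE.search(header)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def run_window(headers):
    """Return (first start, last start, difference) over all headers.

    The last value is the start of the final read, so the true stop time
    is later by that read's duration, which the header does not provide.
    """
    times = [t for t in map(parse_start_time, headers) if t is not None]
    if not times:
        return None
    start, stop = min(times), max(times)
    return start, stop, stop - start


# Made-up headers from two run_ids of an interrupted run.
headers = [
    "@read1 runid=runA read=1 ch=5 start_time=2018-08-30T10:00:00Z",
    "@read2 runid=runA read=2 ch=7 start_time=2018-08-30T10:30:00Z",
    "@read3 runid=runB read=1 ch=2 start_time=2018-08-30T14:15:00Z",
]
start, stop, total = run_window(headers)
print(start, stop, total)
```

Since these are absolute timestamps, the window spans the restart correctly, including the idle time between the two run_ids.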

Answer 5 · 2018-08-31T08:01:40.000Z

Did you run basecalling twice, once per folder? I create plots from hundreds of summary files (PromethION) and the time information is correct: the start_time in the next summary file starts where the previous one stopped.

Answer 6 · 2018-08-31T12:28:40.000Z

No, I ran albacore only once. Maybe the reason in this case is that the MinION run was interrupted. After restarting it, a new mux scan and sequencing run were created. I assume that albacore computes the real number in the start_time column of sequencing_summary.txt relative to the start time of each sequencing run. Do you take this into account? pycoQC does.

Answer 7 · 2018-08-31T14:19:59.000Z

Ah, yes, if the run was restarted I can imagine the sequencing_summary.txt is not correct. I'm not sure how pycoQC solves this.

Answer 8 · 2018-09-04T18:13:11.000Z

pycoQC solves this by grouping over the run_ids. Indeed, it is impossible to sort the runs chronologically; they can only be ordered by size or runtime.

Answer 9 · 2018-09-04T18:33:34.000Z

I have put 'adding run_time metrics to nanostat report' to my to-do list, but as I'm writing my thesis it gets a fairly low priority for now.

For the other problem, using the fastq_minimal or fastq_rich input is going to be the way forward. I don't intend to make changes to how summaries are parsed in the near future.