biigle/reports

Split full report if one file would have too many rows


mzur commented

Reject a full report request if it would contain more than 50000 annotations for a single volume. Even though the full report Python script is now more memory efficient, such a report can generate extremely large temporary files (>1.5 GB for a volume with 80000 annotations) and can exceed the number of rows that Excel or Calc can handle (if freehand polygons are used).
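A minimal sketch of the proposed hard limit, assuming the annotation count per volume is already known when the request is validated. The constant and function name are hypothetical and not part of biigle/reports.

```python
# Hypothetical limit taken from the proposal above; not an existing BIIGLE setting.
MAX_ANNOTATIONS_PER_VOLUME = 50_000


def check_full_report_request(annotations_per_volume):
    """Raise ValueError if any volume exceeds the annotation limit.

    annotations_per_volume: dict mapping a volume ID to its annotation count.
    """
    too_big = {
        volume_id: count
        for volume_id, count in annotations_per_volume.items()
        if count > MAX_ANNOTATIONS_PER_VOLUME
    }
    if too_big:
        raise ValueError(f"Full report rejected, too many annotations: {too_big}")
```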

mzur commented

Even better would be a dynamic limit: find out the maximum number of rows that Excel/Calc can handle, then split the report into multiple files if a single file would have too many rows.

According to Microsoft Support, a worksheet is limited to 1,048,576 rows by 16,384 columns.
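With that limit, the number of parts (files or worksheets) a report needs is a ceiling division. This sketch assumes one header row per part; the function name is hypothetical.

```python
EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet, according to Microsoft Support


def parts_needed(data_rows, header_rows=1):
    """Return how many worksheets/files are needed for data_rows rows."""
    capacity = EXCEL_MAX_ROWS - header_rows  # data rows that fit next to the header
    # Integer ceiling division; an empty report still needs one part.
    return max(1, (data_rows + capacity - 1) // capacity)
```

For example, 3,000,000 data rows need three parts, because two parts hold only 2,097,150 data rows.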

mzur commented

Your reference suggests that we could split the report into multiple worksheets instead of multiple files, which would be much easier. However, I think we already split into worksheets for some other cases (by label tree?). I can't recall exactly.
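Splitting into worksheets amounts to chunking the rows and repeating the header in every chunk. A sketch under the assumption that rows arrive as an iterable of lists; the sheet naming scheme is hypothetical. Each chunk could then be written with a library such as openpyxl (`Workbook.create_sheet()` and `Worksheet.append()`).

```python
from itertools import islice

EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet


def split_into_sheets(header, rows, base_title="Report", max_rows=EXCEL_MAX_ROWS):
    """Yield (sheet_title, sheet_rows) chunks that each fit into one worksheet.

    The header is repeated at the top of every chunk, so each chunk holds at
    most max_rows rows in total. Rows are consumed lazily.
    """
    it = iter(rows)
    capacity = max_rows - 1  # one row is used by the header
    part = 1
    while True:
        chunk = list(islice(it, capacity))
        if not chunk and part > 1:
            return  # previous chunk was exactly full and nothing is left
        yield f"{base_title} ({part})", [header] + chunk
        part += 1
        if len(chunk) < capacity:
            return  # the iterable is exhausted
```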

mzur commented

There is no easy fix for this. It could be solved in three different ways, but none of them is straightforward:

  1. Split the XLSX into different files: There is no concept of a single report that consists of multiple files, so this would require significant work.
  2. Use multiple worksheets: Worksheets are already used if the report should be split by label tree or user.
  3. Deny the request if the report would be too big: "Too big" depends on the number of annotations and on the number of annotation coordinates. A report could contain 1M point annotations or only a few thousand freehand polygon annotations, so a hard limit on the number of annotations does not really make sense. Validating a request by checking the number of annotation coordinates would be quite slow and/or complex, I think (count the commas in the points column?).
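The comma-counting idea from option 3 could look roughly like this. It assumes, hypothetically, that the points column holds strings like `[x1,y1,x2,y2,...]` and that the full report writes one row per x/y pair; the real row layout may differ. Both function names are made up for illustration.

```python
def estimate_rows(points_column):
    """Estimate report rows from the raw points strings of all annotations.

    Assumes (hypothetically) "[x1,y1,x2,y2,...]" strings and one row per x/y pair.
    """
    total = 0
    for points in points_column:
        numbers = points.count(",") + 1  # n values are separated by n-1 commas
        total += (numbers + 1) // 2      # round up, e.g. a circle [x,y,r] counts as 2
    return total


def report_too_big(points_column, max_rows=1_048_576 - 1):
    """True if the estimate exceeds one worksheet (minus a header row)."""
    return estimate_rows(points_column) > max_rows
```

This still reads the points of every annotation, which is exactly the cost concern raised above.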

mzur commented

Another idea: Change the report to put the array of coordinates into a single cell (like the CSV report). Offer a checkbox that makes the old behavior opt-in for backwards compatibility, and communicate this change to the users.
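A sketch of that idea, assuming points are a flat list `[x1, y1, x2, y2, ...]`. The row layout and the `one_row_per_pair` flag (standing in for the opt-in checkbox) are hypothetical. One caveat worth checking: Excel limits a cell to 32,767 characters, so a very long freehand polygon may not fit into one cell.

```python
import json

EXCEL_MAX_CELL_CHARS = 32_767  # maximum characters per cell in Excel


def annotation_rows(annotation_id, label, points, one_row_per_pair=False):
    """Return report rows for a single annotation.

    Default: all coordinates in one cell as a compact JSON array.
    one_row_per_pair=True: hypothetical opt-in for a row-per-coordinate layout
    (a trailing unpaired value, e.g. a circle radius, is dropped by zip()).
    """
    if one_row_per_pair:
        return [[annotation_id, label, x, y] for x, y in zip(points[0::2], points[1::2])]
    cell = json.dumps(points, separators=(",", ":"))
    if len(cell) > EXCEL_MAX_CELL_CHARS:
        raise ValueError(f"Points of annotation {annotation_id} do not fit into one cell")
    return [[annotation_id, label, cell]]
```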