Abrupt Termination (Without any error) on Google Colab, AWS EC2

Question


G999n opened this issue 3 months ago · 4 comments

The conversion process terminates abruptly at random points during the Detecting Boxes stage, both on Google Colab and on an AWS EC2 instance (Windows). The percentage reached before termination varies from run to run.

[Image: AWS EC2]

[Image: Google Colab]

Document size is 198 pages with a mixture of selectable text, scanned text, screenshots of certificates, tables, scanned images of printed tables, etc.

Answer 1 · 2024-08-26T13:40:55.000Z

What was the file size? How many pages? It could be that the instance runs out of memory.

Answer 2 · 2024-08-26T18:57:25.000Z

What was the file size? How many pages? It could be that the instance runs out of memory.

Document size is 198 pages with a mixture of selectable text, scanned text, screenshots of certificates, tables, scanned images of printed tables, etc.

The file size is 15.6 MB.
As per the instructions, I was using freeRAM // 3 as the batch_multiplier:
--batch_multiplier 3 on Colab (which had 11 GB of free RAM)
--batch_multiplier 2 (and then 1 as well) on AWS EC2 (which had 8 GB of RAM)
However, both of these were CPU-only instances. I wasn't using a GPU on Colab or EC2.
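
For reference, the freeRAM // 3 rule works out as in the sketch below. The function name and the floor of 1 are illustrative assumptions, not part of the tool, and the ~3 GB per batch figure is the one quoted in this thread. It only estimates per-batch memory, not the total memory the process needs.

```python
def suggested_batch_multiplier(free_memory_gb, gb_per_batch=3):
    """Apply the freeRAM // 3 rule of thumb (hypothetical helper)."""
    # Floor division mirrors the rule; max(1, ...) is an added assumption
    # so that small machines still get a usable value.
    return max(1, int(free_memory_gb // gb_per_batch))


print(suggested_batch_multiplier(11))  # Colab, 11 GB free -> 3
print(suggested_batch_multiplier(8))   # EC2, 8 GB -> 2
```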

The conversion worked fine on a vast.ai JupyterLab instance with an RTX 4090 (24 GB VRAM) and 32 GB RAM, where I used --batch_multiplier 7.

Apart from the memory required for the batches (which is ~3 GB per batch), I had assumed that the program itself needs only a small, constant amount of memory regardless of the PDF size. Is that not the case?

Answer 3 · 2024-08-27T06:03:59.000Z

VRAM usage is bounded and does not go up with the number of pages, but your RAM usage does.

A workaround would be slicing your pdf with PyMuPDF in smaller batches and merging the results.
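
A minimal sketch of the slicing step, assuming PyMuPDF is installed (imported as `fitz`). The helper names `chunk_ranges` and `split_pdf`, the 25-page chunk size, and the output file naming are illustrative choices, not something prescribed in this thread.

```python
from pathlib import Path


def chunk_ranges(page_count, pages_per_chunk):
    """Return inclusive, 0-based (first, last) page ranges covering the document."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk, page_count) - 1)
        for start in range(0, page_count, pages_per_chunk)
    ]


def split_pdf(src_path, out_dir, pages_per_chunk=25):
    """Write src_path as several smaller PDFs and return their paths in page order."""
    import fitz  # PyMuPDF; imported here so chunk_ranges works without it

    src_path = Path(src_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(str(src_path)) as src:
        for first, last in chunk_ranges(src.page_count, pages_per_chunk):
            # 1-based page numbers in the file name keep the chunks sortable.
            out_path = out_dir / f"{src_path.stem}_{first + 1:04d}-{last + 1:04d}.pdf"
            with fitz.open() as chunk:  # new, empty PDF
                chunk.insert_pdf(src, from_page=first, to_page=last)
                chunk.save(str(out_path))
            written.append(out_path)
    return written
```

For the 198-page document in this thread, `chunk_ranges(198, 25)` yields 8 ranges, the last one covering pages 176 to 198. Each chunk can then be converted on its own and the per-chunk outputs concatenated in the order returned by `split_pdf`. A smaller chunk size lowers peak RAM at the cost of more runs.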

Answer 4 · 2024-08-27T16:29:48.000Z

All right
Thanks a lot