VikParuchuri/marker

Abrupt Termination (Without any error) on Google Colab, AWS EC2

G999n opened this issue · 4 comments

The conversion process terminates abruptly at random points in the Detecting Boxes stage on Google Colab and on an AWS EC2 instance (Windows). The progress percentage at which it stops varies from run to run.

[Screenshot: AWS EC2]

[Screenshot: Google Colab]

Document size is 198 pages with a mixture of selectable text, scanned text, screenshots of certificates, tables, scanned images of printed tables, etc.

What was the file size? How many pages? It could be that the instance runs out of memory.

The file size is 15.6 MB.
As per the instructions, I was using freeRAM//3 as the batch_multiplier:
--batch_multiplier 3 on Colab (which had 11 GB of free RAM)
--batch_multiplier 2 (and then 1 as well) on AWS EC2 (which had 8 GB of RAM)
However, both of these were CPU-only instances. I wasn't using a GPU on Colab or EC2.
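
For reference, the freeRAM//3 rule quoted above can be written out as a small calculation. This is only a sketch of the arithmetic: the helper name `batch_multiplier` and its `gb_per_batch` parameter are hypothetical, not part of marker, and the 3 GB figure is the divisor implied by the rule.

```python
def batch_multiplier(free_memory_gb, gb_per_batch=3):
    """Integer-divide free memory by the assumed per-batch cost, never going below 1."""
    return max(1, int(free_memory_gb // gb_per_batch))


print(batch_multiplier(11))  # Colab with 11 GB of free RAM -> 3
print(batch_multiplier(8))   # EC2 with 8 GB of RAM -> 2
```

The result only sizes the batches; it says nothing about memory the program may need outside of them.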

The conversion worked fine on a vast.ai JupyterLab instance with an RTX 4090 (24 GB VRAM) and 32 GB RAM. I used --batch_multiplier 7 there.

Apart from the memory required for the batches (~3 GB per batch), I had assumed the program would need only a small, constant amount of memory regardless of the PDF size. Is that not the case?

VRAM usage is bounded and will not go up with the page count, but your RAM usage will.

A workaround would be to slice your PDF into smaller chunks with PyMuPDF, convert each chunk separately, and merge the results.
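
A minimal sketch of the slicing step, assuming PyMuPDF is installed (`pip install pymupdf`). The function names, the output file naming, and the chunk size of 25 pages are illustrative choices, not something marker prescribes.

```python
from pathlib import Path


def chunk_ranges(page_count, chunk_size):
    """Return inclusive, zero-based (first, last) page ranges covering the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, page_count) - 1)
        for start in range(0, page_count, chunk_size)
    ]


def split_pdf(src_path, out_dir, chunk_size=25):
    """Write src_path as several smaller PDFs and return their paths in page order."""
    import fitz  # PyMuPDF; imported here so chunk_ranges works without it

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(src_path).stem
    paths = []
    with fitz.open(src_path) as src:
        for first, last in chunk_ranges(src.page_count, chunk_size):
            part = fitz.open()  # new, empty PDF
            part.insert_pdf(src, from_page=first, to_page=last)
            # Name each part by its one-based page range so the files sort in order.
            out_path = out_dir / f"{stem}_{first + 1:04d}-{last + 1:04d}.pdf"
            part.save(str(out_path))
            part.close()
            paths.append(out_path)
    return paths
```

For the 198-page document in this thread, a chunk size of 25 would give eight parts, the last one covering pages 176 to 198. Each part can then be converted on its own and the resulting outputs concatenated in the order the paths are returned.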

All right
Thanks a lot