py-pdf/pypdf_table_extraction

improve how PDFHandler caches single page of pdf


bosd commented

Across the repos there were some PRs about the usage of tempfiles.

From: camelot-dev#487

Change the way the temp directory is shared and cleaned up.

The with clause is contagious: a temp directory cannot be shared across an instance of PDFHandler unless the signature of __init__ is changed, and that would make cleaning up the directory the upper layer's duty. To hide the implementation details, use finalizers to clean up instead.
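Here is a minimal sketch of the finalizer idea, using the standard library's weakref.finalize. The class name TempDirOwner and its cleanup() method are hypothetical stand-ins and are not the real PDFHandler API. The instance owns its temp directory without a with block, and the directory is removed when the object is garbage-collected, when cleanup() is called explicitly, or at interpreter exit.

```python
import shutil
import tempfile
import weakref


class TempDirOwner:
    """Hypothetical stand-in for PDFHandler (not its real API): owns a
    temp directory without forcing callers into a `with` block."""

    def __init__(self) -> None:
        self.tempdir = tempfile.mkdtemp()
        # shutil.rmtree(self.tempdir, ignore_errors=True) runs at most once:
        # when this object is garbage-collected, when cleanup() is called,
        # or at interpreter exit, whichever happens first.
        # The callback must not reference `self`, or the object never dies.
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, self.tempdir, ignore_errors=True
        )

    def cleanup(self) -> None:
        # Explicit, idempotent cleanup for callers that want it.
        self._finalizer()
```

The upper layer does not need to know that a temp directory exists at all. That is the "hide the implementation details" part.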

Add _get_temp_path to make sure the temporary PDF file is always accessed the same way.

Hide implementation details. The temporary PDF can now be reused after calling parse().

Update the _save_page parameters to match the change.

Use properties instead.
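The "use properties" step could look like the sketch below. This is an assumption about the shape of the change, not the merged code, and LazyTempDirHandler is a hypothetical name. The temp directory is exposed as a read-only property: it is created on first access, reused on every later access (so repeated parse() calls hit the same files), and tied to a finalizer for cleanup.

```python
import shutil
import tempfile
import weakref
from typing import Optional


class LazyTempDirHandler:
    """Hypothetical sketch: temp dir exposed as a lazily created property."""

    def __init__(self) -> None:
        self._tempdir: Optional[str] = None
        self._finalizer: Optional[weakref.finalize] = None

    @property
    def tempdir(self) -> str:
        # Create the directory only when something first needs it,
        # then hand back the same path on every later access.
        if self._tempdir is None:
            self._tempdir = tempfile.mkdtemp()
            self._finalizer = weakref.finalize(
                self, shutil.rmtree, self._tempdir, ignore_errors=True
            )
        return self._tempdir

    def cleanup(self) -> None:
        # Safe to call even if the directory was never created.
        if self._finalizer is not None:
            self._finalizer()
```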

bosd commented

I made some tests to forward-port that PR.
The downside of the fix is that the temp folder actually gets cleaned up.
So it is no longer available for plotting, which causes the plotting tests to fail.
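A small sketch of why this happens, assuming CPython's reference counting (other interpreters may collect later). Handler and parse_and_discard are hypothetical stand-ins. Once the handler that owns the finalizer is dropped, its temp folder is removed immediately. Any later step that re-opens the page file, such as plotting, then finds it gone. The file only survives for as long as something keeps the handler alive.

```python
import os
import shutil
import tempfile
import weakref


class Handler:
    """Hypothetical handler whose temp dir is removed by a finalizer."""

    def __init__(self) -> None:
        self.tempdir = tempfile.mkdtemp()
        # weakref.finalize keeps itself registered, so it need not be stored.
        weakref.finalize(self, shutil.rmtree, self.tempdir, ignore_errors=True)

    def parse(self) -> str:
        # Write a placeholder page file and return its path, the way a
        # parsed table might remember where its source page lives.
        path = os.path.join(self.tempdir, "page-1.pdf")
        with open(path, "wb") as f:
            f.write(b"%PDF-stub")
        return path


def parse_and_discard() -> str:
    # The handler is a temporary; it is dropped as soon as parse() returns,
    # which triggers the finalizer and deletes the directory.
    return Handler().parse()
```

Calling parse_and_discard() returns a path that no longer exists. Keeping a reference to the Handler (for example, storing it on the returned tables) keeps the file around until that reference goes away.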

Does anyone have alternative ideas or suggestions?