ajrcarey/pdfium-render

Standardise lifetime and reference handling for child objects inside PdfDocument.

ajrcarey opened this issue · 20 comments

As of 0.7.19, the child objects inside PdfDocument are created and returned using a variety of different approaches:

  • Some immutable-only collections (with no mutable equivalent) are created fresh each time the accessor function is called, and always return a new object instance, e.g. PdfDocument::bookmarks(), PdfDocument::metadata(), PdfDocument::permissions(), PdfDocument::signatures().
  • Some collections that support both immutable and mutable access, and distinguish between mutable and immutable references, create a private owned collection instance inside the containing object as part of the object's new() constructor, then hand out mutable or immutable references to that owned collection as necessary, e.g. PdfDocument::attachments(), PdfDocument::attachments_mut().
  • PdfPages, which supports both immutable and mutable access but does not distinguish between mutable and immutable references, instead handing out a new object instance each time it is called from PdfDocument::pages().

Standardise the approach. All collections should create private owned collection instances inside the containing object as part of the object's new() constructor, function, then hand out mutable or immutable references as necessary. New object instances should never be handed out.

Revised bookmarks, metadata, permissions, and signatures collections in time for 0.7.20.

Additionally, PdfPage::boundaries() has the same problem, in that the child object supports both immutable and mutable access but does not distinguish between these access types, instead handing out a new object instance each time it is called from PdfPage::boundaries(). Reworked as PdfPage::boundaries() and PdfPage::boundaries_mut() in time for 0.7.20.

The separation of PdfDocument::pages() into PdfDocument::pages() and PdfDocument::pages_mut() is the most consequential change with the potential to affect the most users. It is also likely to be the hardest to implement, depending on how many PdfPage child objects expect to be able to take a direct reference to PdfDocument. Save this for 0.7.21.

On reflection, better to save a potentially breaking change like this for 0.8.0. This means that fleshing out the implementation of PdfPageFormFragmentObject should be done first, before completing this issue.

Corrected a few typos in examples. Added PdfAttachment::len() function that returns the data length without requiring a buffer allocation.

Added bindings for all remaining FPDFDest_*() and FPDFLink_*() functions. Implemented native, static, and thread-safe bindings; WASM implementations still pending. Should be ready for inclusion in crate version 0.7.21.

Completed WASM bindings for release 0.7.21.

Fixed a bug in the WASM implementation of FPDFAnnot_GetQuadPoints(). Updated README.md.

Removed unnecessary &mut bindings from various rendering function declarations in PdfBitmap.

Moved commentary about interface differences between native and WASM builds out of README.md into examples/README.md.

Improved implementation of PdfPageImageObject::set_image(). Adjusted signatures of all functions in PdfPageImageObject that take image::DynamicImage so they take a reference to the DynamicImage rather than taking ownership of it. Updated examples/image.rs.

Changed return type of Page::label() from Option<&String> to more idiomatic Option<&str>. Noticed that bindgen no longer generates a type definition for size_t; included this directly in the bindgen module defined in lib.rs.

Updated examples/thread_safe.rs to clearly separate render configuration from render invocation.

#67 deprecates PdfPages::delete_page_at_index() and PdfPages::delete_page_range(). However, if PdfDocument::pages() were to return a &PdfPages / &mut PdfPages reference (rather than the owned PdfPages instance it returns at the moment), and if PdfPages::get() were to return a &PdfPage / &mut PdfPage reference (rather than the owned PdfPage instance it returns at the moment), then it might be safe to reinstate these functions, as Rust would be able to manage the lifetimes safely.

This would require this issue to not only split PdfDocument::pages() -> PdfPages into PdfDocument::pages() -> &PdfPages and PdfDocument::pages_mut() -> &mut PdfPages, but also split PdfPages::get() -> PdfPage into PdfPages::get() -> &PdfPage and PdfPages::get_mut() -> &mut PdfPage.

Added PdfPage::links() -> &PdfPageLinks and PdfPage::links_mut() -> &mut PdfPageLinks as part of #68.

Completing implementation of reading form fields in #76 allows us to proceed with crate version 0.8.0.

Implemented split of PdfDocument::pages() -> PdfPages into PdfDocument::pages() -> &PdfPages and PdfDocument::pages_mut() -> &mut PdfPages. Reworked internal handle() functions to return direct FPDF_* objects rather than object references, to improve cache locality on modern CPUs. Tested and updated all examples. Updated documentation and README.md.

Early experiments with splitting PdfPages::get() -> PdfPage into PdfPages::get() -> &PdfPage and PdfPages::get_mut() -> &mut PdfPage suggest that the management of lifetimes may be intractable. A hashmap can be used to store the FPDF_PAGE pointers and corresponding PdfPage instances, but returning a reference from that hashmap - even an immutable reference - by using interior mutability in PdfPages::get() is not possible because the Ref<> guard wrapping the item returned from the hashmap is a temporary value that is immediately dropped when returning from PdfPage::get().

(Interior mutability in PdfPage seems necessary because we certainly don't want PdfPage::get() to require &mut self. Alternatively, we might be able to use a separate PdfPageCache, a la PdfPageIndexCache, but that seems like a lot of additional complexity.)

Clearly, splitting PdfPages::get() -> PdfPage into PdfPages::get() -> &PdfPage and PdfPage::get_mut() -> &mut PdfPage is a considerably more complex piece of work than splitting PdfDocument::pages() -> PdfPages into PdfDocument::pages() -> &PdfPages and PdfDocument::pages_mut() -> &mut PdfPages. Even if splitting PdfPages::get() is possible, it's a lot of change to package as a single release. Let's release the PdfDocument::pages() change as 0.8.0, and assess separately whether splitting PdfPages::get() is worth doing in a separate release.

Released as crate version 0.8.0.