jorisschellekens/borb

BUG Inverted characters

nicwest opened this issue · 2 comments

Describe the bug
When adding text to an existing PDF the characters are inverted.

To Reproduce

  1. download this PDF https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1145516/sa800man_2023.pdf
  2. rename to SA800.pdf
  3. put it in the same directory as the following script
  4. run the script
from borb.pdf import Paragraph
from borb.pdf import PDF
from borb.pdf.canvas.geometry.rectangle import Rectangle
from decimal import Decimal


def main():
    with open("SA800.pdf", "rb") as in_file_handle:
        doc = PDF.loads(in_file_handle)

    page = doc.get_page(0)

    r = Rectangle(Decimal(0), Decimal(0), Decimal(200), Decimal(200))
    Paragraph("Hello World!", font="Courier").paint(page, r)

    with open("output.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)


if __name__ == "__main__":
    main()

Expected behaviour
I would expect this to render "Hello World!" somewhere near the top of the page

Screenshots

[Screenshot 2023-05-12 at 13 19 42, image not included]

Desktop (please complete the following information):

  • OS: mac
  • borb version 2.1.12
  • input PDF: see steps to reproduce

Additional context
The coordinates of the rectangle aren't behaving as I would expect either: increasing the width seemingly makes the rectangle taller, and vice versa.

I have a correction: the above script renders the text correctly. Notably, downloading the PDF from the original source seems to make a difference. I was originally running the script with the following PDF, which produced the errors:
SA800.pdf

The content of a PDF is located in a so-called content stream.

These streams are essentially compressed pieces of code (in a PostScript-like language that uses postfix notation, i.e. operands come before the operator) that tell the viewing software how to render content.

In pseudo-code you might find something such as:

  • go to coordinate 80, 120
  • set the font color to black
  • set the font to Helvetica, size 12
  • render the character "H"
  • etc

As you can tell from the pseudo-code, the renderer has a state (coordinates, colors, active font, etc).
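To make the pseudo-code a bit more concrete, here is a minimal sketch of what such a stream could look like once decompressed. The stream is hand-written for illustration and is not taken from the PDF in this issue; the resource name /F1 is an assumption (a real page would map it to Helvetica in its resources), and zlib stands in for the usual FlateDecode compression.

```python
import zlib

# Hand-written content stream, roughly matching the pseudo-code above.
# /F1 is an assumed font resource name, not something read from a real PDF.
raw = b"\n".join([
    b"BT",          # begin a text object
    b"80 120 Td",   # move the text position to (80, 120)
    b"0 0 0 rg",    # set the fill color to black (RGB)
    b"/F1 12 Tf",   # select the font resource /F1 at size 12
    b"(H) Tj",      # render the string "H"
    b"ET",          # end the text object
])

compressed = zlib.compress(raw)        # roughly what the PDF file stores
decoded = zlib.decompress(compressed)  # what the viewer interprets

# Each line is "operands... operator": the operator always comes last.
for line in decoded.decode("ascii").splitlines():
    *operands, operator = line.split()
    print(operator, operands)
```

Every operator here changes or uses the renderer's state, which is why leftover state from earlier in the stream matters.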

There are operators that modify this state even further. For instance you can apply a matrix transformation to the coordinate system.

Normally, you would wrap content-rendering operations in a q and Q pair. The q operator tells the viewer to save the current graphics state, and Q tells it to restore the most recently saved one.

Your issue might be something like:

  • the PDF already contains a matrix transform
  • the PDF does not restore the state
  • borb appends content, expecting the state to be the default, leading to the wrong output
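The scenario above can be sketched with a toy interpreter that only understands q, Q and cm and reports the transformation matrix left over at the end of a stream. This is an illustration under assumptions, not borb's code: the helper names (concat, final_ctm, apply) are hypothetical, and the vertical-flip matrix is a guess at the kind of transform involved, not the one actually present in the attached PDF.

```python
from typing import List, Tuple

Matrix = Tuple[float, ...]  # six numbers: a b c d e f
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(m: Matrix, ctm: Matrix) -> Matrix:
    """Return m x ctm, which is how `cm` updates the current matrix."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def final_ctm(stream: str) -> Matrix:
    """Interpret only q, Q and cm; return the matrix left at the end."""
    ctm = IDENTITY
    stack: List[Matrix] = []
    operands: List[float] = []
    for token in stream.split():
        if token == "q":
            stack.append(ctm)    # save the graphics state
        elif token == "Q":
            ctm = stack.pop()    # restore the saved graphics state
        elif token == "cm":
            ctm = concat(tuple(operands[-6:]), ctm)
            operands.clear()
        else:
            try:
                operands.append(float(token))
            except ValueError:
                operands.clear()  # some other operator: ignore it
    return ctm


def apply(ctm: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a point from user space through the matrix."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


FLIP = "1 0 0 -1 0 792 cm"            # assumed: flip the y axis on a 792pt page
unbalanced = FLIP + " BT ET"          # the state is never restored
balanced = "q " + FLIP + " BT ET Q"   # the state is restored

# Where would appended content at (0, 200) end up?
print(apply(final_ctm(unbalanced), 0, 200))
print(apply(final_ctm(balanced), 0, 200))
```

With the unbalanced stream, anything appended afterwards is drawn in the flipped coordinate system, so both positions and glyphs come out mirrored. If that is what is happening here, one possible workaround would be to wrap the existing content in q and Q before appending new content.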