BUG Inverted characters
nicwest opened this issue · 2 comments
Describe the bug
When adding text to an existing PDF the characters are inverted.
To Reproduce
- download this PDF https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1145516/sa800man_2023.pdf
- rename to SA800.pdf
- put it in the same directory as the following script
- run the script
from borb.pdf import Paragraph
from borb.pdf import PDF
from borb.pdf.canvas.geometry.rectangle import Rectangle
from decimal import Decimal


def main():
    # read the existing PDF
    with open("SA800.pdf", "rb") as in_file_handle:
        doc = PDF.loads(in_file_handle)

    # paint a paragraph inside a 200 x 200 rectangle on the first page
    page = doc.get_page(0)
    r = Rectangle(Decimal(0), Decimal(0), Decimal(200), Decimal(200))
    Paragraph("Hello World!", font="Courier").paint(page, r)

    # write the modified document
    with open("output.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)


if __name__ == "__main__":
    main()
Expected behaviour
I would expect this to render "Hello World!" somewhere near the top of the page
Desktop (please complete the following information):
- OS: mac
- borb version 2.1.12
- input PDF: see steps to reproduce
Additional context
The coordinates of the rectangle aren't behaving as I would expect either: increasing the width seemingly makes the rectangle taller, and vice versa.
I have a correction. With the PDF downloaded from the original source, the above script renders the text correctly. Notably, where the PDF comes from seems to make a difference. I was originally running the script with this PDF, which produced the errors:
SA800.pdf
The content of a PDF is located in a so-called content stream.
These streams are essentially compressed pieces of code (in a PostScript-like language that uses postfix notation, so operands come before their operator) that tell the viewing software how to render content.
In pseudo-code you might find something such as:
- go to coordinate 80, 120
- set the font color to black
- set the font to Helvetica, size 12
- render the character "H"
- etc
As you can tell from the pseudo-code, the renderer has a state (coordinates, colors, active font, etc).
There are operators that modify this state even further. For instance you can apply a matrix transformation to the coordinate system.
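To make the matrix part concrete, here is a small Python sketch of what such a transformation does to coordinates. It is plain arithmetic, not borb code: the helper `transform` and both example matrices are made up for illustration, and the page height of 842 is an assumption (A4 in points). I have not checked which matrix, if any, your file actually contains.

```python
from typing import Tuple

# A PDF transformation matrix is written as six numbers: a b c d e f
Matrix = Tuple[float, float, float, float, float, float]


def transform(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map the point (x, y) through the matrix [a b c d e f]."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


# Hypothetical examples, not taken from the PDF in this issue.
FLIP_VERTICAL: Matrix = (1, 0, 0, -1, 0, 842)  # mirror the y axis, assumed page height 842
ROTATE_90: Matrix = (0, 1, -1, 0, 0, 0)        # rotate 90 degrees counter-clockwise

# With a flipped y axis, moving "up" by 10 units ends up lower on the page,
# so glyphs drawn afterwards appear upside down.
print(transform(FLIP_VERTICAL, 0, 10))

# With a rotation, a step along the x axis becomes a step along the y axis,
# so making a rectangle wider makes it look taller.
print(transform(ROTATE_90, 200, 0))
```

Either kind of matrix, if left active, would be consistent with mirrored characters or with width and height appearing swapped.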
Normally, you would wrap content-rendering operations between a `q` and a `Q` operator. These operators tell the viewer to save and to restore the graphics state, respectively.
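As a rough illustration of why the save/restore pair matters, here is a toy model of the graphics state in Python. It only tracks the transformation matrix, and the class `GraphicsState`, the helper `multiply` and the flip matrix are all invented for this sketch; a real viewer keeps far more state than this.

```python
from typing import List, Tuple

Matrix = Tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1, 0, 0, 1, 0, 0)


def multiply(m1: Matrix, m2: Matrix) -> Matrix:
    """Concatenate two [a b c d e f] matrices (m1 is applied first)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


class GraphicsState:
    """Toy graphics state that only remembers the current matrix."""

    def __init__(self) -> None:
        self.ctm: Matrix = IDENTITY
        self._stack: List[Matrix] = []

    def q(self) -> None:
        # save the current state
        self._stack.append(self.ctm)

    def Q(self) -> None:
        # restore the most recently saved state
        self.ctm = self._stack.pop()

    def cm(self, m: Matrix) -> None:
        # concatenate m onto the current matrix
        self.ctm = multiply(m, self.ctm)


flip: Matrix = (1, 0, 0, -1, 0, 842)  # hypothetical transform

# Balanced: the transform is undone by Q, later content sees the default state.
balanced = GraphicsState()
balanced.q()
balanced.cm(flip)
balanced.Q()

# Unbalanced: nothing restores the state, later content inherits the flip.
unbalanced = GraphicsState()
unbalanced.cm(flip)

print(balanced.ctm == IDENTITY, unbalanced.ctm == IDENTITY)
```

Anything appended after the unbalanced sequence is drawn in the flipped coordinate system, even though the appended code itself never asked for a transform.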
Your issue might be something like:
- the PDF already contains a matrix transform
- the PDF does not restore the state
- borb appends content, expecting the state to be the default, leading to the wrong output
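If you want to test that theory, something like the following sketch could help. It works on the raw, already decompressed bytes of a content stream and does not use borb's API; `transform_leaks` and `isolate` are hypothetical helpers. The tokenizer is deliberately naive (it splits on whitespace), so it can be fooled by a `q` or `cm` inside a string literal or by operators that are not separated by whitespace. Treat the result as a hint, not a verdict.

```python
def transform_leaks(content: bytes) -> bool:
    """Return True if a `cm` is still in effect at the end of the stream.

    Naive check: one flag per save level, set when a `cm` occurs at that level.
    """
    levels = [False]  # base level, outside any q/Q pair
    for token in content.split():
        if token == b"q":
            levels.append(False)
        elif token == b"Q" and len(levels) > 1:
            levels.pop()
        elif token == b"cm":
            levels[-1] = True
    return any(levels)


def isolate(content: bytes) -> bytes:
    """Wrap existing content in q ... Q so its state changes do not escape.

    Assumes the content has no unclosed q of its own.
    """
    return b"q\n" + content + b"\nQ\n"


# Made-up content streams for illustration.
well_behaved = b"q 1 0 0 -1 0 842 cm BT /F1 12 Tf (Hello) Tj ET Q"
leaky = b"1 0 0 -1 0 842 cm BT /F1 12 Tf (Hello) Tj ET"

print(transform_leaks(well_behaved))
print(transform_leaks(leaky))
print(transform_leaks(isolate(leaky)))
```

If the check reports a leak for the problematic file but not for the freshly downloaded one, that would support the explanation above.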