tableau/server-client-python

PDF size from populated views too large (help wanted / bug)


Some background:

The goal is to download multiple filtered views as a single PDF. Ideally this would be done directly at the workbook level, but the filters are not applied across the workbook.

I have looked through various suggested methods here and elsewhere online, and the recommended approach is to filter each view separately, download each filtered view as its own PDF, and merge the results.

So my approach is to populate each view's PDF with the filters applied, wrap each result in a BytesIO stream, append it to a single PDF writer, and write the merged file to disk.

The issue is that this approach produces large PDFs: each view comes out at roughly 200 KB, and the merged file is approximately 1.8 MB (5 pages). By contrast, exporting directly from the Tableau dashboard, I can download multiple views (technically the entire dashboard) as a much smaller PDF of about 917 KB!
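
To make this size comparison reproducible, one could log the size of each per-view PDF next to the merged file. This is a stdlib-only sketch; `pdf_size_report` and its dictionary keys are hypothetical names chosen for illustration.

```python
def pdf_size_report(per_view_pdfs, merged_pdf):
    """Hypothetical helper: compare a merged PDF with the per-view PDFs it was built from.

    per_view_pdfs: dict mapping a view name to that view's PDF bytes.
    merged_pdf: bytes of the merged document.
    """
    sizes = {name: len(data) for name, data in per_view_pdfs.items()}
    total = sum(sizes.values())
    return {
        "per_view_bytes": sizes,
        "sum_of_views_bytes": total,
        "merged_bytes": len(merged_pdf),
        # Above 1.0: the merged file is larger than the per-view PDFs combined.
        "merged_to_sum_ratio": len(merged_pdf) / total if total else None,
    }
```

Comparing `merged_bytes` against both `sum_of_views_bytes` and the dashboard export shows whether the extra size comes from the per-view renders themselves or from the merge step.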

Code:

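import pypdf as pdf
from io import BytesIO

# Assumes `server` is a signed-in tableauserverclient.Server, `views` is a list
# of ViewItem objects, and `pdf_req_option` is a PDFRequestOptions object with
# the view filters already applied.
pdflist = []

for view in views:
    # Fetch the filtered PDF for this view; the bytes are exposed as view.pdf
    server.views.populate_pdf(view, pdf_req_option)
    pdflist.append(view)  # keep the populated view so its PDF can be merged below

writer = pdf.PdfWriter()

for view in pdflist:
    stream = BytesIO(view.pdf)
    writer.append(stream)

#---Write the merged pdf (this step merges only; it does not compress)

writer.write("filename.pdf")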

In the upcoming 0.31 release, you will be able to specify the viz width and height when retrieving PDFs (#1348).

This should give you more precise control over the output size and allow an apples-to-apples comparison.

Thanks. However, the paper type used for printing to PDF is Letter in both cases, and I even tried leaving the type unspecified, so I'm not sure how setting the width/height is going to help here.

We will fix the server issue where filters are not applied when exporting a workbook, and have the fix available this year.

For a more immediate way to shrink your exported PDFs, this Stack Overflow thread has many suggestions for compressing or optimizing PDFs, including several Python libraries:
https://stackoverflow.com/questions/59614014/pypdf4-exported-pdf-file-size-too-big