cloudflare/puppeteer

[Bug]: Puppeteer PDF generation via Page.pdf() method


Bug description

I was building a simple HTML-to-PDF converter when I ran into an issue that affects all current versions of the fork.

In the current implementation of the Page.pdf method in the @cloudflare/puppeteer fork, the Readable stream returned by Page.createPDFStream is converted to a Buffer by the getReadableAsBuffer function in puppeteer's src/common/util.ts, and that is where the problem lies. getReadableAsBuffer tries to asynchronously iterate over an object that is not async iterable (a node:stream Readable), which throws TypeError: readable is not async iterable.
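The error message suggests a for await...of loop over an object that implements neither Symbol.asyncIterator nor Symbol.iterator. The sketch below reproduces that failure outside the Worker runtime. EventOnlyReadable, collectWithForAwait and tryCollect are hypothetical names, and the loop is an assumption about how getReadableAsBuffer is written (inferred from the error text, not copied from its source).

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a stream that only exposes the event API
// ("data"/"end"/"error") and has no Symbol.asyncIterator method.
class EventOnlyReadable extends EventEmitter {}

// Assumed shape of the failing helper: collect chunks with for await...of.
async function collectWithForAwait(readable: any): Promise<Uint8Array[]> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of readable) {
    chunks.push(chunk);
  }
  return chunks;
}

// Returns the name of the error thrown, or "no error" if iteration succeeded.
async function tryCollect(): Promise<string> {
  try {
    await collectWithForAwait(new EventOnlyReadable());
    return "no error";
  } catch (err) {
    // The TypeError is thrown when the loop asks for an iterator,
    // before any chunk is read.
    return err instanceof Error ? err.name : String(err);
  }
}

tryCollect().then((result) => console.log(result)); // logs "TypeError"
```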

It can easily be worked around by calling Page.createPDFStream directly, but it is still a bug, and one that is not present in the @puppeteer/puppeteer-core package.
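For comparison, Node's own stream.Readable implements Symbol.asyncIterator, so the same kind of loop succeeds there. That is consistent with the bug not showing up in upstream Puppeteer running on Node. The idea that the Workers build ships a Readable without this method is an inference from the error message, not something confirmed in this thread. collectReadable is a hypothetical helper used only for illustration.

```typescript
import { Buffer } from "node:buffer";
import { Readable } from "node:stream";

// Collect every chunk of a Node.js Readable into one Buffer via async iteration.
async function collectReadable(readable: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of readable) {
    // Object-mode streams may yield strings, so normalise each chunk to a Buffer.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Usage: Readable.from() builds an object-mode stream from an array of chunks.
const stream = Readable.from([Buffer.from("%PDF-"), Buffer.from("1.7")]);
console.log(Symbol.asyncIterator in stream); // true
collectReadable(stream).then((buf) => console.log(buf.toString())); // "%PDF-1.7"
```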

Steps to reproduce the problem:

  1. Launch puppeteer and instantiate a Page with some content:

     const browser = await puppeteer.launch(env.MYBROWSER);
     const page = await browser.newPage();
     await page.setContent('<h1>HELLO CLOUDFLARE</h1>', {
       waitUntil: 'networkidle0',
     });

  2. Get the PDF:

     await page.pdf({ displayHeaderFooter: true });

Below is a link to a repo that reproduces the issue:
https://github.com/GiovaniMFMurari/cf-puppeteer-pdf-gen-test

Puppeteer version

0.0.6

Node.js version

16, 18 and 20

npm version

using pnpm 8.15.5

What operating system are you seeing the problem on?

Linux

Relevant log output

✘ [ERROR] Uncaught (in response) TypeError: readable is not async iterable

      at getReadableAsBuffer
      at Page.pdf
      at async Object.fetch
      at async drainBody

Getting this too! Any update?

UPDATE: here's a working workaround. createPDFStream() works, and you can collect its output into a buffer yourself.

import puppeteer from "@cloudflare/puppeteer";

// Env, getSettings and SALES_FILES_BUCKET come from my own Worker project.
export default async function markdownToR2Pdf(env: Env) {
	const settings = await getSettings(env);

	// Define test HTML
	const html = `
    <html>
        <body>
            <h1>Hello World</h1>
        </body>
    </html>
    `;

	// Launch browser
	const browser = await puppeteer.launch(env.BROWSER);

	// Create new page
	const page = await browser.newPage();

	// Set page content
	await page.setContent(html);

	// Wait for network idle
	await page.waitForNetworkIdle();

	// Generate the PDF as a stream instead of calling page.pdf()
	const pdfStream = await page.createPDFStream({
		format: "A4",
		printBackground: true,
	});

	// Collect PDF data into a buffer
	const chunks: Uint8Array[] = [];
	return new Promise<string>((resolve, reject) => {
		pdfStream.on("data", (chunk: Uint8Array) => chunks.push(chunk));
		pdfStream.on("end", async () => {
			// An async listener's errors are not caught by the stream,
			// so reject explicitly instead of leaving the promise pending.
			try {
				const pdfBuffer = Buffer.concat(chunks);
				console.log("PDF buffer created");

				// Upload PDF to R2 bucket
				const objectName = `pdf_${Date.now()}.pdf`;
				await env.SALES_FILES_BUCKET.put(objectName, pdfBuffer);
				console.log("PDF uploaded to R2 bucket");

				await page.close();
				await browser.close();

				resolve(`${settings.salesFilesBucketUrl}/${objectName}`);
			} catch (error) {
				reject(error);
			}
		});
		pdfStream.on("error", (error) => {
			reject(error);
		});
	});
}

@GiovaniMFMurari and @emilthemaker, we recently released an update and can no longer reproduce this. Can you give it another try?

We're closing this; feel free to reopen it if you still get this error with recent versions.

dpeek commented

This bug still exists in 0.0.13 and throws the same error.

This works:

function streamToBuffer(stream: NodeJS.ReadableStream) {
  return new Promise<Buffer>((resolve, reject) => {
    const chunks: any[] = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('end', () => resolve(Buffer.concat(chunks)));
    stream.on('error', (err) => reject(err));
  });
}

const stream = await page.createPDFStream();
const buffer = await streamToBuffer(stream);

This fails with TypeError: readable is not async iterable:

const buffer = await page.pdf();
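Both workarounds in this thread collect the stream with event listeners at the call site. Below is a sketch of a single helper that uses async iteration when the stream supports it and falls back to the data/end/error events otherwise. toBuffer, asBuffer and the EventStream interface are hypothetical names, and this is not the fix that shipped in @cloudflare/puppeteer.

```typescript
import { Buffer } from "node:buffer";
import { EventEmitter } from "node:events";

// Minimal event-based stream shape (hypothetical): only on() is required.
interface EventStream {
  on(event: string, listener: (...args: any[]) => void): unknown;
}

// Normalise string or binary chunks so Buffer.concat accepts them.
function asBuffer(chunk: string | Uint8Array): Buffer {
  if (typeof chunk === "string") return Buffer.from(chunk, "utf8");
  return Buffer.from(chunk);
}

async function toBuffer(stream: EventStream): Promise<Buffer> {
  // Prefer async iteration when the object implements it (e.g. Node's Readable).
  if (Symbol.asyncIterator in stream) {
    const chunks: Buffer[] = [];
    for await (const chunk of stream as unknown as AsyncIterable<string | Uint8Array>) {
      chunks.push(asBuffer(chunk));
    }
    return Buffer.concat(chunks);
  }
  // Otherwise fall back to the event API, as the workarounds above do.
  return new Promise<Buffer>((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on("data", (chunk: string | Uint8Array) => chunks.push(asBuffer(chunk)));
    stream.on("end", () => resolve(Buffer.concat(chunks)));
    stream.on("error", reject);
  });
}

// Usage with a plain EventEmitter standing in for an event-only stream.
const emitter = new EventEmitter();
const pending = toBuffer(emitter);
emitter.emit("data", "%PDF-");
emitter.emit("data", new Uint8Array([49, 46, 55]));
emitter.emit("end");
pending.then((buf) => console.log(buf.toString())); // "%PDF-1.7"
```

The listeners are registered synchronously when toBuffer is called, so events emitted right after the call are not missed.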