microsoft/playwright-python

[Bug]: Potential data loss in PipeTransport.send due to missing drain() call

mohasarc opened this issue · 2 comments

Version

v1.48.0

Steps to reproduce

Description:

I’ve identified a potential issue in the PipeTransport class in the Playwright Python library, specifically within the send() method. The current implementation writes data to _output (an asyncio.StreamWriter) but does not call drain() afterward, which is required to ensure that data is properly flushed to the stream.

def send(self, message: Dict) -> None:
assert self._output
data = self.serialize_message(message)
self._output.write(
len(data).to_bytes(4, byteorder="little", signed=False) + data
)

Background

The Python asyncio StreamWriter documentation states:

The method should be used along with the drain() method.

This ensures that the write buffer does not exceed its high watermark and that the data is reliably transmitted:

Wait until it is appropriate to resume writing to the stream.

Without calling drain(), there’s a risk that data written to the subprocess may remain in the internal buffer and not be sent, potentially leading to data loss or communication issues.

Potential Solution

To resolve this, the send() method could be made asynchronous so that it can await drain(). But doing so would require updates to any code that uses send() to handle it asynchronously. Alternatively, a more thorough design change might be needed to ensure proper flow control without blocking.

Expected behavior

The send() method should ensure that data is properly flushed to the subprocess by incorporating appropriate flow control mechanisms, such as await writer.drain(), while avoiding deadlocks.

Actual behavior

Messages could get corrupted while being transported to the subprocess

Additional context

According to the asyncio subprocess documentation:

Use the communicate() method rather than process.stdin.write()… to avoid deadlocks due to streams pausing reading or writing.

However, communicate() is designed for a one-time exchange with the subprocess, which doesn’t apply to continuous communication scenarios like in Playwright’s use case.

Environment

  • Operating System: [macOS 15.0]
  • CPU: [arm64]
  • Browser: [All, Chromium, Firefox, WebKit]
  • Python Version: [3.12]

While it might be an issue we have not heard any bug reports about it. Usually we only act on bugs if we have a repro.

I'll close it as per above since its only a theoretical bug which was working fine so far - as soon as we have a specific scenario / reproduction we're more than happy to invest more resources to it / investigate. Thanks!