php-vcr/php-vcr

The JSON storage is slow and unnecessarily CPU-consuming

morozov opened this issue · 3 comments

I have a test where the VCR intercepts 3 HTTP requests totaling ~1MB. Given that the corresponding responses are cached, I’d expect the test to pass in a fraction of a second, but it takes about 6 seconds.

[Image: callgraph]

The reason is that the JSON storage is implemented in a memory-efficient way and reads the stored JSON byte by byte using fgetc():

while (false !== ($char = fgetc($this->handle))) {
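To get a feel for the per-call overhead, here is a rough timing sketch that reads the same data once with an fgetc() loop like the one above and once with fread() in chunks. The 1 MB payload, the 8192-byte chunk size, and the variable names are assumptions made for illustration; absolute timings will differ between machines and PHP versions.

```php
<?php
// Build a ~1 MB in-memory stream as a stand-in for a stored cassette.
$handle = fopen('php://temp', 'w+');
fwrite($handle, str_repeat('x', 1024 * 1024));

// Variant 1: one function call per byte.
rewind($handle);
$startedAt = microtime(true);
$bytesByChar = 0;
while (false !== ($char = fgetc($handle))) {
    $bytesByChar++;
}
$perCharSeconds = microtime(true) - $startedAt;

// Variant 2: one function call per 8192-byte chunk (assumed size).
rewind($handle);
$startedAt = microtime(true);
$bytesByChunk = 0;
while (($chunk = fread($handle, 8192)) !== false && $chunk !== '') {
    $bytesByChunk += strlen($chunk);
}
$perChunkSeconds = microtime(true) - $startedAt;

fclose($handle);

printf("fgetc: %d bytes in %.4f s\n", $bytesByChar, $perCharSeconds);
printf("fread: %d bytes in %.4f s\n", $bytesByChunk, $perChunkSeconds);
```

This only measures raw reading, not the character matching the storage does on top of it, so it is a lower bound on the cost rather than a reproduction of the 6 seconds.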

The potential solutions that come to mind are:

  1. Keep the current design, but implement a read buffer and, instead of matching characters one by one, use a regex-based approach (see doctrine/dbal#2495 for example).
  2. Reimplement the storage using a third-party streaming JSON parser like salsify/jsonstreamingparser (this will probably require bumping the minimum supported PHP version, which AFAIK is already planned).
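A minimal sketch of what the read-buffer idea in option 1 could look like, assuming the storage file is a JSON array of objects. It swaps the regex for strcspn(), which jumps straight to the next quote, brace, or backslash instead of inspecting every character. The function name iterateJsonObjects() and the default chunk size are hypothetical, and it needs PHP 5.5+ for generators; it is not the code from php-vcr or from any PR.

```php
<?php
/**
 * Hypothetical helper: yields every top-level object of a JSON array
 * read from a stream, reading in chunks instead of byte by byte.
 */
function iterateJsonObjects($handle, $chunkSize = 8192)
{
    $buffer = '';       // text read so far that has not been emitted yet
    $pos = 0;           // scan position inside $buffer
    $depth = 0;         // nesting depth of "{" outside of string literals
    $start = 0;         // offset of the "{" that opened the current object
    $inString = false;  // true while scanning inside a JSON string literal

    while (($chunk = fread($handle, $chunkSize)) !== false && $chunk !== '') {
        $buffer .= $chunk;
        $length = strlen($buffer);

        while ($pos < $length) {
            if ($inString) {
                // Jump to the next closing quote or escape sequence.
                $pos += strcspn($buffer, '"\\', $pos);
                if ($pos >= $length) {
                    break;
                }
                if ($buffer[$pos] === '\\') {
                    if ($pos + 1 >= $length) {
                        break; // escaped character is in the next chunk
                    }
                    $pos += 2; // skip the backslash and the escaped character
                } else {
                    $inString = false;
                    $pos++;
                }
                continue;
            }

            // Outside of strings only quotes and braces matter.
            $pos += strcspn($buffer, '"{}', $pos);
            if ($pos >= $length) {
                break;
            }
            $char = $buffer[$pos];
            if ($char === '"') {
                $inString = true;
            } elseif ($char === '{') {
                if ($depth++ === 0) {
                    $start = $pos;
                }
            } elseif ($depth > 0 && --$depth === 0) {
                // A complete top-level object: decode it in one call.
                yield json_decode(substr($buffer, $start, $pos - $start + 1), true);
                $buffer = (string) substr($buffer, $pos + 1);
                $length = strlen($buffer);
                $pos = 0;
                continue;
            }
            $pos++;
        }
    }
}

// Usage with the string from the failing test.
$stream = fopen('php://memory', 'w+');
fwrite($stream, '[{"para": "}:->"}]');
rewind($stream);
foreach (iterateJsonObjects($stream) as $recording) {
    var_dump($recording); // array with 'para' => '}:->'
}
fclose($stream);
```

Memory use stays bounded by the largest single object plus one chunk, and the request/response bodies are never inspected beyond finding where their string literals end.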

In addition to the performance problem, there is a problem with parsing the stored JSON: the actual request/response bodies are parsed as JSON too, which may cause issues like the following:

public function testIterateStringWithCurlyBraces()
{
    $this->iterateAndTest(
        '[{"para": "}:->"}]',
        array(
            array('para' => '}:->'),
        )
    );
}

// 1) VCR\Storage\JsonTest::testIterateStringWithCurlyBraces
// Failed asserting that two arrays are equal.
// --- Expected
// +++ Actual
// @@ @@
//  Array (
// -    0 => Array (...)
// +    0 => null
//  )
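One plausible way to end up with null here is brace counting that does not know about string literals. The sketch below is an assumption about the failure mode, not the actual php-vcr code, and firstObjectNaive() is a hypothetical name: it treats every "}" as structural, so the object is cut off at the brace inside the string value.

```php
<?php
/**
 * Hypothetical naive extractor: returns the text of the first top-level
 * object, counting braces without tracking string literals.
 */
function firstObjectNaive($json)
{
    $depth = 0;
    $start = 0;
    for ($i = 0, $length = strlen($json); $i < $length; $i++) {
        if ($json[$i] === '{') {
            if ($depth++ === 0) {
                $start = $i;
            }
        } elseif ($json[$i] === '}' && $depth > 0 && --$depth === 0) {
            return substr($json, $start, $i - $start + 1);
        }
    }
    return null;
}

$fragment = firstObjectNaive('[{"para": "}:->"}]');
var_dump($fragment);                    // string(11) "{"para": "}"
var_dump(json_decode($fragment, true)); // NULL, the fragment is not valid JSON
```

The truncated fragment decodes to null, which matches the `0 => null` in the output above.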

I'm not positive, but a JSON streaming format would probably be more efficient to parse, since you could read an entire object with a single function call.

@matt-allan it would, but it turns out to be overkill. We don't really need to parse the body at all; we only need to make sure we parse the envelope properly. See the PR above.