PostHog/posthog-php

Unexpected behaviour with "maximumBackOffDuration" in HttpClient

Opened this issue · 4 comments

In this snippet of code from HttpClient.php:

$backoff = 200;

while ($backoff < $this->maximumBackoffDuration) {
    ...

    // retry failed requests just once to diminish impact on performance
    $httpResponse = $this->executePost($ch);
    $responseCode = $httpResponse->getResponseCode();

    // close connection
    curl_close($ch);

    if (200 != $responseCode) {
        // log error
        $this->handleError($ch, $responseCode);

        if (($responseCode >= 500 && $responseCode <= 600) || 429 == $responseCode) {
            // If status code is greater than 500 and less than 600, it indicates server error
            // Error code 429 indicates rate limited.
            // Retry uploading in these cases.
            usleep($backoff * 1000);
            $backoff *= 2;
        } elseif ($responseCode >= 400) {
            break;
        } elseif ($responseCode == 0) {
            break;
        }
    } else {
        break;  // no error
    }
}

If a request fails with a 5xx or 429 status code, the code will attempt to retry the request (adding a sleep in the process).

However, I think there is a fundamental logic bug in this process, specifically in this area of code:

usleep($backoff * 1000);
$backoff *= 2;

Let's assume we have an API that constantly returns a 503 status code.

The first time it runs, we'd sleep for 0.2 seconds, then double $backoff to 400
The second time it runs, we'd sleep for 0.4 seconds, then double $backoff to 800
The third time it runs, we'd sleep for 0.8 seconds, then double $backoff to 1600
The fourth time it runs, we'd sleep for 1.6 seconds, then double $backoff to 3200
The fifth time it runs, we'd sleep for 3.2 seconds, then double $backoff to 6400
The sixth time it runs, we'd sleep for 6.4 seconds, then double $backoff to 12800

The loop would then bail out, based on the default $maximumBackoffDuration of 10000.

This means we spend 12.6 seconds of sleep time, plus the time for each request, before the script bails, all synchronously, blocking any other script in the stack in the meantime.
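The arithmetic above can be checked with a standalone sketch (totalBackoffSleepMs is an illustrative name, not library code; it just replays the doubling loop without actually sleeping):

```php
// Sketch: total milliseconds slept by the current doubling strategy,
// assuming the endpoint keeps returning a retryable code such as 503.
function totalBackoffSleepMs(int $initialBackoff, int $maximumBackoffDuration): int
{
    $backoff = $initialBackoff;
    $totalSleepMs = 0;
    while ($backoff < $maximumBackoffDuration) {
        $totalSleepMs += $backoff; // usleep($backoff * 1000) sleeps $backoff ms
        $backoff *= 2;
    }
    return $totalSleepMs;
}

echo totalBackoffSleepMs(200, 10000); // 200+400+800+1600+3200+6400 = 12600 ms
```

With the defaults of 200 and 10000 this confirms the 12.6 seconds of pure sleep, on top of six round-trip request times.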

I propose a solution which deprecates the maximumBackoffDuration parameter and instead uses an integer maxRetryCount, with a retrySleep parameter to give users the option of a sleep in between each retry.
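A minimal sketch of that proposal, assuming hypothetical maxRetryCount and retrySleep parameters (none of these names exist in the library yet; shouldRetry simply mirrors the existing status-code checks):

```php
// Mirrors the retryable cases in the current HttpClient loop.
function shouldRetry(int $responseCode): bool
{
    return ($responseCode >= 500 && $responseCode <= 600) || 429 === $responseCode;
}

// Hypothetical replacement: a bounded retry count with a fixed pause,
// instead of an open-ended exponential backoff.
function postWithRetries(callable $doRequest, int $maxRetryCount, int $retrySleepMs): int
{
    $responseCode = $doRequest();
    for ($retry = 0; $retry < $maxRetryCount && shouldRetry($responseCode); $retry++) {
        usleep($retrySleepMs * 1000); // fixed, predictable pause between retries
        $responseCode = $doRequest();
    }
    return $responseCode;
}
```

The worst case then becomes exactly (1 + maxRetryCount) requests and maxRetryCount × retrySleep of sleep, which callers can budget for, rather than a duration that depends on how many doublings fit under a threshold.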

Alternatively, we could strip out the retry logic altogether and pass the responsibility on to the user (we return a HttpResponse object, so users could determine the responseCode from this and retry themselves if needed).
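If retries were removed from the library, caller-side handling could look something like this sketch ($client->post() is a hypothetical wrapper; only getResponseCode() comes from the snippet above, and the 3-attempt / 100 ms policy is an arbitrary example):

```php
// Sketch of caller-side retrying, assuming the library just returns
// an HttpResponse and leaves the retry policy to the application.
function sendWithCallerRetry(object $client, string $payload): object
{
    $response = null;
    for ($attempt = 0; $attempt < 3; $attempt++) {
        $response = $client->post($payload); // hypothetical request method
        $code = $response->getResponseCode();
        if (($code >= 500 && $code <= 600) || 429 === $code) {
            usleep(100 * 1000); // brief pause before the next attempt
            continue;
        }
        return $response; // success or non-retryable error
    }
    return $response; // out of attempts; caller inspects the final response
}
```

This keeps the blocking behaviour entirely under the application's control, which matters for synchronous PHP request handling.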

Opinions on this?

@yakkomajuri you seem to be the most active PostHog-er on this repo, do you have any opinions on the above?

And for additional context on this, I dug into it after a deploy caused our stack to show 503 errors, and every PHP feature switch request took over 15 seconds causing user complaints.

@joesaunderson indeed, this seems like it was written in a way that is not very respectful of PHP's single-threaded nature.

I do think your solution makes sense. Would you be down to put up a PR?

eexit commented

See #33