ampproject/amp-toolbox-php

Performance issues on `inlineCss` Transformer

jeffochoa opened this issue · 5 comments

I wanted to start a discussion to see if is viable/posible to refactor the inlineCss transformer to improve the overall performance.

I found this while debugging some issues we get from time to time:

Amp render failure: Cannot inline the amp-runtime CSS in unspecified version into <style amp-runtime="" i-amphtml-version="012107092322000"></style>: AmpProject\Exception\FailedToGetFromRemoteUrl: Failed to fetch the contents from the URL 'https://cdn.ampproject.org/v0.css'

What happens is that every-time you call the inlineCss transformer, two external requests are made to fetch the current AMP Runtime Version and the CSS styles.

I'm wondering how often those two change? (Version and Css content), isn't that something that we could cache locally to having to make those two extra request each time?

See for reference https://github.com/ampproject/amp-toolbox-php/blob/main/src/Optimizer/Transformer/AmpRuntimeCss.php#L153

private function inlineCss(Element $ampRuntimeStyle, $version)
    {
        // Use version passed in via params if available, otherwise fetch the current prod version.
        if (! empty($version)) {
            $v0CssUrl = RuntimeVersion::appendRuntimeVersion(Amp::CACHE_HOST, $version) . '/' . self::V0_CSS;
        } else {
            $v0CssUrl = self::V0_CSS_URL;

            $options  = [
                RuntimeVersion::OPTION_CANARY => $this->configuration->get(AmpRuntimeCssConfiguration::CANARY)
            ];
            $version  = (new RuntimeVersion($this->remoteRequest))->currentVersion($options);
        }

        $ampRuntimeStyle->setAttribute(Attribute::I_AMPHTML_VERSION, $version);

        $styles = $this->configuration->get(AmpRuntimeCssConfiguration::STYLES);

        if (empty($styles)) {
            $response   = $this->remoteRequest->get($v0CssUrl);
            $statusCode = $response->getStatusCode();

            if ($statusCode < 200 || $statusCode >= 300) {
                return;
            }

            $styles = $response->getBody();
        }

        $ampRuntimeStyle->textContent = $styles;
    }
    ```
   
    

Yes, we can absolutely improve on that. I haven't noticed this yet because our default use case is the WordPress AMP plugin, where these requests are indeed cached.

The main issue we have to figure out here is how to do caching in a framework- and environment-independent way...

For reference, this is how remote requests are being routed in the AMP for WordPress plugin: https://github.com/ampproject/amp-wp/blob/6f1d2c26cba313cf70dcc36bdffee6dda78cecf1/src/AmpWpPlugin.php#L225-L232

As you can see it has local filesystem fallbacks in case the remote request fails, and the entire thing is wrapped in a caching decorator.

I guess I could add a workaround by providing a custom RemoteGetRequest class to the TransformationEngine instance.

A more elegant solution, maybe, could be adding the possibility of providing a sort of CacheStore object to the TransformationEngine::class, which could be like ArrayCacheStore::class by default (no cache), but then you can implement a CacheStore::class interface to provide your own store using whatever driver you want to use.

Maybe even having that as part of the config?

$transformationEngine = new TransformationEngine(
    new Configuration(['cache_store' => MyCustomCacheStore::class])
);

I guess I could add a workaround by providing a custom RemoteGetRequest class to the TransformationEngine instance.

Yes, that is currently the intended way of providing caching or other adaptations of the remote request interface, hence why there is an interface in the first place.

I don't think we need any particular cache interface, as that can easily be wrapped around the existing interface via a decorator.

What we need, though, is a default setup that works in a reasonable way when you don't provide a custom implementation. The default behavior right now just feels buggy in terms of how bad it is for performance.

I suggest we build something like we have in https://github.com/ampproject/amp-wp/blob/6f1d2c26cba313cf70dcc36bdffee6dda78cecf1/src/RemoteRequest/CachedRemoteGetRequest.php#L27, but based on a file in stored in sys_get_temp_dir() instead of in WordPress transients. The file creation date can then be used to check for expiry.

This TemporaryFileCachedRemoteGetRequest implementation can then be used by default to decorate the default CurlRemoteGetRequest to solve this performance issue.

It is not the most optimal solution for most environments, but it avoids the worst problem. You can still at any point use a more adapted custom solution for caching as needed.