/gradle-s3-build-cache

An AWS S3 Gradle build cache implementation

Primary LanguageJavaApache License 2.0Apache-2.0

AWS S3 Gradle build cache

Gradle Plugin Portal CI Status

This is a custom Gradle build cache implementation which uses AWS S3 to store the cache objects.

This plugin is a fork of myniva/gradle-s3-build-cache. Burrunan adds lots of performance, functional, and security features. See v1.0.0 release notes

Key improvements are (as of 2020-07-24):

  • Improved security: the plugin requires only GetObject permission, and it does not need ListObjects
  • Improved security: the plugin uses its own configuration properties rather than default AWS credentials by default
  • Faster cache population: cache entries are sent as File objects rather than buffering to memory
  • Faster cache retrieval: cache lookup performs one network request rather than two
  • maximumCachedObjectLength property to avoid unexpectedly large remote cache transfers
  • Elaborate details on cache performance
  • Cached items contain metadata (elapsed duration, task name, Gradle version) which help to analyze the cache contents

Compatibility

  • Version 1.0.0 - Gradle 4.1+

Use in your project

Please note that this plugin is not yet ready for production. Feedback though is very welcome. Please open an issue if you find a bug or have an idea for an improvement.

Apply plugin

The Gradle build cache needs to be configured on the Settings level. As a first step, add a dependency to the plugin to your settings.gradle file. Get the latest version from Gradle plugin portal.

plugins {
  id("com.github.burrunan.s3-build-cache") version "1.3"
}

Configuration

The AWS S3 build cache implementation has a few configuration options:

Configuration Key Description Mandatory Default Value
region The AWS region the S3 bucket is located in. yes
bucket The name of the AWS S3 bucket where cache objects should be stored. yes
prefix Prefix for the S3 entry names no cache/
maximumCachedObjectLength Maximum object size that can be stored and retrieved from the cache no 50'000'000
reducedRedundancy Whether or not to use reduced redundancy. no true
endpoint Alternative S3 compatible endpoint no
headers A map with HTTP headers to be added to each request (nulls are ignored). e.g. [ 'x-header-name': 'header-value' ] no
awsAccessKeyId The AWS access key id no getenv("S3_BUILD_CACHE_ACCESS_KEY_ID")
awsSecretKey The AWS secret key no getenv("S3_BUILD_CACHE_SECRET_KEY")
sessionToken The AWS sessionToken when you use temporal credentials no getenv("S3_BUILD_CACHE_SESSION_TOKEN")
lookupDefaultAwsCredentials Configures if DefaultAWSCredentialsProviderChain could be used to lookup credentials yes false
showStatistics Displays statistics on the remote cache performance Yes true
showStatisticsWhenImpactExceeds Specifies minimum duration to trigger printing the stats, milliseconds Yes 100
showStatisticsWhenSavingsExceeds Specifies minimum duration to trigger printing the stats, milliseconds Yes 100
showStatisticsWhenWasteExceeds Specifies minimum duration to trigger printing the stats, milliseconds Yes 100
showStatisticsWhenTransferExceeds Specifies minimum transfer size to trigger printing the stats, bytes Yes 1010241024

Note: if both awsAccessKeyId and awsSecretKey are nullOrBlank (null or whitespace only), then anonymous credentials are used.

The buildCache configuration block might look like this:

Groovy DSL

// This goes to settings.gradle

apply plugin: 'com.github.burrunan.s3-build-cache'

ext.isCiServer = System.getenv().containsKey("CI")

buildCache {
    local {
        // Local build cache is dangerous as it might produce inconsistent results
        // in case developer modifies files while the build is running
        enabled = false
    }
    remote(com.github.burrunan.s3cache.AwsS3BuildCache) {
        region = 'eu-west-1'
        bucket = 'your-bucket'
        prefix = 'cache/'
        push = isCiServer
        // Credentials will be taken from  S3_BUILD_CACHE_... environment variables
        // anonymous access will be used if environment variables are missing
    }
}

Kotlin DSL:

// This goes to settings.gradle.kts

plugins {
    id("com.github.burrunan.s3-build-cache") version "1.3"
}

val isCiServer = System.getenv().containsKey("CI")

buildCache {
    local {
        // Local build cache is dangerous as it might produce inconsistent results
        // in case developer modifies files while the build is running
        enabled = false
    }
    remote<com.github.burrunan.s3cache.AwsS3BuildCache> {
        region = "eu-west-1"
        bucket = "your-bucket"
        prefix = "cache/"
        push = isCiServer
    }
}

More details about configuring the Gradle build cache can be found in the official Gradle documentation.

S3 credentials

It is recommended you specify credentials that have limited access to S3 resources, that is why plugin retrieves credentials from S3_BUILD_CACHE_ACCESS_KEY_ID, S3_BUILD_CACHE_SECRET_KEY, and S3_BUILD_CACHE_SESSION_TOKEN environment variables.

If you want to use AWS default credentials DefaultAWSCredentialsProviderChain, then configure lookupDefaultAwsCredentials=true. Note: it will still try S3_BUILD_CACHE_ variables first.

S3 Bucket Permissions for cache population

Note: if you use a path prefix (e.g. build-cache), you might want to configure the permission to that subfolder only.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
          "s3:PutObject"
      ],
      "Resource": [
          "arn:aws:s3:::your-bucket/*"
      ]
    }
  ]
}

S3 Bucket Permissions for reading data from the cache

If you use a path prefix (e.g. build-cache), you might want to configure the permission to that subfolder only.

Note: if you don't have enough permissions to access the item, it will be treated as "cache miss".

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
          "s3:GetObject"
      ],
      "Resource": [
          "arn:aws:s3:::your-bucket/*"
      ]
    }
  ]
}

Measuring cache efficiency

It is important to measure the efficiency of the caching, otherwise it might happen the caching increases build time (e.g. it downloads too large artifacts over a slow network or cache misses are too frequent).

Luckily many cache items include the time it took to build the task, so when the item is loaded from the cache, the time saved can be estimated as original_task_elapsed_time - from_cache_task_elapsed_time.

The plugin prints cache statistics at the end of the build (you can disable it with showStatistics=false):

BUILD SUCCESSFUL in 6s
1 actionable task: 1 executed
S3 cache 232ms saved (242ms saved on hits, 10ms wasted on misses), reads: 2, hits: 1, elapsed: 91ms, processed: 477 B
S3 cache writes: 1, elapsed: 121ms, sent to cache: 472 B

S3 reads:

  • saved: 232ms – overall estimation of the remote cache impact on the build time Note: this estimation does not account for parallel task execution.
  • 242ms saved on hits – the estimated time saved by the remote build cache (original_task_elapsed_time - from_cache_task_elapsed_time). Note: this estimation does not account for parallel task execution. saved on hits means the cache helps, and wasted on hits means the estimated execution time would be less than the cache load time.
  • 10ms wasted on misses – the amount of time spent for cache misses
  • reads: 2 – number load requests to the remote cache
  • hits: 1 – number items loaded from the remote cache
  • elapsed: 233ms – total time spent on loading items from remote cache
  • processed: 0 B – number of bytes loaded from the remote cache

S3 writes:

  • cache writes: 1 – number of store requests to the remote cache
  • elapsed: 115ms – time spent uploading items to the remote cache
  • sent to cache: 472 B – number of bytes sent to the remote cache

S3 metadata

The stored cache entries might include metadata. The metadata helps to estimate cache efficiency, and it might be useful to analyze space consumption.

Key Type Sample Description
buildInvocationId String rpha3qmrzvbnxhmdlvukwcx7ru Build Id
identity String :example:test Task identifier
executionTime Long 189871 Task execution time, milliseconds
operatingSystem String Linux Operating system
gradleVersion String 6.3 Gradle version

Expiring cache entries

This plugin does not deal with expiring cache entries directly but relies on S3 object lifecycle management to do so. Cache entry expiration rules can be set on S3 buckets using AWS API or via AWS Management Console.

Contributing

Contributions are always welcome! If you'd like to contribute (and we hope you do) please open a pull request.

License

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.