
Deploy public or private GitHub repositories automatically to S3 buckets using Git and deploy keys.

Primary language: Java. License: ISC.

github-bucket

Deploy any GitHub repository securely to your S3 bucket with Git and keep it automatically up to date on push. Specify a branch, and its contents are unpacked to S3. For example, you can publish a static website to S3 and make it globally available with CloudFront.

Take full advantage of GitHub deploy keys, which can be set up per repository. Finally, you pay only per use because this project runs on Lambda.

Amazon Web Services

  • IAM: Identity and Access Management.
  • SNS: Used as a message queue to trigger the Lambda function.
  • Lambda: Used to sync the GitHub repository with S3.
  • S3: Used as data store and file backend.
  • CloudWatch: Used for viewing logs from SNS and Lambda.
  • (Optional) CloudFront: Used as static website service as it has lower pricing than S3 (Guide).
  • (Optional) Route 53: Routes DNS requests for your custom domain to the closest CloudFront edge server (Guide).

Quickstart

All of your Amazon Web Services resources should be in the same region. Also take a look at Dynamic GitHub Actions with AWS Lambda to get started with the Lambda deployment options.

Architecture

SNS

Create a new topic and copy the topic ARN.

IAM

Create a user for GitHub. Go to the AWS console, switch to IAM and create a user with the following permissions (replace $ARN with the SNS topic ARN):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sns:Publish"
            ],
            "Sid": "Stmt0000000000000",
            "Resource": [
                "$ARN"
            ],
            "Effect": "Allow"
        }
    ]
}

S3

Create a new bucket (or choose an existing empty bucket). That's it!

Lambda

Download the latest release from the release section. Create a new blank Lambda function and, during creation, add the SNS topic you created as a trigger for the function. Name the function as you like, e.g. StaticSiteDeployer, and choose the Java 8 (or higher) runtime. Set the following environment variables:

  • env_branch: The branch to watch, default master.
  • env_bucket: The bucket to push to, e.g. baxterthehacker.
  • env_github: The GitHub repository to pull from, e.g. baxterthehacker/public-repo.git.

You can also change the environment variables from the AWS console afterwards.
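As a rough illustration of how these three variables could be consumed, here is a minimal sketch. The class name DeployConfig and its methods are hypothetical and not taken from the project; only the variable names and the master default come from the list above.

```java
import java.util.Map;

/** Hypothetical holder for the three settings described above. */
final class DeployConfig {
    final String branch;
    final String bucket;
    final String github;

    private DeployConfig(String branch, String bucket, String github) {
        this.branch = branch;
        this.bucket = bucket;
        this.github = github;
    }

    /** Reads the settings from a map such as the one returned by System.getenv(). */
    static DeployConfig fromEnv(Map<String, String> env) {
        // env_branch falls back to "master", the default named in the list above.
        String branch = env.getOrDefault("env_branch", "master");
        String bucket = require(env, "env_bucket");
        String github = require(env, "env_github");
        return new DeployConfig(branch, bucket, github);
    }

    /** Fails fast when a variable without a default is missing or blank. */
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + key);
        }
        return value;
    }
}
```

Inside a Lambda handler this would be called as DeployConfig.fromEnv(System.getenv()). Taking the map as a parameter keeps the sketch easy to exercise locally.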

The handler class for Lambda must be set to net.berla.aws.Lambda. The memory size should be at least 192 MB and can be increased if OutOfMemory exceptions occur. Set the timeout to at least one minute. The role must be configured with the following permissions (replace baxterthehacker with your bucket):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::baxterthehacker"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::baxterthehacker/*"
        }
    ]
}

Note

Java has slower cold starts than other Lambda runtimes: the application takes a few seconds to boot (~5s). After that, the repository is cloned initially, which may take longer than one minute depending on the repository size. In that case, you can increase the Lambda timeout or upload the initial working tree yourself before the first run. All further uploads are checked file by file against their MD5 checksums.
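To make the per-file MD5 check more concrete, here is a minimal sketch of how such a comparison could look. It is not the project's actual code: the class Md5Check and the needsUpload helper are hypothetical, and it assumes the remote checksum is available as a hex string (for simple, non-multipart uploads the S3 ETag is typically the MD5 digest, often wrapped in quotes).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Hypothetical helper that compares local content with a remote MD5 checksum. */
final class Md5Check {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so every byte prints as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** True when the remote checksum is missing or differs from the local content. */
    static boolean needsUpload(byte[] localContent, String remoteMd5) {
        if (remoteMd5 == null) {
            return true;
        }
        // Assumption: the remote value may be quoted like an S3 ETag, so strip quotes.
        String normalized = remoteMd5.replace("\"", "").toLowerCase(Locale.ROOT);
        return !md5Hex(localContent).equals(normalized);
    }
}
```

Note that md5Hex takes the whole file as a byte array, so the sketch also processes files in memory.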

Files are currently processed in memory. If your repository contains large files, you will need to increase the memory size even further. Processing large files in the Lambda temp directory to save memory is planned, but not implemented yet.

GitHub

Create a deploy key and answer the questions after submitting the command:

ssh-keygen -t rsa -b 4096

The default location for the key is ~/.ssh/id_rsa.

Get the host key of GitHub for security reasons:

ssh-keyscan -t rsa github.com > known_hosts

Place the generated private key as .ssh/id_rsa and the known hosts file as .ssh/known_hosts in the S3 bucket.

Switch to the settings of your GitHub repository and add the public key (id_rsa.pub) as a read-only deploy key. Go to the Integrations & services section and add Amazon SNS.

Enter the AWS access credentials of the IAM user created above for this integration. Remember to restrict the user's rights as much as possible in IAM. Also enter the ARN and the region of the SNS topic.
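The region to enter is also encoded in the topic ARN itself, which follows the general form arn:partition:sns:region:account-id:topic-name. As a small illustration, the sketch below extracts it. The class TopicArn is hypothetical and only performs a shallow format check.

```java
/** Hypothetical helper for reading the region out of an SNS topic ARN. */
final class TopicArn {

    /** Returns the region segment, e.g. "eu-central-1", of an SNS topic ARN. */
    static String regionOf(String arn) {
        // Keep trailing empty segments so malformed input is detected reliably.
        String[] parts = arn.split(":", -1);
        boolean valid = parts.length == 6
                && "arn".equals(parts[0])
                && "sns".equals(parts[2])
                && !parts[3].isEmpty();
        if (!valid) {
            throw new IllegalArgumentException("Not an SNS topic ARN: " + arn);
        }
        return parts[3];
    }
}
```

For example, TopicArn.regionOf("arn:aws:sns:eu-central-1:123456789012:github-bucket") yields the region segment of that made-up ARN.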

Logs

Use CloudWatch to view logs. For example, you can view the SNS requests and the Lambda process logs there.

Testing

To debug locally, run the main method in net.berla.aws.Worker.

You can configure the test runtime by changing the environment properties or by setting the environment variables described above.

Building

To build from source, run mvn clean package from the project root directory. The JAR is created as target/github-bucket-*.jar.

How does it work?

GitHub notifies SNS when a repository event occurs. SNS triggers the Lambda function, which checks for push events on the configured branch. If the branch matches, the function reads the current state from the S3 bucket and updates it with the GitHub state. The resulting changes are then pushed back to S3.
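The branch check in this flow can be sketched as follows. This is an illustrative example, not the project's code: the class PushFilter is hypothetical, and the event type and ref are assumed to have been extracted from the SNS message already. A GitHub push payload carries the branch in its ref field as refs/heads/<branch>.

```java
/** Hypothetical filter deciding whether an incoming event should trigger a sync. */
final class PushFilter {

    // Branch refs in GitHub push payloads start with this prefix.
    private static final String BRANCH_PREFIX = "refs/heads/";

    /** True only for push events whose ref points at the configured branch. */
    static boolean isPushToBranch(String eventType, String ref, String branch) {
        if (!"push".equals(eventType)) {
            return false; // ignore other event types
        }
        if (ref == null || branch == null) {
            return false;
        }
        // Exact match, so "master" does not also match "master-old" or tags.
        return ref.equals(BRANCH_PREFIX + branch);
    }
}
```

With env_branch set to master, a push with ref refs/heads/master would pass this check, while pushes to other branches or to tags would be skipped.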

Why Java?

Lambda officially supports only a few runtimes; at the time of writing, these were Node.js, Python, C# and Java. There are workarounds for other languages, such as my personal preference, Go, but because we do not control the underlying system, these could fail in the future.

Furthermore, there is a great implementation of Git written completely in Java by the Eclipse Project, called jgit. It allowed me to use the much better deploy keys instead of the plain GitHub API, which has fewer features and requires a personal access token for your whole account rather than just the repository. Last but not least, the deploy key can be read-only.

Credits

This project was created by @berlam.

License

See LICENSE.