Continuously sync an svn repository to git, in the cloud.
This repo contains the building blocks for an on-demand service to sync a Subversion repository to a Git repository on a continuous basis.
"On a continuous basis" means that rather than a one-time migration from svn to git, development will continue in svn and new changes will be mirrored into git.
Currently, the following things are hard-coded for the OmegaT project:
- Source and target repository URLs
- Trigger authentication (assumed to be Apache Allura, i.e. SourceForge)
- Svn authors update mechanism
- Various names of AWS entities and resources
TODO: Make all of the above configurable
First, note that, per this article, the git-svn clone must be kept as an artifact in order to maintain a consistent git history.
The principal players:
- A tarball of the git-svn clone that is consistent with the existing git mirror
  - Lives in an S3 bucket
  - Not included in this repo
- The Docker image:
  - Runs on Amazon ECS (Fargate)
  - Pulls the above tarball, updates it from svn, and pushes the result to git
  - Tars up the repo again and pushes it to S3
- The Lambda function:
  - Validates incoming webhook requests
  - Triggers the ECS task
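As a rough illustration of what the Docker image does on each run, the sync cycle can be sketched as an ordered list of commands. This is a minimal sketch and not the repo's actual entrypoint: the function names, the local file name `repo.tar.gz`, the unpacked directory name `repo`, and the use of `git svn rebase` plus the AWS CLI inside the container are all assumptions made for illustration.

```python
import subprocess
from typing import Callable, List


def sync_commands(bucket: str, key: str, remote: str, branch: str) -> List[List[str]]:
    """Return the commands for one sync cycle, in the order they should run."""
    tarball = "repo.tar.gz"  # hypothetical local file name
    workdir = "repo"         # assumption: the tarball unpacks to this directory
    s3_url = f"s3://{bucket}/{key}"
    return [
        ["aws", "s3", "cp", s3_url, tarball],            # pull the saved git-svn clone
        ["tar", "-xzf", tarball],                        # unpack it
        ["git", "-C", workdir, "svn", "rebase"],         # fetch new svn revisions
        ["git", "-C", workdir, "push", remote, branch],  # mirror them to git
        ["tar", "-czf", tarball, workdir],               # tar up the updated clone
        ["aws", "s3", "cp", tarball, s3_url],            # store it for the next run
    ]


def run_sync(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run each command in order; with subprocess.run, check=True stops at the first failure."""
    for argv in commands:
        runner(argv, check=True)
```

Keeping the command list separate from the runner makes the order easy to inspect or dry-run, for example `run_sync(sync_commands("my-bucket", "repo.tar.gz", "origin", "master"))` with placeholder values.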
Glue:
- Amazon API Gateway provides a URL to give to SourceForge as the webhook target
- The API request invokes the lambda, which in turn issues a custom CloudWatch event
- The CloudWatch event triggers the ECS task
Note: The CloudWatch event isn't strictly necessary. The lambda could run the task directly, but it would then need to know incidental details such as the task's security group and subnet(s), which I wanted to keep out of the lambda.
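To make the glue concrete, here is a minimal sketch of how a lambda could issue such a custom event. It assumes the client passed in is a boto3 `events` client (whose `put_events` call takes a list of `Entries`). The `DetailType` value, the `reason` field, and the function names are hypothetical; only the source string is taken from this document.

```python
import json

# Source string that an event rule can match on.
EVENT_SOURCE = "omegat-git-svn-sync-lambda"


def build_sync_event(reason: str) -> dict:
    """Build one entry for put_events; Detail must be a JSON string."""
    return {
        "Source": EVENT_SOURCE,
        "DetailType": "svn-commit-webhook",  # hypothetical label
        "Detail": json.dumps({"reason": reason}),
    }


def request_sync(events_client, reason: str = "webhook") -> dict:
    """Issue the custom event; events_client is assumed to be boto3.client("events")."""
    return events_client.put_events(Entries=[build_sync_event(reason)])
```

Because the lambda only emits an event, the task's networking details stay in the event rule's target configuration rather than in the lambda code.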
Requirements:
- macOS (untested on other *nix systems, but might work)
- Docker
- awscli
- Admin access to the SourceForge project
- An AWS account
Setup:
- Run make build to build the Docker image
- Register the generated public key id_rsa.pub to the SourceForge CI user
- Run make deploy to push the Docker image to Amazon ECR
- Create an ECS task definition using the Docker image
  - The task will need a role with read, write, and delete permission on the S3 bucket where the repo tarball lives
  - You can actually stop here if you're satisfied with polling the repository: just set up a scheduled task. The rest of these steps are all just to allow on-demand triggering via webhook.
- Create a Lambda function OmegatGitSvnSyncFunction
  - The lambda will need a role with permission to list ECS tasks and put events
- Set up the Lambda function to be triggered by an API Gateway method
  - The method should have no authorization, as we handle that separately in the lambda
- Create a new webhook on the svn repo that hits the API created above
- Put the webhook secret into ./lambda/secret
- Run make deploy-trigger to push the lambda code
- Define a CloudWatch event rule with a pattern that matches on the event issued by the lambda, e.g. { "source": [ "omegat-git-svn-sync-lambda" ] }, and triggers the ECS task
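Since the API Gateway method has no authorization of its own, the lambda has to authenticate each request using the webhook secret. The following is a minimal sketch of such a check, not the repo's actual code. It assumes the Allura-style scheme in which the request body is signed with HMAC-SHA1 and sent as `sha1=<hexdigest>` in a signature header; the function names are hypothetical.

```python
import hashlib
import hmac


def signature_for(secret: bytes, body: bytes) -> str:
    """Compute the signature string expected for a raw request body."""
    return "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()


def is_valid_webhook(secret: bytes, body: bytes, header_value: str) -> bool:
    """Compare the expected signature to the received one in constant time."""
    expected = signature_for(secret, body)
    received = header_value or ""
    # Compare as bytes so that a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The signature must be computed over the raw body bytes exactly as received; re-serializing the parsed JSON may change whitespace and break the comparison.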
TODO: Automate all this setup
Known issues:
- If two API calls are received in quick succession, multiple tasks may run: the second lambda invocation may not see any running task yet, so it will start a new task instead of skipping
- It's not clear what happens when two tasks try to upload the same repo tarball to S3 concurrently
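The first issue is a check-then-act race. A minimal sketch of the kind of check involved, assuming a boto3 `ecs` client and its `list_tasks` call, is shown below; the function name and the cluster and family arguments are hypothetical.

```python
def should_start_task(ecs_client, cluster: str, family: str) -> bool:
    """Return True when no task of this family is currently listed as running.

    Two invocations that call this before either task has been registered
    will both get True, which is the race described above.
    """
    resp = ecs_client.list_tasks(
        cluster=cluster,
        family=family,
        desiredStatus="RUNNING",
    )
    return len(resp.get("taskArns", [])) == 0
```

Nothing in this check is atomic, so closing the gap would need some form of external locking or deduplication, which this sketch does not attempt.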