/shepherd

A utility for applying code changes across many repositories.

Primary LanguageTypeScriptApache License 2.0Apache-2.0

Shepherd

Illustration of a sheep

GitHub Workflow Status semantic-release: conventionalcommits npm version GitHub issues

Shepherd is a utility for applying code changes across many repositories.

  • Powerful: You can write migration scripts using your favorite Unix commands, tools like jscodeshift, or scripts in your preferred programming language.
  • Easy: With just a few commands, you can checkout dozens of repositories, apply changes, commit those changes, and open pull requests with detailed messages.
  • Flexible: Ships with support for Git/GitHub, but can easily be extended to work with other version control products like Bitbucket, GitLab, or SVN.

For more high level context, this blog post covers the basics.

Getting started

Install the Shepherd CLI:

npm install -g @nerdwallet/shepherd

If using GitHub Enterprise, ensure the following environment variables are exported:

export SHEPHERD_GITHUB_ENTERPRISE_BASE_URL={company_github_enterprise_base_url} # e.g., github.com
export SHEPHERD_GITHUB_ENTERPRISE_URL={company_github_enterprise_url} # e.g., api.github.com/api/v3

If using ssh, ensure that your GITHUB_TOKEN is exported:

export GITHUB_TOKEN=<PAT>

Shepherd will now be available as the shepherd command in your shell:

shepherd --help
Usage: shepherd [options] [command]
...

Take a look at the tutorial for a detailed walkthrough of what Shepherd does and how it works, or read on for a higher-level and more brief look!

Go to tutorial →

Motivation for using Shepherd

Moving away from monorepos and monolithic applications has generally been a good thing for developers because it allows them to move quickly and independently from each other. However, it's easy to run into problems, especially if your code relies on shared libraries. Specifically, making a change to shared code and then trying to roll that shared code out to all consumers of that code becomes difficult:

  • The person updating that library must communicate the change to consumers of the library
  • The consumer must understand the change and how they have to update their own code
  • The consumer must make the necessary changes in their own code
  • The consumer must test, merge, and deploy those changes

Shepherd aims to help shift responsibility for the first three steps to the person actually making the change to the library. Since they have the best understanding of their change, they can write a code migration to automate that change and then user Shepherd to automate the process of applying that change to all relevant repos. Then the owners of the affected repos (who have the best understanding of their own code) can review and merge the changes. This process is especially efficient for teams who rely on continuous integration: automated tests can help repository owners have confidence that the code changes are working as expected.

Migration Configuration Schema

Example

A migration is declaratively specified with a shepherd.yml file called a spec. Here's an example of a migration spec that renames .eslintrc to .eslintrc.json in all NerdWallet repositories that have been modified in 2018:

id: 2018.07.16-eslintrc-json
title: Rename all .eslintrc files to .eslintrc.json
adapter:
  type: github
  search_type: code
  search_query: org:NerdWallet path:/ filename:.eslintrc
hooks:
  should_migrate:
    - ls .eslintrc # Check that this file actually exists in the repo
    - git log -1 --format=%cd | grep 2018 --silent # Only migrate things that have seen commits in 2018
  post_checkout: npm install
  apply: mv .eslintrc .eslintrc.json
  pr_message: echo 'Hey! This PR renames `.eslintrc` to `.eslintrc.json`'

Fields

  • id:

    • Description: Specifies a unique identifier for this migration.
    • Usage: Used as a branch name for the migration and internally by Shepherd to track migration state.
  • title:

    • Description: A title for the migration.
    • Usage: Used as the commit message.
  • adapter:

    • Description: Specifies the version control adapter for repo operations.
    • Details: Currently supports only GitHub, but can be extended for Bitbucket or GitLab. Configuration differs based on the adapter.
      • GitHub Specific:
        • search_query: Utilizes GitHub's code search qualifiers to identify candidate repositories.
        • org: Specify YOURGITHUBORGANIZATION to consider every visible repo in the organization.
        • search_type (optional): Either 'code' or 'repositories'. Defaults to code search. For repositories, use Github repository search.

Hooks

Hooks define the core functionality of a migration in Shepherd.

  • should_migrate:

    • Description: Commands to determine if a repo requires migration.
    • Behavior: Non-zero exit values indicate the repo should not be migrated.
  • post_checkout:

    • Description: Commands executed after a repo passes should_migrate checks.
    • Usage: Ideal for one-time setup actions per repo, like installing dependencies.
  • apply:

    • Description: Commands that perform the actual migration.
    • Note: This can range from simple to complex sequences, depending on migration needs.
  • pr_message:

    • Description: Commands to generate a pull request message.
    • Output: Anything written to stdout is used for the message. Multiple commands will have their outputs concatenated.

Requirements

  • Optional: should_migrate, post_checkout
  • Required: apply, pr_message

Environment Variables

Shepherd exposes some context to each command via specific environment variables. Some additional enviornment variables are exposed when using the git or github adapters.

Environment Variable Default Description
SHEPHERD_REPO_DIR ~/.shepherd the absolute path to the repository being operated on. This will be the working directory when commands are executed.
SHEPHERD_DATA_DIR ~/.shepherd the absolute path to a special directory that can be used to persist state between steps. This would be useful if, for instance, a jscodeshift codemod in your apply hook generates a list of files that need human attention and you want to use that list in your pr_message hook.
SHEPHERD_BASE_BRANCH default branch the name of the branch Shepherd will set up a pull-request against. This will often, but not always, be main. Only available for apply and later steps.
SHEPHERD_MIGRATION_DIR path to migration spec the absolute path to the directory containing your migration's shepherd.yml file. This is useful if you want to include a script with your migration spec and need to reference that command in a hook. For instance, if you have a script pr.sh that will generate a PR message: my pr_message hook might look something like this: pr_message: $SHEPHERD_MIGRATION_DIR/pr.sh
SHEPHERD_GIT_REVISION (git and github adapters) is the current revision of the repository being operated on.
SHEPHERD_GITHUB_REPO_OWNER (github adapter) is the owner of the repository being operated on. For example, if operating on the repository https://github.com/NerdWalletOSS/shepherd, this would be NerdWalletOSS.
SHEPHERD_GITHUB_REPO_NAME (github adapter) is the name of the repository being operated on. For example, if operating on the repository https://github.com/NerdWalletOSS/shepherd, this would be shepherd.
SHEPHERD_GITHUB_ENTERPRISE_URL api.github.com For GitHub Enterprise, export this variable containing the company's GitHub Enterprise url (e.g., api.github.com/api/v3).
SHEPHERD_GITHUB_ENTERPRISE_BASE_URL api.github.com For GitHub Enterprise, export this variable contraining the company's GitHub Enterprise base url.
SHEPHERD_GITHUB_PROTOCOL ssh (github adapter) is the protocol to use when cloning repos. Can be https or ssh. https is useful when ssh is firewalled.

Additional Context:

When working with GitHub, the API endpoints differ based on whether you are using GitHub's public cloud service, GitHub Enterprise Cloud, or GitHub Enterprise Server (on-premises). Here’s how the endpoints vary:

In each case, the API functionality and how you interact with it are largely the same, but the base URL changes based on where your GitHub instance is hosted.

As a result, the value of SHEPHERD_GITHUB_ENTERPRISE_URL is a function of the type of GitHub service and used to support git APIs except cloning which is configurable via SHEPHERD_GITHUB_ENTERPRISE_BASE_URL. For SHEPHERD_GITHUB_ENTERPRISE_BASE_URL, while github.com works across GitHub service types, for backwards compatibility, we default to api.github.com.

Usage

Shepherd is run as follows:

shepherd <command> <migration> [options]

<migration> is the path to your migration directory containing a shepherd.yml file.

There are a number of commands that must be run to execute a migration:

  • checkout: Determines which repositories are candidates for migration and clones or updates the repositories on your machine. Clones are "shallow", containing no git history. Uses should_migrate to decide if a repository should be kept after it's checked out.
  • apply: Performs the migration using the apply hook discussed above.
  • commit: Makes a commit with any changes that were made during the apply step, including adding newly-created files. The migration's title will be prepended with [shepherd] and used as the commit message.
  • push: Pushes all commits to their respective repositories.
  • pr-preview: Prints the commit message that would be used for each repository without actually creating a PR; uses the pr_message hook.
  • pr: Creates a PR for each repo with the message generated from the pr_message hook.
  • version: Prints Shepherd version

By default, checkout will use the adapter to figure out which repositories to check out, and the remaining commands will operate on all checked-out repos. To only checkout a specific repo or to operate on only a subset of the checked-out repos, you can use the --repos flag, which specifies a comma-separated list of repos:

shepherd checkout path/to/migration --repos facebook/react,google/protobuf

Run shepherd --help to see all available commands and descriptions for each one.

Developing

Run npm install to install dependencies.

Shepherd is written in TypeScript, which requires compilation to JavaScript. When developing Shepherd, it's recommended to run npm run build:watch in a separate terminal. This will incrementally compile the source code as you edit it. You can then invoke the Shepherd CLI by referencing the absolute path to the compiled cli.js file:

cd ../my-other-project
../shepherd/lib/cli.js checkout path/to/migration

Shepherd currently has minimal test coverage, but we're aiming to improve that with each new PR. Tests are written with Jest and should be named in a *.test.ts alongside the file under test. To run the test suite, run npm run test.

We use ESLint to ensure a consistent coding style and to help prevent certain classes of problems. Run npm run lint to run the linter, and npm run fix-lint to automatically fix applicable problems.

Credits

  1. Logo designed by Christopher Wharton.