
A sandbox for learning the BFG tool and practicing how to rewrite commit history.


BFG Workshop

This workshop gives you hands-on experience with the BFG Repo-Cleaner tool.

The purpose of this repository is to simulate a real-life project in which sensitive information has been pushed to a public GitHub repository. Removing the sensitive data with a new commit is not enough to clean it up: the git history also needs to be rewritten so that the exposed data is removed from previous commits as well.

By following the steps below, you will learn how to deal with this situation and be better prepared if it ever happens on a real project.

Before you start

  1. Install BFG

    • With Homebrew: brew install bfg
    • For M1 chip: arch -arm64 brew install bfg
    • or, manually install it by downloading the .jar from their website. See Additional Information below for more information.
  2. Fork this repository to your personal GitHub account so you have your own sandbox to poke around in.

  3. Clone your own repo, not this one.
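Before moving on, it can help to confirm that the tools are actually available. The snippet below is a minimal sketch, not part of the original workshop: the helper name have is made up for illustration, and it only checks that git and bfg can be found on your PATH.

```shell
# Report whether a command is available on PATH.
# "have" is a hypothetical helper name used only in this sketch.
have() { command -v "$1" >/dev/null 2>&1; }

for tool in git bfg; do
  if have "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done

# Inside your clone, confirm that origin points at your fork, not the original repo:
#   git remote get-url origin
```

If bfg is reported as missing after a manual .jar download, that is expected, because the .jar is not a command on your PATH.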

The current state of the project

A .env file has been committed, exposing sensitive credentials for a production database. This file needs to be completely removed from the git history.

Additionally, there's another file, settings.json, that contains an API key. This file should not be deleted, but the API key needs to be removed from the file and obfuscated in the git history.

_

Step 1: Check out the list of commits.

Get an idea of what has been committed by visiting the URL /bfg-workshop/commits/trunk on your GitHub sandbox repo.

You will notice a few commits whose messages start with the text "Whoops!". Those commits contain the files we are going to work with.

Commits: 9a6e41b, 44809a7
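The same list can be pulled up from the command line in your local clone. This is a small sketch that assumes the commit messages begin with "Whoops!" as described above; the function name list_whoops_commits is made up for illustration.

```shell
# Print commits on the current branch whose message has a line starting with "Whoops!".
# list_whoops_commits is a hypothetical helper name, not part of git or BFG.
list_whoops_commits() {
  git log --oneline --grep='^Whoops!'
}
```

Run list_whoops_commits inside the clone, then inspect a listed commit with git show --stat <commit> to see which files it touched.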

_

Step 2: Delete file.

In this step we'll focus on removing the .env file we mistakenly committed to the branch, along with its commit history.

First remove the file from your local repo and commit and push the changes to the remote repo.

  • Do this with git rm .env, then commit and push as normal. After the push, you'll notice the .env file is no longer in the repo on GitHub; however, the file and its contents are still visible in some of the previous commits. Let's remove that history.

Remove the commit history:

  • Use the following command: bfg --delete-files .env. The command will run, providing details of the clean-up operation, followed by the statement BFG run is complete! When ready, run: git reflog....
  • Copy the command from the terminal and run it. (For reference, the command should be git reflog expire --expire=now --all && git gc --prune=now --aggressive).
  • Now force a push to the remote repository using git push --force.

The file should now be removed from all commit history both locally, and in the remote repository. See Step 4 below for additional important information.
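One way to confirm the result locally is to ask git whether any commit still touches the file. The sketch below is an illustration under assumptions: path_in_history is a made-up helper name, and the check is meant to be run after the force push so that remote-tracking refs also point at the rewritten history.

```shell
# Succeeds (exit 0) if any commit reachable from any ref adds, changes or deletes the path.
# path_in_history is a hypothetical helper name used only in this sketch.
path_in_history() {
  [ -n "$(git log --all --oneline -- "$1")" ]
}
```

For example, path_in_history .env && echo "still in history" should print nothing once the cleanup has worked. It only inspects your local clone, not what GitHub may still serve for old commit URLs.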

Step 3: Obfuscate text.

Now let's focus on how to remove a string from the settings.json file without removing the whole file from the history.

  • First remove the offending text from the settings.json file by deleting the API key value; leaving an empty string is fine. Commit and push the change to the remote repository.
  • Create a file called replacements.txt in the git root directory and populate it with the text that you want to obfuscate, one string per line. In our case, replacements.txt will only contain one single line:
07dba36e-0506-4230-ba5b-4e2fa87c546d==>0000

This replaces the API string with 0000

  • From the Terminal and in the git root directory, run bfg --replace-text replacements.txt -fi '*.json'. If you're in a different folder use the format bfg --replace-text ~/path/to/replacements.txt -fi '*.json' (replace ~/path/to/ with the actual path).
  • On screen, check the files that are going to be changed. Make sure that only the files that you want will be modified.
  • Run the git reflog ... command shown in your Terminal, the same as in Step 2 above.
  • Execute git push -f (same as git push --force).

The API key value should now be removed from all commit history both locally and in the remote repository.
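The rule file and a follow-up check can be sketched as below. This is an illustration rather than part of the original workshop: string_in_history is a made-up helper name, and the check relies on git log -S, which lists commits that add or remove an occurrence of a string.

```shell
# Write the rule file for this step; each line has the form <text-to-find>==><replacement>.
printf '%s\n' '07dba36e-0506-4230-ba5b-4e2fa87c546d==>0000' > replacements.txt

# Succeeds (exit 0) if any commit on any ref adds or removes an occurrence of the string.
# string_in_history is a hypothetical helper name used only in this sketch.
string_in_history() {
  [ -n "$(git log --all --oneline -S"$1")" ]
}
```

After the force push, string_in_history 07dba36e-0506-4230-ba5b-4e2fa87c546d should fail (non-zero exit). Since replacements.txt itself contains the key, take care not to commit it; delete it when you are done.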

Step 4: Avoid pushing from old branches.

  • In a real-world scenario, you should let the team know that the scrubbing process is done and they should delete their local branches and re-clone/pull from the cleaned remote repo. Failure to do this could cause the removed history to be re-introduced.
  • If the leaked content was only on a single branch and you're ok to lose any local changes, or don't have any, then you can cleanly remove a local branch using the command git branch -D <branch name>.
  • If, on the other hand, you have unmerged changes you don't want to lose, you can rebase instead. Warning: Do not use a merge for this operation, as it may reintroduce the removed history. In general: update your base branch (say trunk) with git pull, switch to your working branch with git checkout fix/something, and rebase with git rebase trunk.
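Before pushing from a branch that existed before the cleanup, a quick check can show whether it still carries the old commits. The sketch below assumes the commit IDs listed in Step 1 are the pre-cleanup IDs; branch_contains is a made-up helper name.

```shell
# Succeeds (exit 0) if <commit> is an ancestor of (or the same as) <branch>.
# An unknown commit ID makes git exit non-zero, which counts as "not contained".
# branch_contains is a hypothetical helper name used only in this sketch.
branch_contains() {
  git merge-base --is-ancestor "$1" "$2" 2>/dev/null
}

# Commit IDs from Step 1; these are the IDs from before the history rewrite.
for old in 9a6e41b 44809a7; do
  if branch_contains "$old" HEAD; then
    echo "HEAD still contains $old: do not push this branch"
  fi
done
```

No output means the checked-out branch does not descend from either of those commits. Replace HEAD with a branch name to check a branch without switching to it.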

Additional Information.

  • Although the content or file is removed from the repo history, and anyone forking or cloning the repo won't get that history or any reference to it, linking directly to the original commit (e.g. github.com/username/repo-name/commit/652ac....194c) will still show the content or file. For this reason, any secret that has been committed to GitHub should be considered compromised, even after it is removed.
  • To completely remove the content from GitHub after cleaning with BFG, you need to contact GitHub. Reference.
  • Merged Branches - In a real-world scenario if you're removing files or text from a branch that's already been merged into another branch, you'll want to make sure you push your initial changes to those branches as well, before running the BFG commands. For example, you're removing a leaked file in the branch fix/something, and this was already merged to trunk before discovering the leak, you'll need to also merge the fix/something into trunk after your git rm... and commit/push. Alternatively, you can remove/make changes to each branch separately, then run BFG. When pushing the final history changes, you can use git push --all --force. This will force push the change to all branches.
  • If you are not using brew, or are on another operating system, you will need to manually download and run the BFG .jar file. This requires Java to be installed and is out of scope for this workshop. However, you might find some of the responses in this GitHub issue thread helpful.
  • After running BFG you'll notice in your local repo a folder with the name format <repo-name>.bfg-report. It contains a somewhat cryptic log of the actions BFG performed on the local repository. It's safe to remove this folder when you're finished using BFG.
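For the manual .jar route mentioned above, a small wrapper function lets the bfg ... commands from the earlier steps work unchanged. This is a sketch under assumptions: the jar path is hypothetical and should be changed to wherever you saved the download, and Java must already be installed.

```shell
# Hypothetical location of the downloaded jar; adjust to wherever you saved it.
BFG_JAR="${BFG_JAR:-$HOME/tools/bfg-1.14.0.jar}"

# Forward all arguments to the jar so "bfg --delete-files .env" works as in Step 2.
bfg() {
  java -jar "$BFG_JAR" "$@"
}
```

The function only lasts for the current shell session; add it to your shell profile if you want to keep it.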

Resources

bfg 1.14.0
Usage: bfg [options] [<repo>]

  -b, --strip-blobs-bigger-than <size>
  -B, --strip-biggest-blobs NUM
  -bi, --strip-blobs-with-ids <blob-ids-file>
  -D, --delete-files <glob>
  --delete-folders <glob>
  --convert-to-git-lfs <value>
  -rt, --replace-text <expressions-file>
  -fi, --filter-content-including <glob>
  -fe, --filter-content-excluding <glob>
  -fs, --filter-content-size-threshold <size>
  -p, --protect-blobs-from <refs>
  --no-blob-protection
  --private
  --massive-non-file-objects-sized-up-to <size>

https://repository.sonatype.org/service/local/repositories/central-proxy/content/com/madgag/bfg/1.14.0/bfg-1.14.0.txt