With this workshop, you can get hands-on experience with the BFG Repo-Cleaner tool.
This repository simulates a real-life project in which sensitive information has been exposed in a public GitHub repository. Removing the sensitive data in a new commit is not enough to clean it up: the git history also needs to be rewritten so that the exposed data is removed from previous commits as well.
By following the steps below, you will learn how to deal with this situation and gain experience you can draw on if it ever happens for real.
-
Install BFG
- With Homebrew:
  brew install bfg
  - On an M1 (Apple Silicon) Mac:
    arch -arm64 brew install bfg
- Or install it manually by downloading the .jar from the BFG website. See the notes near the end of this guide for more information.
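The install options above can be wrapped in a small shell helper that installs BFG only when it is not already on your PATH. This is a sketch rather than part of the workshop: the function names are hypothetical, it assumes Homebrew is installed, and it assumes the arch -arm64 prefix is wanted on any Apple Silicon Mac (detected with sysctl -n hw.optional.arm64).

```shell
# Print the Homebrew command used to install BFG.
# Assumption: on Apple Silicon we always add the "arch -arm64" prefix, which
# runs brew as an arm64 process even if the terminal runs under Rosetta.
# Pass "1" or "0" to override the detection (handy for testing).
bfg_install_cmd() {
  local is_arm="${1:-$(sysctl -n hw.optional.arm64 2>/dev/null || echo 0)}"
  if [ "$is_arm" = "1" ]; then
    echo "arch -arm64 brew install bfg"
  else
    echo "brew install bfg"
  fi
}

# Install BFG only if the bfg command is not already available.
ensure_bfg() {
  if command -v bfg >/dev/null 2>&1; then
    echo "bfg is already installed at $(command -v bfg)"
  else
    eval "$(bfg_install_cmd)"
  fi
}
```

Run ensure_bfg once; afterwards, command -v bfg should print the path of the installed tool.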
-
Fork this repository to your personal GitHub account so you have your own sandbox to experiment in.
-
Clone your fork, not this repository.
There's a .env file that has been committed, exposing credentials for a production database. This file needs to be completely removed from the git history.
Additionally, a settings.json file contains an API key. This file should not be deleted, but the API key needs to be removed from it and obfuscated throughout the git history.
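Cloning your fork and checking that both files are present can be sketched as below. The helper names and the HTTPS URL format are assumptions (use the SSH URL shown on GitHub if that is how you authenticate), and the sketch assumes your fork kept the name bfg-workshop.

```shell
# Build the HTTPS clone URL of your fork.
# Assumption: the fork kept the upstream name "bfg-workshop".
sandbox_url() {
  echo "https://github.com/$1/bfg-workshop.git"
}

# Clone your fork (not the upstream repo) and confirm that both files
# this workshop cleans up are present. The subshell body keeps the
# "cd" and "set -e" from affecting your interactive shell.
clone_sandbox() (
  set -e
  git clone "$(sandbox_url "$1")"
  cd bfg-workshop
  ls -la .env settings.json
)
```

Usage: clone_sandbox your-github-username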
_
Get an idea of what has been committed by visiting the /bfg-workshop/commits/trunk page of your GitHub sandbox repo.
You will notice a few commits whose messages start with "Whoops!". The files changed in those commits are the ones we are going to work with.
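The same information is also available locally from your clone. A minimal sketch, assuming you run it inside the cloned repo; the helper names are hypothetical:

```shell
# List the "Whoops!" commits on every ref, with the files each one changed.
whoops_commits() {
  git log --all --oneline --name-status --grep='^Whoops!'
}

# List every commit (on any ref) that touched the given path.
commits_touching() {
  git log --all --oneline -- "$1"
}
```

For example, commits_touching .env and commits_touching settings.json show which commits the following steps will rewrite.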
_
In this step we'll focus on removing the .env file we mistakenly committed to the branch, along with its commit history.
First, remove the file from your local repo, then commit and push the change to the remote repo. This matters because, by default, BFG does not modify the files in your latest commit (HEAD); it only cleans the commits before it.
- Do this with
  git rm .env
  then commit and push as normal. After the push, you'll notice the .env file is no longer in the repo on GitHub; however, the file and its contents are still visible in some of the previous commits. Let's remove that history.
Remove the commit history:
- Use the following command:
  bfg --delete-files .env
  The command will run, printing details of the clean-up operation, followed by the statement "BFG run is complete! When ready, run: git reflog...".
- Copy that command from the terminal and run it. For reference, the command should be:
  git reflog expire --expire=now --all && git gc --prune=now --aggressive
  This expires the old reflog entries and deletes the objects that are no longer referenced, so the old file contents are also gone from your local .git directory.
- Now force-push to the remote repository using
  git push --force
The file should now be removed from all commit history, both locally and in the remote repository. See the notes below on what to do after cleaning for additional important information.
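The whole .env clean-up can be collected into one shell function, shown here as a sketch. The function name is hypothetical; it assumes it is run from the root of your sandbox clone with bfg on PATH, and it adds a local check (not part of the steps above) that aborts before the force-push if git log still finds .env on any local branch or tag.

```shell
# Sketch of the .env clean-up, end to end.
# Assumptions: run from the root of your sandbox clone, bfg is on PATH,
# and the current branch pushes to your fork. The subshell body keeps
# "set -eu" from leaking into your interactive shell.
scrub_env_file() (
  set -eu
  # 1. Remove .env in a new commit: BFG does not change the files in HEAD.
  git rm .env
  git commit -m "Remove .env"
  git push
  # 2. Rewrite every earlier commit so it no longer contains .env.
  bfg --delete-files .env
  # 3. Expire old reflog entries and prune the now-unreferenced objects.
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive
  # 4. Check local branches and tags before touching the remote.
  if [ -n "$(git log --branches --tags --oneline -- .env)" ]; then
    echo ".env is still in the history; not pushing" >&2
    exit 1
  fi
  # 5. Overwrite the remote history.
  git push --force
)
```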
–
Now let's focus on how to remove a string from the settings.json file without removing the whole file from the history.
- First, remove the offending text in settings.json by deleting the value from the string (leaving an empty string is fine). Commit and push the change to the remote repository.
- Create a file called replacements.txt in the git root directory and populate it with the text you want to obfuscate, one string per line. A line of the form text==>replacement replaces text with replacement. In our case, replacements.txt will contain only a single line:
  07dba36e-0506-4230-ba5b-4e2fa87c546d==>0000
  This replaces the API key with 0000.
- From the terminal, in the git root directory, run
  bfg --replace-text replacements.txt -fi '*.json'
  The -fi '*.json' option limits the rewrite to files whose names match *.json. If replacements.txt is in a different folder, pass its path instead:
  bfg --replace-text ~/path/to/replacements.txt -fi '*.json'
  (replace ~/path/to/ with the actual path).
- Check the list of files printed on screen that are going to be changed. Make sure only the files you want will be modified.
- Run git reflog ... by following the command printed in your terminal, the same as in the .env step above.
- Execute
  git push -f
  (same as git push --force).
The API key value should now be removed from all commit history, both locally and in the remote repository.
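The settings.json clean-up can be sketched the same way. The function names are hypothetical; the sketch assumes you run it from the git root, bfg is on PATH, and the emptied settings.json value has already been committed and pushed. Per the BFG documentation, a line in replacements.txt without ==> is replaced with the default text ***REMOVED***. The extra git log -S check before pushing is an addition, not part of the steps above.

```shell
# Sketch of the settings.json clean-up.
# Assumptions: run from the git root of your sandbox clone, bfg is on PATH,
# and the emptied settings.json value is already committed and pushed.
API_KEY='07dba36e-0506-4230-ba5b-4e2fa87c546d'

# Write the expressions file: one "text==>replacement" rule per line.
write_replacements() {
  printf '%s==>0000\n' "$API_KEY" > replacements.txt
}

scrub_api_key() (
  set -eu
  write_replacements
  # Replace the key in *.json files across history (HEAD is already clean).
  bfg --replace-text replacements.txt -fi '*.json'
  # replacements.txt holds the secret itself: never commit it.
  rm -f replacements.txt
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive
  # git log -S lists commits that add or remove the string; expect none.
  if [ -n "$(git log --branches --tags --oneline -S"$API_KEY")" ]; then
    echo "API key still present in history; not pushing" >&2
    exit 1
  fi
  git push --force
)
```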
–
- In a real-world scenario, let the team know when the scrubbing process is done: they should delete their local branches and re-clone or pull from the cleaned remote repo. Failing to do this could cause the removed history to be reintroduced.
- If the leaked content was only on a single branch and you're OK with losing any local changes (or don't have any), you can cleanly remove the local branch with
  git branch -D <branch name>
- If, on the other hand, you have unmerged changes you don't want to lose, rebase instead. Warning: do not use a merge for this operation, as it may reintroduce the removed history. Generally, from your base branch (say, trunk), run git pull, switch to your working branch with git checkout fix/something, and rebase with git rebase trunk.
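The rebase route can be sketched as a small helper. Note that it deliberately swaps two commands from the steps above: git fetch plus git reset --hard origin/trunk instead of git pull on the base branch, and git rebase --onto instead of a plain git rebase trunk. After a history rewrite, the old and cleaned trunks no longer share the rewritten commits, so pulling into the old trunk can merge the old history back in, and a plain rebase can replay the old versions of commits that BFG changed. The helper name is hypothetical, and the sketch only holds under the assumptions in its comments.

```shell
# Hypothetical helper for a teammate who has un-merged work on a branch.
# Assumptions: remote "origin", base branch "trunk", local trunk has NOT been
# pulled since the force-push, and the branch's own commits hold no leaked data.
rebase_onto_clean_trunk() (
  set -eu
  branch="$1"                                  # e.g. fix/something
  git fetch origin
  # Fork point of the branch in the OLD (pre-cleanup) history.
  old_base="$(git merge-base "$branch" trunk)"
  # Point local trunk at the cleaned remote history.
  # Warning: reset --hard discards uncommitted changes on trunk.
  git checkout trunk
  git reset --hard origin/trunk
  # Replay only the commits in old_base..branch onto the cleaned trunk.
  git rebase --onto trunk "$old_base" "$branch"
)
```

Usage: rebase_onto_clean_trunk fix/something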
–
- While the content or file is removed, linking directly to the original commit (e.g. github.com/username/repo-name/commit/652ac....194c) will still show the content or file, even though the commit is no longer part of the repo history and anyone forking or cloning the repo won't get that history or any reference to it. For this reason, any secret committed to GitHub should be considered compromised, even after removal, and should be revoked or rotated.
- To completely remove the content from GitHub after cleaning with BFG, you need to contact GitHub Support. See "Removing sensitive data from a repository" in the resources below.
- Merged branches: in a real-world scenario, if you're removing files or text from a branch that has already been merged into another branch, make sure you push your initial changes to those branches as well before running the BFG commands. For example, if you're removing a leaked file from the branch fix/something, and that branch was already merged into trunk before the leak was discovered, you'll also need to merge fix/something into trunk after your git rm ... and commit/push. Alternatively, you can make the changes on each branch separately, then run BFG. When pushing the final history changes, use git push --all --force to force-push the rewritten history of all local branches.
- If you are not using Homebrew, or are on another operating system, you will need to download the BFG .jar file manually and run it with Java, e.g. java -jar bfg.jar --delete-files .env (where bfg.jar is the file you downloaded). This requires Java to be installed and is out of scope for this workshop. However, you might find some of the responses in this GitHub issue thread helpful.
- After running BFG, you'll notice a folder with the name format <repo-name>.bfg-report in your local repo. It contains a somewhat cryptic log of the actions BFG performed on the local repository. It's safe to remove this folder when you're finished using BFG.
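The merged-branch case from the notes above can be sketched as follows, using the same example branch names. It assumes .env is the leaked file, that fix/something was already merged into trunk, that both branches push to your fork, and that bfg is on PATH; the function name is hypothetical.

```shell
# Sketch: the leak was committed on fix/something, which was already merged
# into trunk. Remove the file on the branch, merge that removal into trunk,
# then rewrite history once and force-push every local branch.
scrub_merged_leak() (
  set -eu
  git checkout fix/something
  git rm .env
  git commit -m "Remove .env"
  git push
  # Carry the removal into trunk before running BFG.
  git checkout trunk
  git merge --no-edit fix/something
  git push
  bfg --delete-files .env
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive
  # Force-push the rewritten history of all local branches.
  git push --all --force
)
```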
- BFG Repo Cleaner: https://rtyley.github.io/bfg-repo-cleaner/
- Scrubbing a repo clean: https://fieldguide.automattic.com/scrubbing-a-repo-clean/
- Sanitizing Repository History Using Tower and the BFG: https://fieldguide.automattic.com/sanitizing-repository-history-using-tower-and-the-bfg/
- Removing sensitive data from a repository: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository
- How to substitute text: https://stackoverflow.com/a/15730571/1940238
bfg 1.14.0
Usage: bfg [options] [<repo>]
-b, --strip-blobs-bigger-than <size>
-B, --strip-biggest-blobs NUM
-bi, --strip-blobs-with-ids <blob-ids-file>
-D, --delete-files <glob>
--delete-folders <glob>
--convert-to-git-lfs <value>
-rt, --replace-text <expressions-file>
-fi, --filter-content-including <glob>
-fe, --filter-content-excluding <glob>
-fs, --filter-content-size-threshold <size>
-p, --protect-blobs-from <refs>
--no-blob-protection
--private
--massive-non-file-objects-sized-up-to <size>
Full usage text for bfg 1.14.0: https://repository.sonatype.org/service/local/repositories/central-proxy/content/com/madgag/bfg/1.14.0/bfg-1.14.0.txt