[Improve Existing Best Practice Guide]: Automated checking for general sensitive information within Git
riverma opened this issue · 22 comments
Checked for duplicates
Yes - I've already checked
Best Practice Guide
Continuous Integration
Best Practice Guide Sections
Starter Kits
Describe the improvement
We have some existing recommendations for checking for sensitive AWS credential information using git-secrets, described here. However, we've received feedback that this could be improved via the following:
- Sample pattern files to check for more specific sensitive information such as IPs, username / passwords, ARNs, security-groups
- A GitHub-side automation that checks repositories even if folks have committed and pushed sensitive information
To support these two needs, we should evaluate if git-secrets
is the right tool, or if it should be augmented or replaced with a better solution.
One other idea:
- absolute file paths - not sure how feasible/possible this is, but absolute file paths on file systems are considered sensitive by SAs
Per this:
A GitHub-side automation that checks repositories even if folks have committed and pushed sensitive information
The guidelines should also include a link to documentation about how to deep clean this from your commit history. Additionally, this automated check should also include GitHub Issues, which can include sensitive information.
Great suggestions @jordanpadams - we will plan to include these in our scope.
We should search for anything that includes information regarding our infrastructure including sg's, vpc's, subnets, aws account numbers, ami's, bucket names, ip addresses, hostnames, roles, arn's, usernames, internal url's, and passwords.
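As a starting point for a sample pattern file, here is a minimal sketch of what regex-based checks for a few of these items could look like. The ID shapes (for example `sg-` followed by 8 to 17 hex characters) are assumptions based on commonly seen AWS formats, and the names `PATTERNS` and `scan_text` are made up for illustration; none of this is taken from git-secrets or any other tool in this thread.

```python
import re

# Illustrative patterns only. The ID shapes are assumptions and should be
# tuned (and extended with hostnames, bucket names, etc.) before real use.
PATTERNS = {
    "security-group": re.compile(r"\bsg-[0-9a-f]{8,17}\b"),
    "vpc": re.compile(r"\bvpc-[0-9a-f]{8,17}\b"),
    "subnet": re.compile(r"\bsubnet-[0-9a-f]{8,17}\b"),
    "ami": re.compile(r"\bami-[0-9a-f]{8,17}\b"),
    "arn": re.compile(r"\barn:aws[a-z-]*:[a-z0-9-]+:[a-z0-9-]*:\d{0,12}:\S+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_text(text):
    """Return a sorted list of (label, matched_text) pairs found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return sorted(hits)
```

Patterns this loose will produce false positives (version strings can look like IP addresses, for instance), so an allow-list would likely be needed alongside them.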
That's a very comprehensive list of tips - thank you very much @sneely333. We will look these over.
I did some Trade Studies on four tools and put the references in the table. I feel they are quite similar.
- They all support customized regular expression, which provides a possibility to solve the needs in this ticket and other potential needs.
The main difference is support for Entropy Analysis. I have done some trials on this feature. It's sometimes useful for complex passwords and TOKEN formats. If the current set of regular expressions misses a secret, this feature can sometimes flag it, but not always. Therefore, manual inspection for sensitive information is still needed.
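To make the idea concrete, here is a rough sketch of how entropy analysis can work in general: compute the Shannon entropy of a string and flag long strings above a cutoff. The `threshold` and `min_length` values are assumptions chosen for illustration, not the defaults of gitleaks, trufflehog, or talisman.

```python
import math
from collections import Counter


def shannon_entropy(value):
    """Return the Shannon entropy of value in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(value, threshold=3.5, min_length=16):
    """Flag long strings whose entropy exceeds an assumed threshold."""
    return len(value) >= min_length and shannon_entropy(value) > threshold
```

A repeated character scores 0 bits and a string of all-distinct characters scores the maximum for its length, which is why random tokens stand out while ordinary words usually do not.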
The other slight differences are Commit Messages and File Name. We may need to use a combination of those tools based on the needs.
Tool Name | File Content | File Name | Commit Message | Pre-commit-hook | Check history | GitHub-side Automation | Customized Pattern | Entropy Analysis |
---|---|---|---|---|---|---|---|---|
git-secrets | Yes | No | Yes | Yes | Yes | Yes (doable) | Yes | No |
gitleaks | Yes | No | No | Yes | Yes | Yes | Yes | Yes |
trufflehog | Yes | No | No | Yes | Yes | Yes | Yes | Yes |
talisman | Yes | Yes | No | Yes | Yes | Yes (doable) | Yes | Yes |
@perryzjc - great work here. Will scope this and provide feedback. One tip is you'll want to also include GitHub's own secrets scanning tool in your trade-study. What is missing from GitHub's tool that these other tools support?
Solid analysis here @perryzjc - thanks! The entropy analysis feature is interesting. That could help in identifying sensitive information, though it might flag memory addresses in code as well.
Based on the tools listed, which do you recommend proceeding with and why? One or more tools? It'd be great to get an architecture / flow diagram of where the tool(s) solution proposed fit in with the following scenarios:
- New code commits (locally) -> code pushes (to remote) -> code CI (on GitHub.com)
- Full codebase scans (locally)
- Full codebase history, including previous commits
Additionally - how can we make these tool solutions plug-and-play? The GitHub Action route has obvious appeal, but how about client side? You might want to look at https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks.
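On the client side, a Git hook can be any executable, so one plug-and-play option is a small script dropped into `.git/hooks/pre-commit`. The sketch below is only an illustration of that idea: `SECRET_RE`, `added_lines`, and `find_hits` are hypothetical names, and the single pattern stands in for whatever shared pattern set is eventually chosen.

```python
import re
import subprocess
import sys

# Hypothetical pattern; a real hook would load a shared pattern file.
SECRET_RE = re.compile(r"(?i)\b(password|passwd|secret|token)\b\s*[:=]\s*\S+")


def added_lines(diff_text):
    """Yield lines added by a unified diff, without the leading '+'."""
    for line in diff_text.splitlines():
        # Skip the '+++ b/file' header, keep real additions.
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def find_hits(diff_text):
    """Return added lines that match the secret pattern."""
    return [line for line in added_lines(diff_text) if SECRET_RE.search(line)]


def main():
    # `git diff --cached` shows what is staged for the next commit.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_hits(diff)
    for line in hits:
        print(f"possible secret in staged change: {line.strip()}", file=sys.stderr)
    return 1 if hits else 0  # a non-zero exit status makes Git abort the commit
```

To use it as a hook, the script would end with `sys.exit(main())` and be marked executable. Scanning only the staged diff keeps the check fast, at the cost of not looking at files that were committed earlier.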
Hi @riverma, thank you for the guidance! I will do one more trade study on GitHub's own secret scanner and then make an architecture graph answering those questions.
As a comprehensive Architecture Diagram is taking longer than I expected (I will post it within the next 2 days), I would like to first provide an update on my trade study of GitHub's Secret scanning.
GitHub's Secret scanning looks like a convenient and user-friendly product, particularly for GitHub-side automation. It offers additional features compared to other tools, but it also has limitations that make it less suitable for public repositories.
Updated Trade Study table, compared to the old one
Tool Name | File Content | File Name | Commit Message | Pre-commit-hook | Check history | GitHub-side automation | Customized Pattern | Entropy Analysis |
---|---|---|---|---|---|---|---|---|
git-secrets | Yes | No | Yes | Yes | Yes | Yes (doable) | Yes | No |
gitleaks | Yes | No | No | Yes | Yes | Yes | Yes | Yes |
trufflehog | Yes | No | No | Yes | Yes | Yes | Yes | Yes |
talisman | Yes | Yes | No | Yes | Yes | Yes (doable) | Yes | Yes |
Secret scanning (GitHub's) | Yes | No | No | Similar to push protection | Yes | Yes, and more convenient as a built-in product | Yes | No |
Here are the unique features of GitHub's Secret scanning
Pros
- GitHub's built-in products offer easier configuration for GitHub-side automation.
- In addition to the factors mentioned above, GitHub's Secret scanning also offers an additional feature: scanning issue descriptions and comments
Cons
- Although GitHub's Secret scanning is free for public repositories, a license for GitHub Advanced Security is required for private repositories as mentioned in their documentation.
- In my experience during the trial, the public version of GitHub's Secret scanning lacked support for custom patterns.
- Secrets found in public repositories using the free secret scanning alerts for partners service are reported directly to the partner, without creating an alert
- Although GitHub's Secret scanning supports custom patterns, there is a limit on the number of patterns that can be added: Up to 500 custom patterns for each organization or enterprise account, and up to 100 custom patterns per repository.
My thought
Because of the Cons, I feel GitHub's secret scanning is not very useful for public repositories. It's more like a product that helps companies discover whether the tokens they provide to users have been abused.
Hey @perryzjc - thanks for the deep dive analysis of GH Secrets Scanning. Appreciate the opinions and evidence you've brought forth. Great work here!
The one unique feature of GH Secrets Scanning is checking issue tickets for secrets, though I think it'd be far more useful if custom patterns were supported. Often sensitive file paths and internal URLs appear in issue tickets that we don't want there. I'm curious if a GitHub action can be written (using one of the tools you've suggested) to scan not only code, but issue tickets as well without much additional work.
Look forward to your architecture / recommendation for this ticket!
Hi @riverma - yes, I think it's doable to scan the issue tickets. Here is the relevant screenshot:
I have been doing a lot of research lately, consulting with other software engineers, and testing out multiple tools (in addition to the ones I mentioned previously). From what I've found so far, it seems that there isn't a single tool or combination of tools in the open source world that can fully meet all of our needs. For instance, it's challenging to find a tool that can scan file content, file names, commit information, history, issue tickets, support pre-commit-hooks, support regular expressions, and have entropy analysis capabilities all at once. This means that some customization will likely be necessary.
However, I recently stumbled upon a tool called "detect-secrets" that was recommended by Microsoft. It supports entropy analysis, has some commonly used regular expressions built-in, and scans quickly. It's also written in Python and designed to be scalable, which is a big plus.
While it doesn't currently support detecting file names and commit information, I've found that it's entirely feasible and relatively straightforward to modify the Python code and create a commit-msg hook.
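For reference, a commit-msg hook can be quite small: Git passes the path of the file holding the proposed commit message as the hook's first argument. The sketch below is a stand-alone illustration and does not use detect-secrets; `MESSAGE_PATTERNS`, `check_message`, and `check_message_file` are hypothetical names, and the two patterns are examples only.

```python
import re

# Hypothetical patterns for illustration; not detect-secrets' own rules.
MESSAGE_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # common shape of an AWS access key ID
    re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
]


def check_message(message):
    """Return every pattern match found in a commit message."""
    found = []
    for pattern in MESSAGE_PATTERNS:
        found.extend(m.group(0) for m in pattern.finditer(message))
    return found


def check_message_file(path):
    """Read the commit message file Git passes to the hook and scan it."""
    with open(path, encoding="utf-8") as handle:
        # Lines starting with '#' are editor comments that Git strips by default.
        lines = [ln for ln in handle if not ln.startswith("#")]
    return check_message("".join(lines))
```

An actual hook script would call `check_message_file(sys.argv[1])` and exit with a non-zero status when the returned list is not empty, which makes Git reject the commit.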
The tool works well on GitHub Actions, but I did encounter one issue: the free version of GitHub doesn't support pre-receive hooks. This means that although a GitHub Action can detect the presence of sensitive information, the file has already been uploaded to the branch by then. The same issue applies to issue tickets.
As of now, I don't have a solution for these two problems, but I think that the current approach is the most optimal solution compared to other options out there. It covers a wide range of needs and can potentially solve those two problems.
In addition, when it comes to scanning history, trufflehog is particularly powerful. If we could incorporate our customized detect-secrets tool into that type of historical scan, I believe the results would be excellent.
I'll be providing an architecture diagram shortly.
Here is the Scope of Work my solution is able to provide:
Note
The priority of each implementation is based on the needs from the community
Scope of Work:
- Research and implement a workflow that can effectively manage secrets in git and GitHub repositories.
- Able to identify various types of secrets, such as IPs, username / passwords, ARNs, security-groups, absolute file paths, sg's, vpc's, subnets, aws account numbers, ami's, bucket names, ip addresses, hostnames, roles, arn's, usernames, internal url's, and passwords.
- Able to detect potential secrets that we may not yet be aware of.
- Utilize different methods to detect secrets, such as file content, filename, commit message, GitHub issue description and comments.
- Scan the complete codebase history, including previous commits, to identify secrets.
- Automatically detect secrets in both local commits and remote pushes.
- It would be nice to implement commit-protection and push-protection functionality to prevent accidental secrets exposure.
- Provide guidelines on how to clean secrets from commit history, including relevant documentation.
Hi @perryzjc - thanks for the write up here!
My thoughts:
- Preference on the client-side scanning over GitHub if you're having trouble handling the latter. GitHub should serve as a backup layer to prevent sensitive info (more alerting than stopping), but understandably it won't have all the safety features of a git pre-hook. I think if someone writes code on GitHub itself and pushes to a branch, we could have the automation point to docs about purging the repo history. We don't want to require GitHub Enterprise features btw.
- In terms of features to support, I think prioritize the features specifically mentioned in this ticket over others that might be nice but may get us bogged down
- Some use cases to make things more tangible for your architecture diagram / approach: (1) client-side full scan of existing code base, (2) client-side scan of updated code upon Git commit, (3) server-side push to GitHub.com from client, or writing code on GitHub.com itself and being warned about sensitive info at earliest possible stage and pointers on how to purge / fix
Here is my diagram about how each tool relates to each need.
graph TD
subgraph solution
subgraph Development-Tools
subgraph open-source-tools
detect_secrets[Detect Secrets]
trufflehog[Trufflehog]
style detect_secrets fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style trufflehog fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph automation-tools
subgraph local-git-hooks
pre-commit[pre-commit hook]
commit-msg[commit-msg hook]
style pre-commit fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style commit-msg fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph remote-GitHub-Action
pre-receive[pre-receive hook]
workflows[workflows on push]
webhook[webhook]
style pre-receive fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style workflows fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style webhook fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
end
style Development-Tools fill:#5DADE2,stroke:#333,stroke-width:2px
end
subgraph Needs
Detecting-diverse-types-of-secrets{{Detecting-diverse-types-of-secrets}}
Local-commit-protection[[fa:fa-ban Local-commit-protection]]
Alarm-detected-secrets>fa:fa-camera-retro Alarm-detected-secrets]
Push-protection[[fa:fa-ban Push-protection]]
Detect-secrets-in-full-code-history[(Detect-secrets-in-full-code-history)]
Detection-of-secrets-in-different-media{{Detection-of-secrets-in-different-media}}
end
detect_secrets == solves ==> Detecting-diverse-types-of-secrets
local-git-hooks == solves ==> Local-commit-protection
remote-GitHub-Action == solves ==> Alarm-detected-secrets
pre-receive == solves if GitHub enterprise version ==> Push-protection
remote-GitHub-Action == solves ==> Detection-of-secrets-in-different-media
local-git-hooks == solves ==> Detection-of-secrets-in-different-media
trufflehog == solves ==> Detect-secrets-in-full-code-history
subgraph Other-notes
subgraph development
subgraph Server-may-needed
end
subgraph Web-crawling-may-needed
end
end
subgraph needs
subgraph webhook's-post-request
end
subgraph Secrets-in-previous-issues
end
end
PRI([Because of the scalability of detect secrets, <br> all features proposed here are implementable. <br> But priority is based on the needs of the community.])
style PRI fill:#AF505C,stroke:#333,stroke-width:2px
end
webhook's-post-request -. need to be handled by .-> Server-may-needed
Secrets-in-previous-issues -. can be scanned by .-> Web-crawling-may-needed
webhook -- sends --> webhook's-post-request
style solution fill:#EC7063,stroke:#666,stroke-width:4px
style Detecting-diverse-types-of-secrets fill:#48C9B0,stroke:#333,stroke-width:2px
style Local-commit-protection fill:#48C9B0,stroke:#333,stroke-width:2px
style Push-protection fill:#48C9B0,stroke:#333,stroke-width:2px
style Alarm-detected-secrets fill:#48C9B0,stroke:#333,stroke-width:2px
style Detection-of-secrets-in-different-media fill:#48C9B0,stroke:#333,stroke-width:2px
style Detect-secrets-in-full-code-history fill:#48C9B0,stroke:#333,stroke-width:2px
style Server-may-needed fill:#58D68D,stroke:#333,stroke-width:2px
style Web-crawling-may-needed fill:#58D68D,stroke:#333,stroke-width:2px
style webhook's-post-request fill:#58D68D,stroke:#333,stroke-width:2px
style Secrets-in-previous-issues fill:#58D68D,stroke:#333,stroke-width:2px
end
Development Tools:
- Open source tools
  - detect secrets
  - trufflehog
  - python
- Automation tools
  - local - git hooks
    - pre-commit hook
    - commit-msg hook
  - remote - GitHub Action
    - workflows on push
    - webhook
    - pre-receive hook if using GitHub enterprise
Notes
Because of the scalability of detect secrets, all features proposed here are implementable.
But priority is based on the needs of the community.
Usage
detect secrets
- written in python
- easy to configure
- has
  - Built-in plug-ins to detect popular patterns
  - Includes an Entropy Analysis plug-in -- useful to detect -> patterns not yet covered by the plug-ins
  - Scalable way to create customized plug-ins -- helpful to detect -> the special needs from the community, such as absolute file paths
    - a plug-in is not limited to regular expressions, any python code logic could work!
- -- solves -> Detecting diverse types of secrets
- -- can work with -> local - git hook
- -- can work with -> remote - GitHub Action
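As an example of the kind of logic a customized plug-in could carry, here is a stand-alone sketch of absolute file path detection. It deliberately does not import detect-secrets, so it makes no claim about that project's plug-in interface; the root directory list and the names `POSIX_PATH`, `WINDOWS_PATH`, and `find_absolute_paths` are assumptions for illustration.

```python
import re

# Assumed heuristics: a handful of common POSIX root directories and
# Windows drive-letter paths. The lookbehinds avoid matching URL paths
# (https://host/usr/...) and relative paths (./usr/...).
POSIX_PATH = re.compile(
    r"(?<![\w.:/-])/(?:home|Users|usr|etc|var|opt|tmp|mnt|srv)/[\w./-]+"
)
WINDOWS_PATH = re.compile(r"(?<![A-Za-z])[A-Za-z]:\\[\w.\\-]+")


def find_absolute_paths(line):
    """Return absolute file paths that appear in a line of text."""
    return POSIX_PATH.findall(line) + WINDOWS_PATH.findall(line)
```

The same two regular expressions could then be wrapped in whatever plug-in structure the chosen tool expects.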
local - git hook
- written in shell
- has
  - pre-commit hook -- achieves -> commit protection
  - commit-msg hook -- achieves -> commit message detection
- -- solves -> Local commit protection
remote - GitHub Action
- written in yaml and shell
- has
  - pre-receive hook (only for GitHub enterprise, not available for free version)
  - workflows on push -- raise an alarm for -> detected secrets in a new push
  - webhook -- supports secrets detection on -> a wider range of GitHub activities (including Issue discussion)
- -- solves if GitHub enterprise version -> push protection
- -- solves -> alarm for detected secrets
local - git hook (AND) remote - GitHub Action
- -- solves -> Detection of new secrets in different media
trufflehog
- written in Go
- has
  - convenient and strong functionality for scanning history
    - -- able to scan -> the history of a repository
    - -- able to scan -> the history of an organization
- -- solves -> Detecting secrets in full code history
- But it's not as scalable as detect secrets, so it's not easy to scan the history for secrets that appear in filenames, commit messages, and previous issues
Other notes
- GitHub Action's webhook works in this way:
  - Once an event is triggered, GitHub sends the information to a URL
    - So if we want to detect the secrets, we probably need to host a server to handle the POST request and call detect secrets on the information inside the request
- GitHub Action's webhook gets triggered only for new events
  - So if we want to scan the previous issues, we may need to implement web crawling to obtain the information of all issue tickets, then call our existing function to handle the detected secrets.
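Whichever way the event arrives, the scanning step itself comes down to pulling the user-written text out of the payload. Below is a rough sketch, assuming the usual `issue.title`, `issue.body`, and `comment.body` fields of GitHub's issue and issue-comment payloads. `INTERNAL_URL` is a made-up pattern for a hypothetical internal hostname. A workflow triggered on issue events might also be able to read the same payload from the file named by `GITHUB_EVENT_PATH`, which could avoid hosting a server, though that has not been evaluated in this thread.

```python
import json
import os
import re

# Hypothetical pattern for an internal hostname; replace with real rules.
INTERNAL_URL = re.compile(r"https?://[\w.-]+\.internal\.example\.com\S*")


def texts_from_event(event):
    """Collect user-written text fields from an issue or issue-comment payload."""
    issue = event.get("issue") or {}
    comment = event.get("comment") or {}
    fields = [issue.get("title"), issue.get("body"), comment.get("body")]
    return [text for text in fields if text]


def scan_event(event):
    """Return internal URLs found in any text field of the payload."""
    hits = []
    for text in texts_from_event(event):
        hits.extend(INTERNAL_URL.findall(text))
    return hits


def load_event_from_actions():
    """Load the triggering payload from the file named by GITHUB_EVENT_PATH."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as handle:
        return json.load(handle)
```

Note that this only sees new events; issues created before the automation was installed would still need a separate pass.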
Hi @riverma, thanks for the suggestions!
I'll prioritize the needs of the community for the actual implementation. The first diagram was just to show the potential features of my solution.
I've added another diagram to show how each tool relates to each need.
Next up, I'll work on the diagram you mentioned.
With these three diagrams, I hope people will get a better understanding of what my solution can do and how to use it.
@riverma Here are my other diagrams of the solution. It includes the most essential parts of my solution.
Solution Structure Diagram
graph TD
subgraph SolutionStructure
subgraph SecretsDetectionApproach
subgraph Layer1["Layer 1: Push to GitHub.com (server-side)"]
style Layer1 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph Layer2["Layer 2: Scan of updated code upon Git commit (client-side)"]
style Layer2 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph Layer3["Layer 3: Full scan of the existing code base (client-side)"]
style Layer3 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
end
subgraph Tools
subgraph CoreTool["Core Tool: Detect Secrets"]
style CoreTool fill:#5DADE2,stroke:#333,stroke-width:2px
detect_secrets{{detect-secrets}}
style detect_secrets fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
subgraph OtherTools["Other Tools"]
pre_commit_ci[pre-commit.ci]
github_action[GitHub Action]
pre_commit_manager[pre-commit manager]
style github_action fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style pre_commit_ci fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style pre_commit_manager fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
end
end
subgraph LayerDetails
subgraph Layer1Details[Layer 1 Details]
Compatible_with_all_local_machines{{Compatible with all local machines}}
Protection_for_main_branch{{Protection for the main branch}}
Error_notifications_via_GitHub_email{{Error notifications via GitHub and email}}
Implemented_with_detect_secrets_workflow[[Implemented with detect-secrets workflow]]
end
subgraph Layer2Details[Layer 2 Details]
Optional_due_to_compatibility_issues{{Optional due to compatibility issues}}
Early_stage_secrets_detection{{Early-stage secrets detection}}
Commit_prevention_and_error_messages{{Commit prevention and error messages}}
Implemented_with_pre_commit_manager[[Implemented with pre-commit manager]]
end
subgraph Layer3Details[Layer 3 Details]
Direct_use_of_detect_secrets{{Direct use of detect-secrets}}
Error_messages_for_detected_secrets{{Error messages for detected secrets}}
end
end
Layer1 -->|uses| pre_commit_ci
Layer1 -->|uses| github_action
Layer2 -->|uses| pre_commit_manager
Layer1 -->|uses| detect_secrets
Layer2 -->|uses| detect_secrets
Layer3 -->|uses| detect_secrets
style SolutionStructure fill:#EC7063,stroke:#666,stroke-width:4px
style SecretsDetectionApproach fill:#AF7AC5,stroke:#333,stroke-width:2px
style Tools fill:#48C9B0,stroke:#333,stroke-width:2px
style LayerDetails fill:#F1948A,stroke:#333,stroke-width:2px
style Compatible_with_all_local_machines fill:#58D68D,stroke:#333,stroke-width:2px
style Protection_for_main_branch fill:#58D68D,stroke:#333,stroke-width:2px
style Error_notifications_via_GitHub_email fill:#58D68D,stroke:#333,stroke-width:2px
style Implemented_with_detect_secrets_workflow fill:#58D68D,stroke:#333,stroke-width:2px
style Optional_due_to_compatibility_issues fill:#58D68D,stroke:#333,stroke-width:2px
style Early_stage_secrets_detection fill:#58D68D,stroke:#333,stroke-width:2px
style Commit_prevention_and_error_messages fill:#58D68D,stroke:#333,stroke-width:2px
style Implemented_with_pre_commit_manager fill:#58D68D,stroke:#333,stroke-width:2px
style Direct_use_of_detect_secrets fill:#58D68D,stroke:#333,stroke-width:2px
style Error_messages_for_detected_secrets fill:#58D68D,stroke:#333,stroke-width:2px
end
User Workflow Diagram
flowchart TB
User([fa:fa-user User])
subgraph UserWorkflow["User Workflow to Secure Secrets"]
Layer1["1. Layer 1: GitHub.com (server-side)"]
Layer2["2. Layer 2: Git commit scan (client-side)"]
Layer3["3. Layer 3: Full scan (client-side)"]
Layer1 -->|If Secrets Detected| Clean1[Purge or Fix the commit manually]
Layer2 -->|If Secrets Detected| Clean2[Clean local file directly. <br> Don't need to worry about cleaning commit history]
Layer3 -->|If Secrets Detected| Clean3[Clean local file directly.]
Secure["Only the main branch is safe. <br> Secrets are leaked on other branches before cleaning"]
Clean1 --> Secure
SaveTime["It saves your time. And secrets are safe from GitHub"]
Clean2 --> SaveTime
Clean3 --> SaveTime
end
User -->|At least use| Layer1
User -->|Helpful to use| Layer2
User -->|Optional to use| Layer3
style User fill:#F6F5F3,stroke:#333,stroke-width:1px
style UserWorkflow fill:#AF7AC5,stroke:#333,stroke-width:2px
style Layer1 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style Layer2 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style Layer3 fill:#F3B044,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5
style Clean1 fill:#5A88ED,stroke:#333,stroke-width:2px
style Clean2 fill:#5A88ED,stroke:#333,stroke-width:2px
style Clean3 fill:#5A88ED,stroke:#333,stroke-width:2px
style SaveTime fill:#5ABF9B,stroke:#333,stroke-width:2px
style Secure fill:#AF3034,stroke:#333,stroke-width:2px
Documentation
Solution Structure Diagram
- Secrets Detection Approach
  - Layer 1: Push to GitHub.com from the client (server-side)
  - Layer 2: Scan of updated code upon Git commit (client-side)
  - Layer 3: Full scan of the existing code base (client-side)
- Tools:
  - Core tool
    - detect-secrets
      - Customizable by creating additional plug-ins (Python)
  - Other tools
    - pre-commit.ci
    - pre-commit manager
    - GitHub Action
- Layer Details
  - Layer 1: Server-side push to GitHub.com
    - Compatible with all local machines
    - Protection for the main branch
    - If secrets detected,
      - Error notifications (including guidelines for fix / purge) via GitHub and email
    - Implemented with .github/workflows/detect-secrets.yml, pre-commit.ci, and detect-secrets
  - Layer 2: Client-side scan upon Git commit
    - May have compatibility issues
    - Early-stage secrets detection
    - If secrets detected,
      - Commit prevention and error messages
    - Implemented with pre-commit manager and .pre-commit-config.yaml using detect-secrets
  - Layer 3: Full scan of existing code base
    - Direct use of detect-secrets
    - If secrets detected,
      - error messages
User Workflow Diagram
- User Interaction with Layers:
  - Layer 1: GitHub.com (server-side)
    - The user should at least use this layer for securing secrets.
  - Layer 2: Git commit scan (client-side)
    - Using this layer is helpful for the user to detect secrets early on.
  - Layer 3: Full scan (client-side)
    - This layer is optional for the user to use for additional security.
- Actions to be taken if secrets are detected:
  - Layer 1: Purge or fix the commit manually
    - If secrets are detected, the user must purge or fix the commit manually to ensure the main branch remains secure.
  - Layer 2: Clean local file directly
    - If secrets are detected, the user can clean the local file directly, without worrying about cleaning the commit history.
  - Layer 3: Clean local file directly
    - If secrets are detected, the user can clean the local file directly.
- Effects of using different layers:
  - Using Layer 1 ensures that only the main branch is safe, and secrets are leaked on other branches before cleaning.
  - Using Layer 2 and Layer 3 saves the user's time and keeps secrets safe from GitHub.
Hi @perryzjc -
Excellent work here with the research and bringing this all together. I support your plan here, but I have a couple of questions and suggestions.
Questions:
- How would developers be notified of sensitive information being accidentally pushed via the "Layer 1" workflow?
- With detect-secrets - where are patterns / RegExes stored such that users can customize further (from our baseline) which sensitive patterns to search for?
- How active of a project would you say detect-secrets is and how does that affect the risk of using the software? I see the last release was in Oct 2022, and last commit Dec 2022. On the other hand, TruffleHog seems far more active. One way to assess is to reach out to the project's community and see how soon they respond to your questions.
Suggestions:
- Keep it simple for the user: the "Layer 1" workflow should be a stand-alone GitHub Action that can be deployed with a single click. The "Layer 2/3" workflows should be as simple as a two-step process: (1) installing the software / dependencies using a package manager set of commands, (2) loading a custom configuration you've created and running away
- In terms of prioritization: I'd suggest you start with Layer 1, then Layer 2, then Layer 3. This order of priority would make infusion to projects the simplest.
- I think the key for the "Layer 1" workflow is we don't want public alerts for sensitive information found, we want to alert the developers so they can quickly and surreptitiously make the fix.
- For the "Layer 1" workflow - it'd be good to link to clean-up instructions for past commits.
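One way to approach the "no public alerts" point is to make sure the workflow output never repeats the secret itself, since workflow logs on a public repository are visible to everyone. The sketch below is only an illustration of that idea: `redact` and `format_report` are hypothetical helpers, and `cleanup_url` is a placeholder for wherever the clean-up instructions end up living.

```python
def redact(secret, visible=2):
    """Keep only the first few characters so logs do not repeat the secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


def format_report(findings, cleanup_url):
    """Build a log-safe report from (path, line_number, secret) tuples."""
    lines = [
        f"{path}:{line_number}: possible secret {redact(secret)}"
        for path, line_number, secret in findings
    ]
    if lines:
        lines.append(f"See {cleanup_url} for how to remove it from history.")
    return "\n".join(lines)
```

The file path and line number still reveal where the problem is, so this reduces exposure rather than removing it; a private notification channel would be needed for anything stricter.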
Hi Rishi, here is my response to the Questions:
- Developers will be notified by email sent from GitHub Action.
- detect-secrets has a plug-in folder. We can add features (other patterns) by adding or modifying the Python scripts in that folder. It's scalable and convenient.
- The newest release of detect-secrets was on October 5, 2022. It's much more active than git-secrets, whose latest update was three years ago.
- Also, detect-secrets is an enterprise-friendly way of detecting and preventing secrets in code. Currently, I've found that IBM and Yelp are using it. I've also found that it is recommended by Microsoft.
Also, thank you for the Suggestions! When it comes to the implementation, I will try to complete those features and make them as convenient and secure as possible.
Here are three sequence diagrams to help people better understand the three layers.
Layer1 - Server-side push to GitHub.com
sequenceDiagram
participant User as Developer
participant GH as GitHub
participant Config as .pre-commit-config.yaml
participant CI as Pre-commit CI
participant DS as Detect-Secrets
Note over User,GH: Developer creates pull request or pushes to branch
User->>+GH: Creates pull request / pushes to branch
GH->>+Config: Fetches pre-commit config
Config->>CI: Returns config with Detect-Secrets setup
CI->>DS: Requests secret scan
DS->>DS: Scans pull request / branch for secrets with custom plugins
alt Secrets Detected
DS-->>CI: Returns detected secrets
CI-->>GH: Reports status check as failed
GH-->>User: Prevents merge / push & reports status check
else No Secrets Detected
DS-->>CI: Returns clean result
CI-->>GH: Reports status check as passed
GH-->>User: Allows merge / push
end
Layer2 - Git commit scan (client-side)
sequenceDiagram
participant User as Developer
participant Local as Local Environment
participant Config as .pre-commit-config.yaml
participant PCH as Pre-commit Hook
participant DS as Detect-Secrets
participant File as Baseline File
Note over User,Local: Developer attempts to commit
User->>+Local: Request commit
Local->>+Config: Fetches pre-commit config
Config->>PCH: Returns config with Detect-Secrets setup
PCH->>DS: Request secret scan with existing baseline
DS->>File: Fetches baseline file
File->>DS: Returns baseline file
DS->>DS: Scans changes for secrets with custom plugins
alt New Secrets Detected
DS-->>PCH: Returns detected secrets
PCH-->>Local: Prevents commit & reports detected secrets
Local-->>User: Prevents commit & reports detected secrets
else No New Secrets Detected
DS-->>PCH: Returns clean result
PCH-->>Local: Allows commit
Local-->>User: Commits changes
end
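The "scan with existing baseline" step in the Layer 2 diagram boils down to a set difference: only findings that are not already recorded in the baseline block the commit. The sketch below shows that logic with a simplified stand-in format of (path, hash) pairs; it is not detect-secrets' actual baseline file format, and `fingerprint` and `new_findings` are hypothetical names.

```python
import hashlib


def fingerprint(path, secret):
    """Identify a finding by file path and a hash, never the raw secret."""
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return (path, digest)


def new_findings(current, baseline):
    """Return findings from the current scan that the baseline does not list."""
    known = set(baseline)
    return [item for item in current if item not in known]
```

Storing hashes rather than raw values means the baseline file itself can be committed without leaking what it records.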
Layer3 - Full scan and audit (client-side)
sequenceDiagram
participant Dev as Developer
participant Env as Local Environment
participant DS as Detect-Secrets
participant File as Baseline File
participant Audit as Audit Tool
Note over Dev,Env: Developer initiates a direct scan for secrets
Dev->>+Env: Triggers direct scan
Env->>+DS: Requests scan on the codebase
DS->>DS: Performs secret scanning
DS->>File: Generates new baseline file
File->>DS: Acknowledges file creation
DS-->>-Env: Returns scan results and new baseline file
Env-->>Dev: Presents scan results and new baseline file
Note over Dev,File: Developer may audit the new baseline file
Dev->>Audit: Initiates audit on the new baseline file
Audit->>File: Fetches details from the baseline file
File->>Audit: Returns secret details
Audit-->>Dev: Presents detailed information of detected secrets
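For a sense of what the Layer 3 full scan does under the hood, here is a minimal stand-alone sketch that walks a working tree and reports matching lines. It is not how detect-secrets is implemented; `RULE` is a single hypothetical pattern and `scan_tree` is an illustrative name.

```python
import os
import re

# Hypothetical single rule; a real scan would use the project's pattern set.
RULE = re.compile(r"(?i)\b(?:password|secret|token)\b\s*[:=]\s*\S+")


def scan_tree(root):
    """Walk root and return sorted (relative_path, line_number) pairs matching RULE."""
    hits = []
    for folder, subdirs, files in os.walk(root):
        # Skip Git's own metadata; pruning in place stops os.walk descending.
        subdirs[:] = [d for d in subdirs if d != ".git"]
        for name in files:
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if RULE.search(line):
                            hits.append((os.path.relpath(path, root), number))
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: skipped in this sketch
    return sorted(hits)
```

This only covers the files currently checked out. Secrets that exist only in earlier commits would need a history scan, which is where a tool like trufflehog comes in.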
@perryzjc has proposed his plugins as PR's to Yelp's Detect Secrets core codebase here. Once those are accepted, we may no longer need to host a separate fork of detect secrets in the future.
Resolved.