Dupin
But it is in matters beyond the limits of mere rule that the skill of the analyst is evinced. He makes in silence a host of observations and inferences....
— Edgar Allan Poe, The Murders in the Rue Morgue
Dupin is a tool to help discover secrets in Git repositories.
It is designed to be used as a tool for regularly scanning an organisation's public Git repositories, notifying a nominated email address when it finds anything that looks suspicious.
Quickstart
Install Dupin from source with pip install <path-to-dupin>
.
(virtualenv
is recommended)
For these examples we'll use ~/.dupin
as our root directory,
you can use anything that makes sense for you.
ROOT=~/.dupin
# sets up a directory for Dupin to store its repositories and results
dupin setup --root $ROOT
# stores a list of your organisation's public repos
dupin update-repos --root $ROOT organisation-name
# if you get rate limit errors you'll need to provide a Github
# token with the --token argument
# scan all repositories in the list for secrets, logs and shows results
dupin auto-scan-all --root $ROOT
# this logs what it finds in the $ROOT/results directory and the
# details to the console
# it's also possible to email reports, more details below and in the
# config section
Installation
Dupin is an installable package Python package, but is not hosted in
public Python repositories. You can clone the source code and then
use pip
to install Dupin. This will also install its dependencies.
As ever, it's better to install Dupin into a virtual environment. This prevents Dupin's dependencies from creating problems with other Python software on your machine.
git clone git@github.com:guardian/dupin.git
# via a virtualenv, or globally (may require sudo)
pip install dupin
You should then be able to run dupin
.
AWS
This repository includes a CloudFormation template which creates an EC2 instance that runs Dupin on a schedule. If you have an AWS account this is the easiest way to run Dupin.
Usage
Dupin offers several commands. Check the program's main file for full info, the main commands are described below.
Note: many of these commands interact with Dupin's directory structure. More information about the layout Dupin uses to store data is available below, in the Directory structure section.
Global arguments
These arguments apply to many/all of Dupin's commands.
--root
Sets the root directory for Dupin's directory structure.
--config
By default, this is read from ROOT/config
if a root is provided.
You may instead provide a custom location. This should point to a
yaml
file that contains Dupin's config.
setup
The setup command initialises Dupin's directory structure. If you're using any of the features Dupin offers that depend on the data it has stored (likely) you'll need to run this command first.
Examples:
duping setup --root ~/.dupin
update-repos
This command looks up an organisation's public repositories on Github and writes them to a file.
Examples:
# provide args via a config file at ~/.dupin/config
dupin update-repos --root ~/.dupin
# provide args explicitly
dupin update-repos myorg --token abcdef
# save the list of repositories in a provided location
dupin update-repos --file /tmp/organisation-repos.txt
# exclude some very large repositories from the scan
dupin update-repos myorg --repo-exclusions organisation/large-repo.git
--file
By default it writes to ROOT/repository-urls
(you'll need to provide
a --root
argument to take advantage of this). You can specify an
alternative file.
--include-forks
By default, Dupin will not include repositories that are forks. These tend to contain only minor changes and the source repository is often very large. Dupin's aim is to try and find secrets in an organisation's repositories, if you'd like to include forks you should pass this flag.
--repo-exclusions
This setting specifies Git repositories that should be excluded from the resulting list.
Very large Git repositories cannot easily be scanned by TruffleHog, and therefor by Dupin. The resulting log file will likely be too large because of false positives and the scan itself will likely consume too much memory. You should use this option (or the corresponding config property) to exclude large repositories and use a different approach to check for secrets in those repositories.
auto-scan-all
This command scans all the repositories it finds in ROOT/repository-urls
for secrets, and saves its findings. It will also generate a diff of
these findings compared to the previous version and display this diff
for the user. This makes it easy to spot when secrets have been
introduced (or removed).
If you provide the --notify
flag, Dupin will read the provided
configuration and email the changes in its findings.
NOTE: Emailing secrets is a silly idea so Dupin supports PGP encryption of its notification emails. To enable this feature simply provide a PGP Public Key in the configuration (read the config section for more info)
Examples:
# scans and prints changes to the console
dupin --root ~/.dupin auto-scan-all
# instruct Dupin to send notification emails (requires config)
dupin --root ~/.dupin auto-scan-all --notify
--notify
This flag tells Dupin to send notification emails. Doing so will require additional configuration. Since this configuration is non-trivial, you should provide it in a config file, rather than as arguments to Dupin.
More information on configuring Dupin for sending email is available below, under Configuration, specifically SMTP
Directory structure
Dupin creates a directory structure for storing its results as follows.
root
├── config
├── repository-urls
├── repositories
│ ├── example.git
│ │ ├── ...etc contents of example repo
│ │ └── .git
│ └── example-2.git
│ ├── ...etc contents of example-2 repo
│ └── .git
└── results
├── .git
├── example-2
└── example
config
You may provide a config file that saves passing lots of arguments to
all of Dupin's commands. By default, Dupin looks in ROOT/config
for
this file.
repository-urls
This file contains a list of repository URLs, one per line. This is what Dupin uses to determine what to scan.
You can edit the list yourself, or generate it using Dupin's
update-repos
command.
repositories
This is where Dupin stores a local copy of the repositories it scans. If Dupin finds a new repository while scanning it will clone a copy to this location. If the repo already exists it will update it before scanning.
results
The results directory is a Git repository that contains the history of Dupin's scans. This is also used to determine changes since when notifying Dupin emails details of changes.
Configuration
You can provide a config file to set some parameters for Dupin without needing to pass them every time. This also lets you keep secrets away from the git repository.
If you provide a --root
argument to Dupin it will attempt to read the
config from a file in that root called config
. Alternatively, you can
specify the config file location with the --config
argument.
root
├── config <- default location for config
├── repository-urls
├── repositories
│ └── ...etc
└── results
└── ...etc
Here's an example configuration file. The file should be written using YAML. Look at config.py for more info about how this works.
github_token: xxxxxxxx-github-token-xxxxxxxx
organisation_name: your-organisation
notification_email: recipient@example.com
include_forks: true
repo_exclusions:
- organisation/large-repo.git
- org2/another-excluded-repo.git
smtp:
host: smtp-server.example.com
from: sender@example.com
username: username
password: password
pgp_key: |
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1
abdefghihjklmnopqrstuvwxyz...etc
...etc
...etc
Most of these setting can be provided as arguments to Dupin instead of
as configuration, but it's generally simpler and safer to put them in
a config file. In particular, the auto-scan-all
reads its arguments
from the configuration for simplicity and the SMTP settings can only be
provided from config.
Github token
This is used when Dupin fetches the list of organisation repositories. Dupin searches public repositories so in theory this token isn't required. In practice, if your organisation has a large number of repositories you'll hit Github's rate limit while Dupin runs through the pagination. If this happens you'll need to provide authentication so you are given a higher rate limit.
Organisation name
This tells Dupin which organisation to use when it creates its list of repositories that should be scanned.
Notification email
Dupin uses this as a "to" address when it emails updates to your organisation's secrets.
Include forks
As described abve, Dupin will not include repositories that are forks.
If you'd like to include forks in the generated list of repositories,
you can specify this from Dupin's config by setting include_forks
to
true
.
Repository exclusions
This property allows you to exclude specified Git repositories from the list of repos that will be scanned. This is particularly useful if you have some very large repositories that Dupin is unable to scan.
The exclusions should be provided as a YAML list. Any repository that matches any of the provided strings will be excluded, so be specific (you should probably include .git at the end where possible).
SMTP
If no SMTP host is provided, Dupin will attempt to send an email using
localhost
. If your machine does't have a mail server running locally
this will fail. Even if it does, you're probably better off using a real
mailserver. The following settings allow you to configure the way Dupin
sends emails.
Host
The hostname of the SMTP server to use.
From
Tells Dupin what to use as the "from" address for notification emails.
Username & password
These settings are used to authenticate the SMTP connection. You'll get these when you configure your mailserver.
PGP key
Sending sensitive secrets over email is a silly idea. To deal with this, Dupin supports PGP encryption of the email contents.
You should provide a PGP public key as text. YAML supports multi-line
strings using the |
character like in the example above. Make sure
the key is consistently indented.
If you are using GnuPG you can obtain the public key in the correct format using the following command. Other PGP applications will offer similar functionality for exporting public keys.
gpg --armor --export <identity/email>
If you provide a PGP key in the configuration, PGP encryption will be automatically enabled.
Development
To run Dupin's tests, set up your virtualenv and install Dupin's
requirements using the provided requirements.txt
file. You can
then use unittest in discover
mode to run all the tests.
# your choice of virtualenv location
VENV=.venv
virtualenv $VENV
$VENV/bin/pip install -r requirements.txt
$VENV/bin/python -m unittest discover