pdscan

Scan your data stores for unencrypted personal data (PII)

Primary language: Go. License: MIT.

  • Last names (US)
  • Email addresses
  • IP addresses (IPv4)
  • Street addresses (US)
  • Phone numbers
  • Credit card numbers
  • Social Security numbers (US)
  • Dates of birth
  • Location data
  • OAuth tokens
  • MAC addresses

Detects matches from sampled data and from column or field names, and works with compressed files

Zero runtime dependencies and minimal database load

Installation

Download the latest version:

You can also install it with Homebrew or Docker.

Data Stores

Elasticsearch

pdscan elasticsearch+http://user:pass@host:9200

For HTTPS, use elasticsearch+https://.

You can also specify indices.

pdscan elasticsearch+http://user:pass@host:9200/index1,index2

Wildcards are also supported.

pdscan "elasticsearch+http://user:pass@host:9200/index*"

Files

pdscan file://path/to/file.txt

You can also specify a directory.

pdscan file://path/to/directory

For absolute paths, use file:///.

pdscan file:///absolute/path/to/file.txt

For paths relative to your home directory on Mac and Linux, use:

pdscan file://$HOME/file.txt
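The relative and absolute forms above differ only in whether the path starts with a slash. As a minimal sketch, the helper below (`to_file_uri` is a hypothetical name, not part of pdscan) turns any path into the absolute `file:///` form by resolving relative paths against the current directory:

```sh
# Hypothetical helper (not part of pdscan): turn a path into a file:// URI.
# Relative paths are resolved against the current directory, so the result
# always uses the absolute file:/// form.
to_file_uri() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;            # already absolute
    *)  printf 'file://%s/%s\n' "$PWD" "$1" ;;  # prefix the current directory
  esac
}

# Example use (assumes pdscan is installed):
#   pdscan "$(to_file_uri path/to/file.txt)"
```

Quoting the command substitution keeps paths with spaces intact when they are passed to pdscan.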

MariaDB

pdscan mariadb://user:pass@host:3306/dbname

MongoDB

pdscan mongodb://user:pass@host:27017/dbname

MySQL

pdscan mysql://user:pass@host:3306/dbname

OpenSearch

pdscan opensearch+http://user:pass@host:9200

For HTTPS, use opensearch+https://.

You can also specify indices.

pdscan opensearch+http://user:pass@host:9200/index1,index2

Wildcards are also supported.

pdscan "opensearch+http://user:pass@host:9200/index*"

Postgres

pdscan postgres://user:pass@host:5432/dbname

Always make sure your connection is secure when connecting to a database over a network you don't fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don't do this, your database credentials can be compromised.

If your connection doesn't use SSL, append to the URI:

?sslmode=disable
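Both sslmode settings mentioned above are ordinary query parameters on the connection URI. The sketch below shows one way to append them; `with_param` is a hypothetical helper, not something pdscan provides, and the URI is the placeholder from the example above:

```sh
# Hypothetical helper (not part of pdscan): append a query parameter to a
# connection URI, using "?" for the first parameter and "&" for later ones.
with_param() {
  case "$1" in
    *\?*) printf '%s&%s\n' "$1" "$2" ;;  # URI already has a query string
    *)    printf '%s?%s\n' "$1" "$2" ;;  # first parameter
  esac
}

uri="postgres://user:pass@host:5432/dbname"
with_param "$uri" "sslmode=verify-full"  # for networks you don't fully trust
with_param "$uri" "sslmode=disable"      # only when the server has no SSL
```

Pass the resulting URI to pdscan in double quotes so the shell does not interpret the `?` or `&`.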

For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).

CREATE EXTENSION tsm_system_rows;

Redis

pdscan redis://user:pass@host:6379/db

S3

pdscan s3://bucket/path/to/file.txt

Requires s3:GetObject permission

You can also specify a prefix by ending with a /.

pdscan s3://bucket/path/to/directory/

Requires s3:ListBucket and s3:GetObject permissions

SQLite

pdscan sqlite://path/to/dbname.sqlite3

Not available with prebuilt binaries

SQL Server

pdscan "sqlserver://user:pass@host:1433?database=dbname"

Options

Show the data found

pdscan --show-data

Show low confidence matches

pdscan --show-all

Change the sample size

pdscan --sample-size 50000

Specify the number of processes to use (defaults to 1)

pdscan --processes 4

Scan for only certain types of data

pdscan --only email,phone,location

Scan for all except certain types of data

pdscan --except ip,mac

Specify the minimum number of rows/documents/lines for a match (experimental)

pdscan --min-count 10

Specify a custom pattern (experimental)

pdscan --pattern "\d{16}"
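Before scanning a real data store, it can help to check what a candidate pattern matches on a small sample. The sketch below does this with grep rather than pdscan, so it is only an approximation: `grep -E` has no `\d`, so the equivalent `[0-9]{16}` stands in for the pattern above, and `count_matches` and the sample values are made up for illustration.

```sh
# Count the lines in a file that a candidate pattern matches.
# grep -E has no \d, so use [0-9] in place of \d when testing here.
count_matches() {
  grep -E -c -- "$1" "$2"
}

# Build a small sample file (made-up values) and try the pattern on it.
sample=$(mktemp)
printf '%s\n' 'card 4242424242424242' 'order 12345' 'id 0000111122223333' > "$sample"
count_matches '[0-9]{16}' "$sample"   # prints 2
rm -f "$sample"
```

With grep the pattern is unanchored, so a longer run of digits also counts as a match.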

Output newline delimited JSON (experimental)

pdscan --format ndjson
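The options above are shown on their own, but in practice they accompany a connection URI. As a sketch, assuming options can be combined and placed after the URI, the function below runs the same scan against several data stores; `scan_all` and the `PDSCAN` override variable are hypothetical names introduced here.

```sh
# Scan several data stores in turn with the same options.
# PDSCAN can be overridden (for example with "echo") for a dry run.
scan_all() {
  for uri in "$@"; do
    "${PDSCAN:-pdscan}" "$uri" --only email,phone --sample-size 50000 --processes 4
  done
}

# Example use (assumes pdscan is installed):
#   scan_all "postgres://user:pass@host:5432/dbname" "file:///absolute/path/to/directory"
```

Setting `PDSCAN=echo` prints each command's arguments instead of running it, which is a cheap way to check the URIs and flags first.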

Additional Installation Methods

Homebrew

With Homebrew, you can use:

brew install ankane/brew/pdscan

Docker

Get the Docker image with:

docker pull ankane/pdscan

And run it with:

docker run -ti ankane/pdscan <connection-uri>

For data stores on the host machine, use host.docker.internal as the hostname:

docker run -ti ankane/pdscan "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"

On Linux, this requires Docker 20.10+ and --add-host=host.docker.internal:host-gateway
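Putting the Linux requirement together with the command above gives something like the following sketch. `scan_host_store` and the `DOCKER` override variable are hypothetical names; the flags and image name are the ones already shown.

```sh
# Run pdscan from Docker against a data store on a Linux host.
# DOCKER can be overridden (for example with "echo") for a dry run.
scan_host_store() {
  "${DOCKER:-docker}" run -ti \
    --add-host=host.docker.internal:host-gateway \
    ankane/pdscan "$1"
}

# Example use (assumes Docker is installed):
#   scan_host_store "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"
```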

For files on the host machine, use:

docker run -ti -v /path/to/files:/data ankane/pdscan file:///data

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/pdscan.git
cd pdscan
make test