pdscan
Scan your data stores for unencrypted personal data (PII)
- Last names
- Email addresses
- IP addresses
- Street addresses (US)
- Phone numbers (US)
- Credit card numbers
- Social security numbers
- Dates of birth
- Location data
- OAuth tokens
Uses data sampling and column naming to find matches, and works with compressed files
💥 Zero runtime dependencies and minimal database load
Installation
Download the latest version.
Unzip and follow the instructions below for your data store.
On Mac, you can also use:
brew install ankane/brew/pdscan
Data Stores
Files
pdscan file://path/to/file.txt
You can also specify a directory.
pdscan file://path/to/directory
For absolute paths, use file:///.
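As a minimal sketch of the absolute-path form: an absolute path already begins with a slash, so prefixing it with file:// produces the three-slash URI. The temporary directory, the users.csv file name, and its contents are made up for illustration, and the scan only runs if pdscan is on your PATH.

```shell
# Create a throwaway directory with one sample file (hypothetical test data).
scan_dir="$(mktemp -d)"
printf 'name,email\nTest User,test@example.org\n' > "$scan_dir/users.csv"

# mktemp -d returns an absolute path, so "file://" + path gives "file:///...".
scan_uri="file://$scan_dir"
echo "$scan_uri"

# Run the scan only if pdscan is installed.
if command -v pdscan >/dev/null 2>&1; then
  pdscan "$scan_uri" || echo "scan failed" >&2
fi
```

Remove the temporary directory with rm -r "$scan_dir" when you are done.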
MySQL & MariaDB
pdscan mysql://user:pass@host:3306/dbname
Postgres
pdscan postgres://user:pass@host:5432/dbname
If your connection doesn’t use SSL, append to the URI:
?sslmode=disable
For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).
CREATE EXTENSION tsm_system_rows;
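Putting the sslmode parameter together with the connection URI might look like the sketch below. The credentials, host, and database name are the same placeholders used above, not real values. Quoting the URI is a precaution: some shells (zsh, for example) treat an unquoted ? as a glob character.

```shell
# Placeholder connection details; substitute your own.
pg_uri='postgres://user:pass@host:5432/dbname?sslmode=disable'
echo "$pg_uri"

# Run the scan only if pdscan is installed.
if command -v pdscan >/dev/null 2>&1; then
  pdscan "$pg_uri" || echo "scan failed" >&2
fi
```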
SQLite
pdscan sqlite:/path/to/dbname.sqlite3
S3
pdscan s3://bucket/path/to/file.txt
Requires s3:GetObject permission
You can also specify a prefix by ending with a /.
pdscan s3://bucket/path/to/directory/
Requires s3:ListBucket and s3:GetObject permissions
Others
Feel free to submit a PR
Options
Show data found
pdscan --show-data
Show low confidence matches
pdscan --show-all
Change sample size
pdscan --sample-size 50000
Specify the number of processes to use (defaults to 1)
pdscan --processes 4
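The options above are shown one at a time. The sketch below assumes they can be combined in a single invocation after the data store URI, which this document does not state explicitly. The target URI is the placeholder directory from the Files section.

```shell
# Placeholder target; any data store URI from the sections above should work here.
target='file://path/to/directory'

# Assumption: flags may be combined and may follow the URI.
set -- "$target" --show-data --sample-size 50000 --processes 4
echo pdscan "$@"

# Run the scan only if pdscan is installed.
if command -v pdscan >/dev/null 2>&1; then
  pdscan "$@" || echo "scan failed" >&2
fi
```

Because --show-data prints the matched values themselves, be careful about where the output ends up (terminal scrollback, CI logs).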
Roadmap
- Add more data stores (SQL Server, MongoDB, Elasticsearch, Memcached, Redis)
- Improve rules
- Highlight matches
- Add more output formats, like JSON and CSV
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/pdscan.git
cd pdscan
make test