Password dump database normalizer and seeder
super alpha
Depending on the data drive, add one of the conf files from the db directory to
Postgres' conf.d dir.
cp db/16gb_4cpu_ssd.conf /etc/postgres/10/main/conf.d/dump.conf
systemctl restart postgres@10-main.service
Currently averaging around 350K inserts/minute with these settings and table configuration in db/migrate/
There's a test tar in the tests dir
Dump entries should be in the format:
email@domain.com:password
The parse logic in seeder takes a best-effort approach to pulling domain, username, and
password from each line. Some dumps use ; and seeder looks for that too. Apart from that, if
it can't find all three datapoints in the line, it isn't added to the database.
# for psql
export RACK_ENV=production
# for sqlite
export RACK_ENV=development
bundle install
bundle exec rake db:reset
#build golang seeder
cd seeder && go build -o seeder main.go
# pushover.net token for mobile progress alerts
export PO_USR=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export PO_API=yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
export PG_CONN='postgres://passdb_user:passdb_pass@localhost/passdb'
#macos postgres want this string instead
export PG_CONN='postgres://passdb_user:passdb_pass@localhost/passdb?ssl_mode=disabled'
# seed the
./seeder test_data.tar.gz
Associations are set in the ORM such that pivotting on any of username, password, or domain
is possible
# to start the query interface
bundle exec rake
# start with a domain
yahoo = Domain.find_by(domain: "yahoo.com")
# find all passwords by yahoo mail users
yahoo.passwords
# find all yahoo mail users
yahoo.usernames
# find all password of a particular yahoo mail user
yahoo.usernames.first.passwords
# start with a user
eric = Usernames.find_by(name: "eric1990")
# see all passwords belonging to eric
eric.passwords
# see all email account for eric
eric.domains
# starting with a password
pass = Password.find_by(password: "P@ssw0rd!")
# see the users that share this password
pass.usernames
Run rake -T to see all tasks.
At the time of writing you can pull table sizes, current connection pool utilization

