pypi-pgp-statistics

Scripts and data from analyzing PGP signatures uploaded to PyPI.

NOTE: The data in this repository comes from PyPI's BigQuery dataset, and was generated on 2023-05-19. See "Updating the files" below for steps to rebuild it.

Updating the files

NOTE: These steps are provided on a best-effort basis. They may become outdated or broken over time.

Setup

Create a virtual environment with the dependencies needed:

python -m venv --upgrade-deps env
./env/bin/python -m pip install -r requirements.txt -r dev-requirements.txt

inputs/dists-with-signatures.csv

Run the following BigQuery query (tweak the timestamp, if you'd like) and save the results as CSV to inputs/dists-with-signatures.csv:

SELECT name, version, filename, python_version, blake2_256_digest
FROM `bigquery-public-data.pypi.distribution_metadata`
WHERE has_signature
AND upload_time > TIMESTAMP("2020-03-27 00:00:00")
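If you would rather script the export than use the BigQuery console, the query above can be run from Python. This is only a sketch: it assumes the third-party `google-cloud-bigquery` package is installed and that credentials are already configured, neither of which this repository is known to provide. The helper names (`build_query`, `write_csv`, `export`) are hypothetical, and writing a header row is an assumption about what `dists-by-keyid.py` expects.

```python
import csv

# Columns selected by the query above, in order.
COLUMNS = ["name", "version", "filename", "python_version", "blake2_256_digest"]


def build_query(cutoff="2020-03-27 00:00:00"):
    """Return the query above with an adjustable upload_time cutoff.

    `cutoff` is interpolated directly, so only pass trusted values.
    """
    return (
        f"SELECT {', '.join(COLUMNS)}\n"
        "FROM `bigquery-public-data.pypi.distribution_metadata`\n"
        "WHERE has_signature\n"
        f'AND upload_time > TIMESTAMP("{cutoff}")'
    )


def write_csv(rows, out):
    """Write rows (mappings keyed by column name) as CSV with a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)  # assumption: downstream script expects a header
    for row in rows:
        writer.writerow([row[col] for col in COLUMNS])


def export(path="inputs/dists-with-signatures.csv", cutoff="2020-03-27 00:00:00"):
    """Run the query and save the results to `path`."""
    # Imported lazily: google-cloud-bigquery is an assumed extra dependency.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(build_query(cutoff)).result()
    with open(path, "w", newline="") as out:
        write_csv(rows, out)
```

Calling `export()` from the repository root would then write the input file used by the next step.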

outputs/dists-by-keyid.json

./env/bin/python dists-by-keyid.py \
    < inputs/dists-with-signatures.csv \
    > outputs/dists-by-keyid.json

outputs/all-dist-keys.jsonl

./env/bin/python all-dist-keys.py \
    < outputs/dists-by-keyid.json \
    > outputs/all-dist-keys.jsonl
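Since each step redirects stdout straight into a file, a failed run can leave a truncated output behind. A quick sanity check before moving on is to confirm that every line of a `.jsonl` file parses as JSON. The sketch below assumes nothing about the record schema, and `count_jsonl_records` is a hypothetical helper, not part of this repository.

```python
import json


def count_jsonl_records(lines):
    """Parse each non-blank line as JSON and return the number of records.

    Raises ValueError naming the first line that fails to parse.
    """
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        count += 1
    return count
```

For example, `with open("outputs/all-dist-keys.jsonl") as f: print(count_jsonl_records(f))` prints the record count or fails on the first malformed line.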

outputs/key-audit.jsonl

./env/bin/python key-audit.py \
    < outputs/all-dist-keys.jsonl \
    > outputs/key-audit.jsonl
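The three commands above form a chain in which each output file is the next step's input. The sketch below runs them in order with the same stdin/stdout redirection, stopping at the first failure. `run_steps` and `STEPS` are hypothetical names, and the final output is named `outputs/key-audit.jsonl` to match the section heading.

```python
import subprocess
from pathlib import Path

# (script, input file, output file), in the order the steps are listed above.
STEPS = [
    ("dists-by-keyid.py", "inputs/dists-with-signatures.csv", "outputs/dists-by-keyid.json"),
    ("all-dist-keys.py", "outputs/dists-by-keyid.json", "outputs/all-dist-keys.jsonl"),
    ("key-audit.py", "outputs/all-dist-keys.jsonl", "outputs/key-audit.jsonl"),
]


def run_steps(steps=STEPS, python="./env/bin/python"):
    """Run each script with its input on stdin and its output file on stdout."""
    for script, src, dst in steps:
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        with open(src, "rb") as stdin, open(dst, "wb") as stdout:
            # check=True raises CalledProcessError if a script exits non-zero,
            # so later steps never run on a partial output.
            subprocess.run([python, script], stdin=stdin, stdout=stdout, check=True)
```

Calling `run_steps()` from the repository root, after the virtual environment and `inputs/dists-with-signatures.csv` exist, would regenerate all three output files.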