Shadowmire syncs PyPI (or plain HTTP(S) PyPI mirrors using Shadowmire) with a lightweight and easy approach.
Requires Python 3.11+.
Bandersnatch is the recommended solution to sync from PyPI. However, it has these 2 issues that haven't been solved for a long time:
- Bandersnatch does not support removing packages that have been removed from upstream, which makes a mirror an easier target for supply chain attacks.
- The upstream must implement XML-RPC APIs, which is not acceptable for most mirror sites.
Shadowmire is a light solution to these issues.
PyPI's XML-RPC APIs have a list_packages_with_serial() method to list ALL packages with their "serial" (you could consider it as a version integer that just increases every few moments). changelog_last_serial() and changelog_since_serial() are NOT used, as they cannot handle package deletion. Local packages not in the list result are removed.
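The comparison described above can be sketched as follows. This is a minimal illustration, not Shadowmire's actual code: the function names `fetch_remote_serials` and `plan_sync` are hypothetical, and the sketch assumes both sides are plain name-to-serial mappings with package names already normalized.

```python
import xmlrpc.client


def fetch_remote_serials(endpoint: str = "https://pypi.org/pypi") -> dict[str, int]:
    """Ask the XML-RPC endpoint for every package name and its serial.

    Hypothetical helper: it is defined here for illustration and is not
    called below, so the example runs without network access.
    """
    client = xmlrpc.client.ServerProxy(endpoint)
    return dict(client.list_packages_with_serial())


def plan_sync(
    remote: dict[str, int], local: dict[str, int]
) -> tuple[list[str], list[str]]:
    """Return (to_remove, to_update) by comparing two name -> serial maps."""
    # Anything we hold locally that upstream no longer lists gets removed.
    to_remove = sorted(name for name in local if name not in remote)
    # Anything missing locally, or with a different serial, gets updated.
    to_update = sorted(
        name for name, serial in remote.items() if local.get(name) != serial
    )
    return to_remove, to_update


# Sample data standing in for a real listing.
remote = {"alpha": 10, "beta": 7, "gamma": 3}
local = {"alpha": 10, "beta": 5, "delta": 1}
to_remove, to_update = plan_sync(remote, local)
print(to_remove)  # ['delta']
print(to_update)  # ['beta', 'gamma']
```

Because the full listing is compared every time, a package deleted upstream simply disappears from `remote` and ends up in `to_remove`, which a changelog-based approach could miss.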
Results from list_packages_with_serial() are stored in remote.json. local.db is a SQLite database which just stores every local package name and its local serial. local.json is dumped from local.db for downstream consumption.
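As a rough sketch of the local.db to local.json relationship: the table name `packages`, its columns, and the helper `dump_local_db` below are assumptions made for illustration, not Shadowmire's real schema or code.

```python
import json
import sqlite3


def dump_local_db(conn: sqlite3.Connection) -> str:
    """Serialize every (name, serial) row into a JSON object string."""
    rows = conn.execute("SELECT name, serial FROM packages ORDER BY name")
    return json.dumps({name: serial for name, serial in rows})


# An in-memory database stands in for local.db (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (name TEXT PRIMARY KEY, serial INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)", [("alpha", 10), ("beta", 7)]
)
conn.commit()

snapshot = dump_local_db(conn)
print(snapshot)  # {"alpha": 10, "beta": 7}
```

In a real deployment, writing the dump to a temporary file and renaming it over local.json would keep downstream readers from seeing a half-written file.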
For downstream mirrors, the alternative to list_packages_with_serial() is local.json, which can be easily served by any HTTP server. Don't use local.db for this, as it could have consistency issues while the Shadowmire upstream is syncing.
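A downstream consumer could read that listing over plain HTTP roughly like this. The sketch assumes local.json sits at the root of the upstream URL and is a JSON object mapping package names to serials; both are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urljoin


def listing_url(upstream: str) -> str:
    """Build the URL of local.json under an upstream base URL (assumed layout)."""
    base = upstream if upstream.endswith("/") else upstream + "/"
    return urljoin(base, "local.json")


def parse_listing(text: str) -> dict[str, int]:
    """Parse the JSON body into a name -> serial mapping (assumed format)."""
    data = json.loads(text)
    return {str(name): int(serial) for name, serial in data.items()}


def fetch_listing(upstream: str, timeout: float = 30.0) -> dict[str, int]:
    """Download and parse the listing; not called here to avoid network I/O."""
    with urllib.request.urlopen(listing_url(upstream), timeout=timeout) as resp:
        return parse_listing(resp.read().decode("utf-8"))


print(listing_url("http://example.com/pypi"))  # http://example.com/pypi/local.json
print(parse_listing('{"alpha": 10}'))  # {'alpha': 10}
```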
Important
Shadowmire is still in an experimental state. Please consider taking a snapshot before using it (if you're using ZFS/Btrfs), to avoid Shadowmire eating all your packages by accident.
If you just need to fetch all indexes (and then use a cache solution for packages):
./shadowmire.py --repo /path/to/pypi sync
If the --repo argument is not set, it defaults to the current working directory.
If you need to download all packages, add --sync-packages.
./shadowmire.py sync --sync-packages
Important
If you sync indexes only first, a later sync with --sync-packages will NOT update packages whose indexes are already at the latest version. Use the verify command for this.
The sync command also supports --exclude; you can give multiple regexes like this:
./shadowmire.py sync --exclude package1 --exclude ^0
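To make the effect of the two patterns above concrete, here is a small sketch of regex-based exclusion. It assumes unanchored `re.search` semantics, which is an assumption about how the patterns are applied; `build_excluder` is a hypothetical helper, not part of Shadowmire.

```python
import re
from typing import Callable


def build_excluder(patterns: list[str]) -> Callable[[str], bool]:
    """Compile the patterns once and return a predicate over package names."""
    compiled = [re.compile(p) for p in patterns]

    def is_excluded(name: str) -> bool:
        # Assumed semantics: a package is skipped if ANY pattern matches
        # anywhere in its name (use ^ and $ to anchor).
        return any(rx.search(name) for rx in compiled)

    return is_excluded


is_excluded = build_excluder(["package1", "^0"])
names = ["package1", "my-package1-utils", "0x-tools", "requests", "a0"]
kept = [n for n in names if not is_excluded(n)]
print(kept)  # ['requests', 'a0']
```

Under these assumed semantics, an unanchored pattern such as `package1` would also catch `my-package1-utils`, so anchoring with `^` and `$` is the safer way to target one exact name.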
Also it supports prerelease filtering like this:
./shadowmire.py sync --sync-packages --prerelease-exclude '^duckdb$'
And --shadowmire-upstream, if you don't want to sync from PyPI directly:
./shadowmire.py sync --shadowmire-upstream http://example.com/pypi/
If you already have a PyPI repo, use genlocal first to generate a local db:
./shadowmire.py genlocal
Important
You must have the file json/<package_name> before running genlocal.
The verify command can be used if you believe that something is wrong (inconsistent). It will:
- remove packages NOT in the local db (skipped by default: without --remove-not-in-local it only prints the package names)
- remove packages NOT in remote (with consideration of --exclude)
- make sure all local indexes are valid, and (if --sync-packages is given) have valid local package files (--prerelease-exclude is applied only to packages that require updating)
- delete unreferenced files in the packages folder
./shadowmire.py verify --sync-packages
The verify command accepts the same arguments as sync, plus some additional ones. Please check ./shadowmire.py verify --help for more information.
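The last verify step (finding unreferenced files) can be pictured with the sketch below. It only lists candidates and deletes nothing. The directory layout, the idea that references are stored as paths relative to the packages folder, and the helper `find_unreferenced` are all assumptions for illustration.

```python
import tempfile
from pathlib import Path


def find_unreferenced(packages_dir: Path, referenced: set[str]) -> list[Path]:
    """List files under packages_dir whose relative path is not referenced."""
    return sorted(
        p
        for p in packages_dir.rglob("*")
        if p.is_file()
        and p.relative_to(packages_dir).as_posix() not in referenced
    )


# A throwaway directory stands in for the packages folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "aa").mkdir()
    (root / "aa" / "kept-1.0.tar.gz").write_bytes(b"x")
    (root / "aa" / "stale-0.1.tar.gz").write_bytes(b"x")

    stale = find_unreferenced(root, {"aa/kept-1.0.tar.gz"})
    stale_names = [p.name for p in stale]
    print(stale_names)  # ['stale-0.1.tar.gz']
```

Listing first and deleting in a separate step makes it easy to review what would be removed, which fits the snapshot advice given earlier.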
If you don't like appending a long argument list, you could use --config (example):
./shadowmire.py --config config.toml sync
Also, if you need to debug, you can use the do-update and do-remove commands to operate on a single package.
This project uses some code from PyPI's official mirroring tool, bandersnatch. Bandersnatch is licensed under the Academic Free License v3, and you can read its license contents here.
Suggested by LLM.
Sure, to capture the mysterious, fantastical, and intriguing nature of "Bandersnatch," here are some similar-style project name suggestions:
- Shadowmire:
- Meaning: A mysterious shadowy swamp, implying the unknown and exploration.