Downloader for images referenced by LAION parquet files
- Python 3.10+
pipenv install
Download the parquet files containing 400 million references to (probably copyrighted!) images:
wget -l1 -r --no-parent https://the-eye.eu/public/AI/cah/laion400m-met-release/laion400m-meta/ -P data --cut-dirs 6
Larger LAION datasets with 2 to 5 billion references also exist.
For instance, download all images tagged with the words “pumpkin” and “halloween” and store them along with the retrieved metadata in a new parquet file named “halloween-pumpkin-400m.parquet”:
pipenv run ./dl.py \
the-eye.eu/*.parquet \
--keywords pumpkin,halloween \
--output halloween-pumpkin-400m.parquet
Generate an HTML file with a catalog of all or a selection of the downloaded images contained in the parquet file:
pipenv run ./mkcatalog.py \
halloween-pumpkin-400m.parquet \
--output pumpkins.html
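The internals of `mkcatalog.py` are not described here. The following is a minimal sketch of how a static HTML catalog could be assembled from a list of downloaded images, using only the standard library. The function `build_catalog` and the entry keys `path` and `caption` are hypothetical and chosen for illustration.

```python
from html import escape


def build_catalog(entries: list[dict], title: str = "Catalog") -> str:
    """Render a simple HTML page with one <figure> per image entry.

    Each entry is assumed to have a 'path' (image location) and a
    'caption' (descriptive text); both are escaped before insertion.
    """
    figures = []
    for entry in entries:
        path = escape(entry["path"], quote=True)
        caption = escape(entry["caption"], quote=True)
        figures.append(
            f'<figure><img src="{path}" alt="{caption}" loading="lazy">'
            f"<figcaption>{caption}</figcaption></figure>"
        )
    body = "\n".join(figures)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    # Made-up entries standing in for rows of the output parquet file.
    page = build_catalog(
        [{"path": "images/0001.jpg", "caption": "Carved pumpkin for Halloween"}],
        title="Pumpkins",
    )
    with open("pumpkins-sketch.html", "w", encoding="utf-8") as fh:
        fh.write(page)
```

Captions come from scraped web data, so escaping them before embedding them in HTML avoids broken markup and script injection.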
See LICENSE