LoRA Dataset Webui

This project aims to help with the creation and management of LoRA training datasets. Scroll down to the bottom of the page for a feature overview.

This is still in beta. Please report any bugs you find; pull requests are welcome. Currently everything is just cobbled together.

Roadmap:

Known issues:

No files/folders are ever deleted, leading to clutter/orphaned images
Deleting an entire folder can break the step
When an image is open and you save the dataset, it will throw an error and half of the files are left where they are.

Getting started

(optional) create a venv first:

python -m venv venv
venv\Scripts\activate

install the requirements:

pip install -r requirements.txt

start either by running start.bat or manually using:

python webserver.py

(see python webserver.py --help for launch arguments)

Access the webui on the following URL: http://127.0.0.1:8080/

If tagging/cropping/etc. is too slow, try running pip install onnxruntime-gpu, but keep in mind that this will use some of your VRAM.

Running this script is recommended to get all features of the webui.

Using start.bat already downloads all dependencies by default.

It gives you the option to download the following files:

danbooru-tags.json and gelbooru-tags.json from GitHub Gist or catbox.moe.
- You also have the option to scrape the tags from the site directly.
cropper.js and cropper.css from Cloudflare/cdnjs.

Clear your browser cache between updates. Browsers tend to keep the old scripts/CSS loaded.

The folders created are meant to be used as follows:

0 - raw - raw images from the internet / screenshots
1 - cropped - cropped images (1:1 aspect ratio)
2 - sorted - images grouped by quality / topic / etc
3 - tagged - .txt or .json files containing autotagger output
4 - fixed - pruned tags in .txt format.
5 - out - scaled down images and pruned tags - point your training script here
datasets - all your datasets are saved here
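If you want to script parts of the workflow outside the UI, the layout above can be mirrored with a few lines of Python. This is only a sketch: the folder names are copied from the list above, but the root directory passed as `dataset_root` is an assumption, `create_layout` is a hypothetical helper (not part of the webui), and the webui normally creates these folders itself.

```python
from pathlib import Path

# Folder names as listed above, in pipeline order.
PIPELINE_FOLDERS = [
    "0 - raw",
    "1 - cropped",
    "2 - sorted",
    "3 - tagged",
    "4 - fixed",
    "5 - out",
    "datasets",
]


def create_layout(dataset_root):
    """Create any missing pipeline folders under dataset_root and return their paths."""
    root = Path(dataset_root)
    paths = []
    for name in PIPELINE_FOLDERS:
        path = root / name
        # exist_ok=True leaves existing folders (and their contents) untouched,
        # so calling this twice is harmless.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

For example, create_layout("my-lora-project") creates the seven folders inside my-lora-project and skips any that already exist.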

Some of these images/videos might be outdated. The UI also has tooltips. If something breaks, just open an issue here on GitHub.

Save / load datasets you're working on
Avoid having to change the training folder: just point your training script at the 5 - out folder and load the right dataset
Write notes for yourself
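Before pointing a training script at 5 - out, it can be worth checking that every image has a caption file next to it. The sketch below assumes the common convention that a caption shares its image's base name with a .txt extension (for example 0001.png and 0001.txt), and that images use one of the extensions in `IMAGE_EXTENSIONS`; both are assumptions, so adjust them to your dataset. `find_missing_captions` is a hypothetical helper, not part of the webui.

```python
from pathlib import Path

# Assumed image extensions; extend this set if your dataset uses others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def find_missing_captions(out_dir):
    """Return image files under out_dir (searched recursively) with no matching .txt caption."""
    missing = []
    # rglob also covers subfolders, which some training scripts expect inside the output folder.
    for path in sorted(Path(out_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # 0001.png -> 0001.txt in the same folder.
            if not path.with_suffix(".txt").is_file():
                missing.append(path)
    return missing
```

For example, looping over find_missing_captions("5 - out") and printing each path lists every image that would be trained without tags.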