An interactive and zoomable visualization of your whole dataset. This web-based tool, a modified version of the original PixPlot, is useful in object detection and classification projects for the following tasks:
- Initial investigation and visualization of a labelled (or unlabelled) dataset.
- Fixing incorrect classifications and removing invalid or confusing images (click on an image to update its label or flag it for removal).
- Visualizing false positive bounding boxes to identify why they are occurring.
In the UMAP visualization, images that look similar are placed near each other, making it easy to see where errors occur.
(Screenshots: UMAP visualization; interactive and zoomable view; different views by label.)
The repo contains:
- A PixplotML server. We have added tools helpful for labelling, such as a legend, border colours representing the label, and functionality to update labels or flag images for removal. The original PixPlot uses a classification model trained on ImageNet, but we find that fine-tuning on your own data produces much more accurate visualizations. So we have added:
- A preparation step to customise the visualization to your image data. The preparation step uses your images and a metadata.csv file to train a PyTorch classification model and then outputs an image vectors file for clustering by PixplotML (using UMAP). See Fine-tune PixplotML for your own images for more details. The code to do this is in the prep_pixplot_files folder. (We use pytorch-accelerated to train the classification model simply.)
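The real output format is defined by the code in prep_pixplot_files. As a rough sketch only, assuming image_vectors.npy holds one float32 row per image in the same order as the rows of metadata.csv, the vectors file could be assembled with NumPy as below. The helper `save_image_vectors` and the `features` dict (filename to backbone feature vector) are hypothetical.

```python
import csv
from pathlib import Path

import numpy as np


def save_image_vectors(metadata_csv: Path, features: dict, out_path: Path) -> np.ndarray:
    """Stack per-image feature vectors in metadata.csv row order and save them.

    Hypothetical helper: assumes `features` maps each filename to a 1-D
    feature vector, e.g. the pooled output of a fine-tuned backbone.
    """
    with open(metadata_csv, newline="") as f:
        filenames = [row["filename"] for row in csv.DictReader(f)]

    missing = [name for name in filenames if name not in features]
    if missing:
        raise KeyError(f"no feature vector for: {missing[:5]}")

    # Assumption: one row per image, in the same order as the metadata rows.
    vectors = np.stack([np.asarray(features[name], dtype=np.float32) for name in filenames])
    np.save(out_path, vectors)
    return vectors
```

Keeping the row order tied to metadata.csv means no separate index file is needed to map vectors back to images.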
The server requires the following files located in a folder:
- metadata.csv - a file containing the image filename and category (see below for more details).
- images/*.* - a subfolder containing the images to visualize.
- image_vectors.npy - image vectors from a classification model backbone (see below for more details).
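Before starting the server, it can help to sanity-check the folder layout listed above. This is a minimal sketch, not part of PixplotML; the helper `check_pixplot_folder` is hypothetical, and it assumes that image_vectors.npy has one row per metadata.csv row and that metadata.csv has a `filename` column.

```python
import csv
import glob
import os

import numpy as np


def check_pixplot_folder(folder: str) -> list:
    """Return a list of problems found in a PixplotML input folder (empty if none)."""
    problems = []
    metadata = os.path.join(folder, "metadata.csv")
    images_dir = os.path.join(folder, "images")
    vectors_path = os.path.join(folder, "image_vectors.npy")

    # All three inputs must exist before the finer checks make sense.
    for path in (metadata, images_dir, vectors_path):
        if not os.path.exists(path):
            problems.append(f"missing: {path}")
    if problems:
        return problems

    with open(metadata, newline="") as f:
        filenames = [row["filename"] for row in csv.DictReader(f)]
    on_disk = {os.path.basename(p) for p in glob.glob(os.path.join(images_dir, "*.*"))}
    for name in filenames:
        if name not in on_disk:
            problems.append(f"image listed in metadata.csv not found: {name}")

    # mmap_mode avoids reading a large vectors file fully into memory.
    vectors = np.load(vectors_path, mmap_mode="r")
    if vectors.ndim != 2 or vectors.shape[0] != len(filenames):
        problems.append(f"image_vectors.npy has shape {vectors.shape}, expected ({len(filenames)}, D)")
    return problems
```

For the COCO example, calling `check_pixplot_folder("data/outputs")` from the repo root after unzipping should list any mismatch before you run the preprocessing.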
To create image_vectors.npy for your own images, we provide code and instructions in the prep_pixplot_files folder; see Fine-tune PixplotML for your own images for more details.
To quickly see PixplotML running on bounding box images extracted from the COCO dataset, follow this pre-created example, which contains all the required files.
First, clone the repo and extract the zip file containing the COCO validation dataset bounding boxes:
git clone https://github.com/alexhock/pixplotml.git
cd pixplotml
unzip ./data/coco_trained.zip -d ./data/
To run PixplotML, there are two options: using Python with a new environment, or using Docker, where the environment is managed for you.
- Create a Python environment and install dependencies:
conda create --name=pixplotml python=3.9
conda activate pixplotml
cd pixplot_server
pip install -r requirements.txt
- Run the PixPlot preprocessing. This prepares the images and creates the PixPlot website in a folder called 'output':
python pixplot/pixplot.py --images "../data/outputs/images/*.jpg" --metadata "../data/outputs/metadata.csv" --image_vectors "../data/outputs/image_vectors.npy"
- Start a web server by running:
python -m http.server 8600
Open a browser to http://localhost:8600/output.
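`python -m http.server` serves the directory it is started from, which is why it is run inside pixplot_server. If you prefer to pass the root folder explicitly, the standard library allows it; this is a sketch of an alternative, not a script shipped with the repo, and the helper `make_server` is hypothetical. Unlike the command-line default, it binds to localhost only.

```python
import functools
import http.server


def make_server(directory: str, port: int = 8600) -> http.server.ThreadingHTTPServer:
    """Build a static file server rooted at `directory`.

    Similar to `python -m http.server 8600`, but with an explicit root folder
    and bound to localhost only; port 0 asks the OS for a free port.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("localhost", port), handler)
```

Usage: `make_server("pixplot_server", 8600).serve_forever()` from the repo root, then open http://localhost:8600/output.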
Instead of manually creating a Python environment and performing the steps in the Python quickstart, you can use Docker to take care of all of that.
- Build and tag the image:
cd pixplot_server
docker build -t pixplotml:1.0 .
- Run PixplotML:
cd ../data
docker run -v `pwd`/outputs:/data -p 8800:8800 pixplotml:1.0 /data 8800 metadata.csv "images/*.jpg"
Open a browser to:
http://localhost:8800/output
To stop the running docker container:
export CONTAINER_ID=`docker ps -lq`
docker stop $CONTAINER_ID
Note that if you want to avoid re-running the preprocessing step, you must commit the container to a new image after the first run:
docker ps -a
docker commit <container_id> pixplotml:2.0
Then use the new image name, pixplotml:2.0, when running PixplotML.
Metadata should be in a comma-separated values (CSV) file, should contain one row for each input image, and should contain headers specifying the column order. Here is a sample metadata file:
filename | category | tags | description | permalink |
---|---|---|---|---|
bees.jpg | yellow | a\|b\|c | bees' knees | https://... |
cats.jpg | dangerous | b\|c\|d | cats' pajamas | https://... |
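A metadata file like the sample above can be produced with Python's csv module. This is a minimal sketch: the helper `write_metadata` is hypothetical, the column list mirrors the sample, and a `tags` value given as a list is joined with "|" to match the pipe-delimited tags shown in the sample.

```python
import csv


def write_metadata(path: str, rows: list) -> None:
    """Write a metadata.csv with the same columns as the sample file."""
    fieldnames = ["filename", "category", "tags", "description", "permalink"]
    with open(path, "w", newline="") as f:
        # Missing keys are written as empty cells; unknown keys are ignored.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            row = dict(row)
            if isinstance(row.get("tags"), (list, tuple)):
                row["tags"] = "|".join(row["tags"])
            writer.writerow(row)
```

For example, `write_metadata("metadata.csv", [{"filename": "bees.jpg", "category": "yellow", "tags": ["a", "b", "c"]}])` writes the header plus one row with empty description and permalink cells.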
The following column labels are accepted:
Column | Description |
---|---|
filename | the filename of the image |
category | a categorical label for the image |
tags | a pipe-delimited list of categorical tags for the image |
description | a plaintext description of the image's contents |
permalink | a link to the image hosted on another domain |
year | a year timestamp for the image (should be an integer) |
label | a categorical label used for supervised UMAP projection |
lat | the latitudinal position of the image |
lng | the longitudinal position of the image |
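To catch typos in column labels before preprocessing, a metadata file can be checked against the table above. This is a hedged sketch, not PixplotML's own validation: the helper `validate_metadata` is hypothetical, and treating `filename` as required and `lat`/`lng` as floats are assumptions.

```python
import csv

# Column labels accepted by PixplotML, per the table above.
ACCEPTED_COLUMNS = {
    "filename", "category", "tags", "description", "permalink",
    "year", "label", "lat", "lng",
}


def validate_metadata(path: str) -> list:
    """Return a list of problems found in metadata.csv (empty if none)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        headers = reader.fieldnames or []
        unknown = [h for h in headers if h not in ACCEPTED_COLUMNS]
        if unknown:
            problems.append(f"unrecognised columns: {unknown}")
        if "filename" not in headers:
            problems.append("missing required column: filename")
        for row in reader:
            line_no = reader.line_num  # source line of the row just read
            year = row.get("year")
            if year and not year.lstrip("-").isdigit():
                problems.append(f"line {line_no}: year {year!r} is not an integer")
            for col in ("lat", "lng"):
                value = row.get(col)
                if value:
                    try:
                        float(value)
                    except ValueError:
                        problems.append(f"line {line_no}: {col} {value!r} is not a number")
    return problems
```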