
Scripts that train MixMatch on a semi-labelled dataset and then label the unlabelled images

Primary language: Python. License: Apache License 2.0 (Apache-2.0).


These MixMatch scripts train on a semi-labelled dataset and then auto-label the unlabelled images. The training run also gives you a trained model.

MixMatch code: https://github.com/google-research/mixmatch

Code for the paper: "MixMatch - A Holistic Approach to Semi-Supervised Learning" by David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver and Colin Raffel.

Install dependencies

git clone https://github.com/google-research/mixmatch.git
sudo apt install python3-dev python3-virtualenv python3-tk imagemagick
virtualenv -p python3 --system-site-packages env3
. env3/bin/activate
pip install -r mixmatch/requirements.txt
pip install opencv-python
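With the virtualenv active, a quick way to confirm the installs worked is to check that the key modules can be found. The helper below is a hypothetical sketch, not part of this repository, and the module names in its list are assumptions (opencv-python provides cv2, and MixMatch is built on TensorFlow), so adjust them to match mixmatch/requirements.txt.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; extend the list to match requirements.txt.
    missing = missing_modules(["cv2", "numpy", "tensorflow"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```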

Run and label images

1. Label about 10% of the images and organize the dataset as below

    |-- cifar10
        |-- airplane
        |-- automobile
        |-- bird
        |-- cat
        |-- deer
        |-- dog
        |-- frog
        |-- horse
        |-- ship
        |-- truck
        |-- UNLABEL
        |-- NEGATIVE

cifar10 is the dataset name. Put unlabelled images in UNLABEL, and put images that should not be used for training in NEGATIVE. Every other folder is named after a known label and holds the images of that class. You only need to label about 10% of the dataset by hand first; these scripts then auto-label the rest.

2. Clone MixMatch and install its dependencies (see Install dependencies)

3. Run train_and_label_image.py

python3 train_and_label_image.py --dir=$DATASET_PATH$

$DATASET_PATH$ is the path to your dataset directory (for example, the cifar10 folder above).
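Before running the script, it can help to confirm that the folder layout from step 1 is in place and to see roughly what fraction of the trainable images is already labelled. The sketch below uses a hypothetical helper, summarize_dataset, that is not part of this repository; it assumes every file directly inside a folder is an image and treats every folder except UNLABEL and NEGATIVE as a class.

```python
from pathlib import Path

# Folder names with special meaning in the layout above.
SPECIAL_DIRS = {"UNLABEL", "NEGATIVE"}


def summarize_dataset(root):
    """Count files per folder and the fraction of trainable images already labelled.

    Hypothetical helper: each file directly inside a folder counts as one
    image; nested subfolders are ignored.
    """
    root = Path(root)
    if not root.is_dir():
        raise NotADirectoryError(root)
    counts = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(1 for f in folder.iterdir() if f.is_file())
    # Class folders hold labelled images; NEGATIVE is excluded from training.
    labelled = sum(n for name, n in counts.items() if name not in SPECIAL_DIRS)
    unlabelled = counts.get("UNLABEL", 0)
    total = labelled + unlabelled
    fraction = labelled / total if total else 0.0
    return counts, fraction
```

For example, summarize_dataset("/path/to/cifar10") returns the per-folder counts and the labelled fraction; NEGATIVE images are left out of that fraction because they are not used for training.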

train_and_label_image.py runs these steps:

a] Convert your dataset into TFRecord files.

b] Train MixMatch on the TFRecord files.

c] Predict labels for the images in UNLABEL.

d] Check whether each predicted label is correct and move the image into its labelled folder.

e] If UNLABEL is not empty, rerun the script.
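Steps d] and e] can be sketched as follows. These are hypothetical helpers, not code from train_and_label_image.py: how the script reports its predictions is not described here, so the input is assumed to be a dict mapping each UNLABEL file name to a predicted class name, plus a set of file names whose prediction you have reviewed and accepted.

```python
import shutil
from pathlib import Path


def move_confirmed(root, predictions, confirmed):
    """Move reviewed images from UNLABEL into their predicted class folder.

    Hypothetical helper: `predictions` maps a file name in UNLABEL to a class
    name, and `confirmed` is the set of file names you accepted. Unconfirmed
    images stay in UNLABEL for the next run. Returns the moved file names.
    """
    root = Path(root)
    unlabel_dir = root / "UNLABEL"
    moved = []
    for name, label in predictions.items():
        if name not in confirmed:
            continue  # not reviewed yet: leave it for the next run
        target_dir = root / label
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(unlabel_dir / name), str(target_dir / name))
        moved.append(name)
    return moved


def unlabel_is_empty(root):
    """Return True when no files remain in UNLABEL, i.e. no rerun is needed."""
    return not any(p.is_file() for p in (Path(root) / "UNLABEL").iterdir())
```

Images you did not confirm stay in UNLABEL, so unlabel_is_empty(root) returning False is the signal to run train_and_label_image.py again.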