
Tool that makes it easy to split images and their associated labels into separate sets for training and testing.

Primary language: Python. License: MIT.

yolo-splitter

Tool to create and modify YOLO datasets.

Installation

pip install yolosplitter

Usage

from yolosplitter import YoloSplitter

ys = YoloSplitter(imgFormat=['.jpg', '.jpeg', '.png'], labelFormat=['.txt'])

# If you already have a YOLO-format dataset on your system
df = ys.from_yolo_dir(input_dir="yolo_dataset", ratio=(0.7, 0.2, 0.1), return_df=True)

# If your images and labels are mixed in the same directory
df = ys.from_mixed_dir(input_dir="mydataset", ratio=(0.7, 0.2, 0.1), return_df=True)

# Show the train/val/test split sizes, the number of error files, and all class names found in the annotation files
ys.info()

# Show the dataframe (show_dataframe() was renamed to get_dataframe())
ys.get_dataframe()


# Save the new split to disk
ys.save_split(output_dir="potholes")
# output
Saving New split in 'potholes' dir
100%|██████████| 118/118 [00:00<00:00, 1352.79it/s]

ys.info()
# output
{'train': 122, 'val': 35, 'test': 17, 'cls_names': {0, 1}, 'errors': 0}

# Use ys.show_errors() to list the files that have errors
ys.show_errors()

# Use ys.get_dataframe() to see the dataframe built from the dataset
ys.get_dataframe()
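
To make the ratio argument more concrete, here is a minimal sketch of how a ratio such as (0.7, 0.2, 0.1) could partition a list of files. It is not taken from yolosplitter: the helper name split_by_ratio, the seed parameter, and the assumption that the three values mean train, val, and test in that order are all illustrative.

```python
import random

def split_by_ratio(items, ratio=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items and partition them into train/val/test lists.

    Hypothetical helper: the (train, val, test) ordering of `ratio`
    is an assumption, not the documented yolosplitter behavior.
    """
    if abs(sum(ratio) - 1.0) > 1e-9:
        raise ValueError("ratio must sum to 1")
    items = list(items)
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratio[0])
    n_val = int(len(items) * ratio[1])
    return {
        "train": items[:n_train],
        "val": items[n_train:n_train + n_val],
        # The test split takes whatever is left, so no file is dropped.
        "test": items[n_train + n_val:],
    }

splits = split_by_ratio([f"{i:02d}.png" for i in range(2, 12)])
print({name: len(files) for name, files in splits.items()})
```

Giving the remainder to the last split is one simple way to guarantee that every file ends up in exactly one set, even when the ratio does not divide the file count evenly.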

Input Directory

MyDataset/
├── 02.png
├── 02.txt
├── 03.png
├── 03.txt
├── 04.png
├── 04.txt
├── 05.png
├── 05.txt
├── 06.png
├── 06.txt
├── 07.png
├── 07.txt
├── 08.png
├── 08.txt
├── 09.png
├── 09.txt
├── 10.png
├── 10.txt
├── 11.png
└── 11.txt
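
In a mixed directory like the one above, each image is matched to the label file that shares its base name. The following sketch shows one way that pairing could be done with pathlib. The function pair_images_with_labels and the IMAGE_SUFFIXES constant are hypothetical names used for illustration; they are not part of the yolosplitter API.

```python
from pathlib import Path

# Assumed image extensions, mirroring the imgFormat list used earlier.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def pair_images_with_labels(input_dir, label_suffix=".txt"):
    """Return (pairs, unlabeled) for a directory of mixed images and labels.

    Hypothetical helper: an image counts as labeled when a file with the
    same base name and `label_suffix` exists next to it.
    """
    pairs, unlabeled = [], []
    for image in sorted(Path(input_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip label files and anything that is not an image
        label = image.with_suffix(label_suffix)
        if label.is_file():
            pairs.append((image, label))
        else:
            unlabeled.append(image)
    return pairs, unlabeled
```

Calling pair_images_with_labels("MyDataset") on the tree above would be expected to return ten pairs and an empty unlabeled list, since every .png has a matching .txt.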

Output Directory

MyDataset-splitted/
├── data.yaml
├── train
│   ├── images
│   │   ├── 03.png
│   │   ├── 04.png
│   │   ├── 05.png
│   │   ├── 07.png
│   │   ├── 08.png
│   │   ├── 09.png
│   │   └── 10.png
│   └── labels
│       ├── 03.txt
│       ├── 04.txt
│       ├── 05.txt
│       ├── 07.txt
│       ├── 08.txt
│       ├── 09.txt
│       └── 10.txt
└── val
    ├── images
    │   ├── 02.png
    │   ├── 06.png
    │   └── 11.png
    └── labels
        ├── 02.txt
        ├── 06.txt
        └── 11.txt
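
As a quick sanity check on an output tree like this one, the sketch below counts the images in each split folder and collects the class IDs found in the label files. It relies on the standard YOLO label format, where each line starts with an integer class ID followed by the normalized box coordinates. The function summarize_split is a hypothetical stand-in that only mimics the shape of the ys.info() output; it is not how the library computes it.

```python
from pathlib import Path

def summarize_split(output_dir):
    """Count images per split and collect class IDs from the label files.

    Hypothetical helper: assumes each split folder contains `images`
    and `labels` subfolders, as in the tree above.
    """
    summary, class_ids = {}, set()
    for split_dir in sorted(Path(output_dir).iterdir()):
        images_dir = split_dir / "images"
        if not images_dir.is_dir():
            continue  # skips data.yaml and anything that is not a split folder
        summary[split_dir.name] = sum(1 for p in images_dir.iterdir() if p.is_file())
        for label_file in (split_dir / "labels").glob("*.txt"):
            for line in label_file.read_text().splitlines():
                fields = line.split()
                if fields:
                    # The first field of a YOLO label line is the class ID.
                    class_ids.add(int(fields[0]))
    summary["cls_names"] = class_ids
    return summary
```

For the tree above, summarize_split("MyDataset-splitted") would report 7 images under train and 3 under val; the class IDs depend on the contents of the label files.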

Change Log

Stable

  • 2023-01-30 version 4.9

  • 2023-12-20 version 4.8

    • Changed yaml file style
  • 2023-12-19 version 4.7

    • Fixed output dir of val to valid, thanks to https://github.com/AndreasFridh
    • Added ys.info() to show the train/val/test split sizes, the number of error files, and all class names found in the annotation files
    • Renamed ys.show_dataframe() to ys.get_dataframe()
    • Small bug fixes