List of image datasets with any kind of litter, garbage, waste and trash

Waste datasets review

List of datasets with any kind of litter, garbage, waste and trash. Created during the detectwaste.ml project.

Today, more than 300 million tons of plastic are produced annually. Plastic is everywhere and we constantly use it in our daily life.

The idea of the Detect Waste project is to use artificial intelligence to detect plastic waste in the environment. Our solution will be applicable to video and photography. Our goal is to use AI for Good.

Visit majsylw/litter-detection-review to see a broader review of papers, projects and other resources concerning the problem of litter in the environment.

Contributing

Feel free to open an issue with a short description of a new dataset, or create a pull request that adds the new dataset to the table or fills in a missing description.

Summary

| Name | No. categories | No. subcategories | No. images | Annotation | Comment | Website | Description |
|---|---|---|---|---|---|---|---|
| TrashCan 1.0 | 3 | 34 | 7 212 | Instance segmentation | Underwater images | website | ✔️ |
| Trash-ICRA19 | 3 | 34 | 5 700 | Detection | Underwater images | website | ✔️ |
| TACO | 28 | 60 | 1 500 | Segmentation | Waste in the wild | website | ✔️ |
| TACO bboxes | 7 | 60 | WIP | Detection | Waste in the wild | WIP | ✔️ |
| UAVVaste | 1 | - | 772 | Segmentation | Drone dataset | github | ✔️ |
| Trashnet | 6 | - | 2 527 | Classification | Clear background | github | ✔️ |
| WaDaBa | 8 | color, size, shape, or material | 4 000 | Classification | Plastic dataset, clear background | website | ✔️ |
| GLASSENSE-VISION | 7 | 136 | 2 000 | Classification | Home supplies, clear background | website | ✔️ |
| Waste Classification data | 2 | - | ~25 000 | Classification | Scraped from Google search | kaggle | ✔️ |
| Waste Classification Data v2 | 3 | - | ~27 500 | Classification | Scraped from Google search | kaggle | ✔️ |
| Waste Images from Sushi Restaurant | 16 | - | 500 | Classification | Clear background | kaggle | ✔️ |
| Open litter map | 11 | 187 | > 100k | Multilabel classification | Waste in the wild | website | ✔️ |
| Litter | 24 | size, shape, or material | ~14 000 | Detection | Waste in the wild, paid license | website | ✔️ |
| Drinking Waste Classification | 4 | - | 9 640 | Detection | Clear background (cans and bottles) | kaggle | ✔️ |
| waste_pictures | 34 | - | ~24 000 | Classification | Scraped from Google search | kaggle | ✔️ |
| spotgarbage | 3 | - | ~2 400 | Classification | Scraped from Bing search | kaggle, github | ✔️ |
| DeepSeaWaste | 5 | - | 3 055 | Classification | Underwater images | kaggle | ✔️ |
| MJU-Waste v1.0 | 1 | - | 2 475 | Segmentation | Plain background, indoor RGBD images | github | ✔️ |
| Domestic Trash Dataset | 10 | - | > 9 000 | Classification/Detection | Waste in the wild, paid license, 250 images for free | github | ✔️ |
| Cigarette butt dataset | 1 | - | 2 200 | Detection | Waste in the wild, synthetic images | website | ✔️ |
| TrashBox | 7 | 25 | 17 785 | Classification/Detection | Scraped from web | github | ✔️ |

Description

TrashCan 1.0

An Instance-Segmentation Labeled Dataset of Trash Observations

7212 images under 3 main categories: bio, trash, unknown. Categories:

  • bio = turtle, squid, lobster, unknown, jellyfish, stingray, shrimp, crawfish, octopus, shark, shell, crab, starfish, eel
  • trash = clothing, pipe, bottle, bag, snack_wrapper, glove, tire, can, cup, container, branch, wreckage, tarp, box, hose, rope, hay, net, paper, bucket, wire
  • unknown

Download: Directly from website https://conservancy.umn.edu/handle/11299/214865

Trash-ICRA19

A Bounding Box Labeled Dataset of Underwater Trash

5,700 underwater images extracted from video. https://jungseokhong.github.io/

Download: Directly from website https://conservancy.umn.edu/handle/11299/214366

TACO

Open dataset with 1500 images from 28 categories and 60 detailed sub-categories of waste in the wild. Annotations are available in COCO JSON format.

Download: Directly from website http://tacodataset.org/
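
Because the annotations are distributed as COCO JSON, a first sanity check is usually to count annotated objects per category. The snippet below is a minimal sketch that relies only on the standard COCO keys (`categories`, `annotations`, `category_id`). The file name `annotations.json` and the tiny in-memory example are assumptions made for illustration, not the actual TACO file layout or labels.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Read a COCO JSON annotation file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_annotations_per_category(coco: dict) -> Counter:
    """Count annotated objects per category name in a COCO-style dictionary."""
    # Map each category id to its human-readable name.
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    # Every annotation refers to exactly one category through "category_id".
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


# Made-up miniature example standing in for a real annotation file.
example = {
    "images": [{"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "bottle"}, {"id": 2, "name": "can"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [200, 40, 30, 60]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [300, 100, 40, 40]},
    ],
}

counts = count_annotations_per_category(example)
for name, n in counts.most_common():
    print(f"{name}: {n}")

# With a downloaded file (hypothetical name):
# counts = count_annotations_per_category(load_coco("annotations.json"))
```

The same function works for any dataset in this list that ships COCO-format annotations.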

TACO bboxes

Additional hand-labelled annotations for images from TACO dataset. There are seven recognized waste categories:

  • bio: food waste such as fruit, vegetables, herbs, used paper towels and tissues,
  • glass: glass objects such as glass bottles, jars, cosmetics packaging,
  • metals and plastic: scrap metal and non-ferrous metal, beverage cans, plastic beverage bottles, plastic shards, plastic food packaging, or plastic straws,
  • non-recyclable: residual rubbish such as disposable diapers, pieces of string, polystyrene packaging, polystyrene elements, blankets, clothing, or used paper cups,
  • other: construction and demolition, large-size waste (e.g. tires), used electronics and household appliances, batteries, paint and varnish cans, or expired medicines,
  • paper: paper, cardboard packaging, receipts, newspapers, catalogues, and books,
  • unknown waste: highly decomposed and hard-to-recognize litter,
  • and an extra background class for images without any litter: a sidewalk, a forest path, a lawn.

Read more about it in the paper Deep learning-based waste detection in natural and urban environments.

Download: Directly from detect waste repository

UAVVaste

Drone rubbish detection intelligent technology

The UAVVaste dataset currently consists of 772 images and 3716 annotations. The main motivation for creating the dataset was the lack of domain-specific data in the datasets that are widely used for benchmarking object detection. The dataset is publicly available and is intended to be expanded.

Annotations are available for detection and segmentation: https://github.com/UAVVaste/UAVVaste

Download: Directly from annotations json on github https://github.com/UAVVaste/UAVVaste

Trashnet

The dataset spans six classes: glass, paper, cardboard, plastic, metal, and trash. Currently, the dataset consists of 2527 images:

  • 501 glass
  • 594 paper
  • 403 cardboard
  • 482 plastic
  • 410 metal
  • 137 trash
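
The class counts above are noticeably imbalanced: the trash class is several times smaller than paper. The sketch below computes each class's share of the dataset and a simple inverse-frequency weight, `total / (n_classes * count)`. This weighting is one common heuristic for imbalanced classification, offered here as an illustration rather than something the Trashnet authors prescribe.

```python
# Image counts per class, copied from the list above.
TRASHNET_COUNTS = {
    "glass": 501,
    "paper": 594,
    "cardboard": 403,
    "plastic": 482,
    "metal": 410,
    "trash": 137,
}


def class_shares(counts: dict) -> dict:
    """Fraction of the dataset that belongs to each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: rarer classes get larger weights."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


total = sum(TRASHNET_COUNTS.values())  # 2527, matching the stated dataset size
shares = class_shares(TRASHNET_COUNTS)
weights = balanced_class_weights(TRASHNET_COUNTS)

for name in TRASHNET_COUNTS:
    print(f"{name:<10} {shares[name]:6.1%}  weight={weights[name]:.2f}")
```

Weights like these can be passed to a weighted loss function or used to drive oversampling of the smaller classes.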

Download: Directly from github https://github.com/garythung/trashnet

Trashnet is also known as Garbage Classification Data.

The Garbage Classification Dataset contains 2467 images from 6 categories: cardboard (393), glass (491), metal (400), paper (584), plastic (472) and trash (127).

Download: Directly from kaggle https://www.kaggle.com/asdasdasasdas/garbage-classification

Plastic Waste DataBase of Images – WaDaBa

4000 images with a detailed description of the plastic type (PET, PP, PE-HD...), object color, deformation level, dirtiness, and other attributes. [classification]

Each object was placed on the research stand and photographed under two types of light. A series of 10 photographs was taken for every object, differing in the angle of rotation about the vertical axis. The object was then damaged to varying degrees (small, medium, and large), and 10 photographs were taken for each degree of damage. Across all variants, 40 photographs were taken of every object; multiplied by the number of objects, this gives the 4 000 photographs in the database.

Download: Images free-to-download directly from website. Annotations available after signing license http://wadaba.pcz.pl/#download

GLASSENSE-VISION

Home-supplies classification. It is not strictly a litter dataset, but it gathers over 2000 images with objects well separated from the background. It covers 7 main categories (Banknotes, Cereals, Medicines, Cans, Tomato sauces, Water bottle, Deodorant stick) and 136 subcategories.

Glassense-Vision is a set of data we acquired and annotated to the purpose of providing a quantitative and repeatable assessment of the proposed method. The dataset includes 7 different use cases, meaning different object categories, where for each one of them we provide training (reference images used also to build dictionaries) and test images. All images in the dataset are manually annotated. The different use cases (object categories) can be grouped in three main geometrical types:

Download: http://www.slipguru.unige.it/Data/glassense_vision/

Waste Classification data

Over 25k images, already divided into training data (22564 images) and test data (2513 images). Two main categories: organic and recyclable.

Download: Directly from kaggle https://www.kaggle.com/techsash/waste-classification-data

Waste Classification Data v2

A variation of the Waste Classification data, extended with a new category "N" (nonrecyclable).

Over 25k images, already divided into training data (22564 + 2508 "N" images) and test data (2513 + 397 "N" images). Three main categories: organic (O), recyclable (R), and nonrecyclable (N). The TRAIN folder contains 2508 images in the "N" directory. The TEST folder contains 397 images in the "N" directory.

Download: Directly from kaggle https://www.kaggle.com/sapal6/waste-classification-data-v2
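
The description above implies a `<split>/<class>/<image>` folder layout (for example `TRAIN/N`). A quick way to check a download against the stated numbers is to count image files per split and class. This is a minimal sketch that assumes that layout and a small set of image extensions. The root path in the usage comment is hypothetical.

```python
from pathlib import Path

# Assumed image extensions; extend the set if the download uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root) -> dict:
    """Return {(split, class_name): image_count} for root/<split>/<class>/<image>."""
    counts = {}
    split_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for split_dir in split_dirs:
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        for class_dir in class_dirs:
            # Count only regular files that look like images.
            n_images = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            counts[(split_dir.name, class_dir.name)] = n_images
    return counts


# Usage (hypothetical path to the extracted archive):
# for (split, cls), n in count_images("waste-classification-data-v2").items():
#     print(split, cls, n)
```

If the layout matches, the entries for `("TRAIN", "N")` and `("TEST", "N")` should come out as 2508 and 397.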

Open litter map

The biggest dataset, with over 100k images in total, 11 main categories, and 187 subcategories. [multilabel] [classification] https://openlittermap.com/

Download: Only from json with scraper - detectwaste scraper

Litter

The Litter dataset contains 14k images with 20k annotations (bounding boxes) and 24 classes. Each class represents an object (cup), while subclasses determine its size, shape, or material (long paper cup/short paper cup).

Download: After buying a license https://www.imageannotation.ai/litter-dataset

Drinking Waste Classification

The dataset contains ~10k images grouped into 4 classes of drinking waste: aluminium cans, glass bottles, PET (plastic) bottles and HDPE (plastic) milk bottles. Pictures were taken with a 12 MP phone camera as part of a final-year Individual Project at University College London. The dataset also uses some of the manually collected images from TrashNet.

Download: Directly from kaggle https://www.kaggle.com/arkadiyhacks/drinking-waste-classification

waste_pictures

The dataset contains ~24k images grouped into 34 classes of waste for classification purposes. The images are divided into train and test subsets.

Download: Directly from kaggle https://www.kaggle.com/wangziang/waste-pictures

spotgarbage - GINI dataset

The Garbage in Images (GINI) dataset contains 2561 images of unspecified resolution, of which 1496 are annotated with bounding boxes (one class: trash). The authors used the Bing Image Search API to create the dataset.

Download: Directly from github https://github.com/spotgarbage/spotgarbage-GINI

DeepSeaWaste

This dataset consists of ~3k underwater images divided into 4 categories. Annotations are provided in a CSV file with the following fields:

  • source url of picture,
  • waste category,
  • date of taking the picture,
  • the place and depth at which the waste was found,
  • information on whether the image contains living organisms and sediment,
  • information on whether the waste is a plastic bag.

Download: Directly from kaggle https://www.kaggle.com/henryhaefliger/deepseawaste

MJU-Waste v1.0

This dataset was created by capturing waste items collected from a university campus against a lab background (people hold the waste items in their hands). All images in the dataset were captured using a Microsoft Kinect RGBD camera. All annotations are provided in PASCAL VOC and COCO format.

MJU-Waste v1.0 contains 2475 co-registered RGB and depth image pairs. Images are randomly split into a training set, a validation set and a test set of 1485, 248 and 742 images, respectively. The authors used a single class label for all waste objects.
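
For datasets that do not ship a fixed split, a random split with the same proportions can be reproduced with a seeded shuffle. This is a minimal sketch of that idea. It is not the procedure or the seed used by the MJU-Waste authors, so it will not recreate their exact partition.

```python
import random


def random_split(items, sizes, seed=0):
    """Shuffle items with a fixed seed and cut them into consecutive parts."""
    items = list(items)
    if sum(sizes) != len(items):
        raise ValueError("split sizes must add up to the number of items")
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(items)
    parts, start = [], 0
    for size in sizes:
        parts.append(items[start:start + size])
        start += size
    return parts


# Indices 0..2474 stand in for the 2475 RGB-D image pairs.
train, val, test = random_split(range(2475), [1485, 248, 742])
print(len(train), len(val), len(test))
```

Splitting indices rather than file paths keeps each RGB image and its depth map in the same subset, as long as both are looked up by the same index.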

Download: From Google Drive link placed on https://github.com/realwecan/mju-waste/

Domestic Trash Dataset

Domestic Trash Dataset consists of images of common domestic trash objects. Images were captured and crowdsourced under a wide variety of lighting conditions and weather, both indoors and outdoors. This dataset can be used to build trash/litter detection models, eco-friendly alternative suggestions, carbon footprint generation, etc.

Dataset Features

  • Various trash object classes
  • Has material labels
  • Captured by 5000+ unique users
  • Highly diverse and HD
  • Various lighting conditions
  • Indoor and Outdoor scenes

Dataset Format

  • Classification and detection annotations available
  • COCO, PASCAL VOC and YOLO formats
  • Approx. 9000+ unique images and growing
  • Only 250 images are available for free on Kaggle
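
The three annotation formats listed above store bounding boxes differently: COCO uses `[x_min, y_min, width, height]` in pixels, PASCAL VOC uses corner coordinates `(x_min, y_min, x_max, y_max)` in pixels, and YOLO uses a box centre plus width and height, all normalised by the image size. The helper functions below are an illustrative sketch of converting from COCO to the other two. The function names are hypothetical and not part of any dataset tooling.

```python
def coco_to_voc(bbox):
    """COCO [x_min, y_min, width, height] -> VOC (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def coco_to_yolo(bbox, img_width, img_height):
    """COCO [x_min, y_min, width, height] -> YOLO (x_center, y_center, w, h).

    All four YOLO values are fractions of the image size, between 0 and 1.
    """
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_width,
        (y + h / 2) / img_height,
        w / img_width,
        h / img_height,
    )


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image.
box = [100, 50, 200, 100]
print(coco_to_voc(box))             # (100, 50, 300, 150)
print(coco_to_yolo(box, 400, 200))  # (0.5, 0.5, 0.5, 0.5)
```

Converting in the other direction requires the image size as well, so it helps to keep image width and height alongside the boxes.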

Download: Images are available for download after buying a license. Contact the authors through the support details at: https://github.com/datacluster-labs/Datacluster-Datasets

Cigarette butt dataset

This dataset consists of a set of 2200 synthetically composed images of cigarettes on the ground. It is designed for training CNNs (convolutional neural networks). You must read and accept the terms of the Non-Commercial, Educational License Agreement to download and use its content.

Dataset Features

  • Annotations: Segmented, object-detection COCO format with custom categories.
  • Composition: Images were composed automatically with custom code that uses the Python Imaging Library to apply random scale, rotation, brightness, etc. to the foreground cutouts
  • Location: Photos of the ground and cigarette butts were taken in Austin, Texas
  • Camera: iPhone 8, original pixel resolution 3024 x 4032

Download: Images are available for download after accepting the terms of the Non-Commercial, Educational License Agreement at: https://www.immersivelimit.com/datasets/cigarette-butts

TrashBox dataset

Dataset of trash objects for waste classification and detection (no detection annotations are provided in the repository). Contains 17785 waste object images scraped from the web.

Waste categories are as follows:

  1. Medical waste: Syringes, Surgical Gloves, Surgical Masks, Medicines (Drugs and Pills) [Number of images: 2010]
  2. E-Waste: Electronic chips, Laptops and Smartphones, Appliances, Electric wires, cords and cables [Number of images: 2883]
  3. Plastic: Bags, Bottles, Containers, Cups, Cigarette Butts (which have a plastic filter) [Number of images: 2669]
  4. Paper: Tetra Pak, Newspapers, Paper Cups, Paper Tissues [Number of images: 2695]
  5. Metal: Beverage Cans, Construction Scrap, Spray Cans, Food Grade Cans, Other metal objects [Number of images: 2586]
  6. Glass [Number of images: 2528]
  7. Cardboard [Number of images: 2414]

Download: Images are available for download from the GitHub repository: nikhilvenkatkumsetty/TrashBox