Team: Vivian Pazmany, Chenhui "Elvis" Zhu, Matthew Boyd
TA: Yicun (Ethan)
Updated: 12/13/2019
- Product Mission
- Customer & User Stories
- System Design
- Minimum Viable Product (MVP)
- Technology Selections
- Technology Justifications
- Competitors
- Competitors' User Stories
- Patent Analysis
- Poster
Use a depth camera to measure object size and determine rough object shape, then use image recognition to match visible graphics against a database of known products, in order to identify retail products in a "cluttered" box of products.
CUSTOMER: Retail stores with pickup/delivery services, such as Target and Walmart.
- I, the store manager, would like to improve the product sorting process to reduce the time spent on human-picked orders.
- I, a retail worker, would like to help customers while an automated system handles the back-room product sorting.
- I, the store manager, would like to be able to add new products easily for image recognition & sorting, rather than training human workers on new products or bringing in the vendor to image them.
- I, the store manager, want to save worker time by not requiring workers to orient items in the picked box so that the bar code is visible, especially for heavy objects.
- I, the supervisor, do not want to have to approve each product image that pops up. The autonomous system should detect and match products continuously as each product is removed from the pile.
Recognize the top object in a box of 5 known, stacked, hard-edged objects (no bottles) using our stereo camera system and reference database.
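One way to frame "the top object" with a depth camera is as the group of pixels closest to the camera. The sketch below only illustrates that idea. It assumes the camera looks straight down, that the depth image is a plain grid of distances in millimetres, and that a fixed tolerance separates the top surface from everything beneath it. `find_top_object` and `tolerance_mm` are hypothetical names, not part of the RealSense SDK.

```python
def find_top_object(depth_mm, tolerance_mm=15):
    """Return (nearest_depth, bounding_box) for the pixels nearest the camera.

    depth_mm: list of rows, each a list of distances in mm (0 = no reading).
    bounding_box: (row_min, col_min, row_max, col_max), inclusive.
    Returns None when the image has no valid depth readings.
    """
    valid = [d for row in depth_mm for d in row if d > 0]
    if not valid:
        return None
    nearest = min(valid)

    rows, cols = [], []
    for r, row in enumerate(depth_mm):
        for c, d in enumerate(row):
            # Keep pixels within the tolerance of the nearest surface.
            if 0 < d <= nearest + tolerance_mm:
                rows.append(r)
                cols.append(c)
    return nearest, (min(rows), min(cols), max(rows), max(cols))


# Toy 4x5 depth map: box floor at 900 mm, one object with its top near 600 mm,
# and a second, lower object at 750 mm.
depth = [
    [900, 900, 900, 900, 900],
    [900, 600, 605, 900, 900],
    [900, 602, 601, 900, 900],
    [900, 900, 900, 750, 750],
]
print(find_top_object(depth))
```

This version does not separate two objects that happen to sit at the same height; a connected-component pass over the selected pixels would be needed for that case.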
Software
Language: Python, C++, JavaScript (GUI)
Single Image feature extraction: SIFT
Image Search: FLANN over stored list of extracted features
Using this in place of VDMS for feature matching
Feature Storage: Local binary files
Using this in place of VDMS for feature storage
Product Database: MySQL
Using this in place of VDMS for storing product information
GUI Application: TBD
Depth image metadata extraction (3D): Intel RealSense SDK
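To make the "local binary files" choice for feature storage more concrete, here is a minimal sketch of writing and reading descriptors with Python's standard `struct` module. The file layout (a 4-byte count followed by 128 float32 values per descriptor), the `.feat` extension, and the function names are assumptions made for illustration; the project's actual format is not specified here.

```python
import os
import struct
import tempfile

DESCRIPTOR_LEN = 128  # SIFT descriptors have 128 components


def save_descriptors(path, descriptors):
    """Write descriptors as: uint32 count, then count * 128 float32 values."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(descriptors)))
        for desc in descriptors:
            if len(desc) != DESCRIPTOR_LEN:
                raise ValueError("expected 128 values per descriptor")
            f.write(struct.pack(f"<{DESCRIPTOR_LEN}f", *desc))


def load_descriptors(path):
    """Read back the list of descriptors written by save_descriptors."""
    with open(path, "rb") as f:
        (count,) = struct.unpack("<I", f.read(4))
        row_size = DESCRIPTOR_LEN * 4  # 4 bytes per float32
        return [
            list(struct.unpack(f"<{DESCRIPTOR_LEN}f", f.read(row_size)))
            for _ in range(count)
        ]


# Round-trip two fake descriptors (values chosen to be exact in float32).
fake = [[float(i % 7) for i in range(DESCRIPTOR_LEN)], [0.5] * DESCRIPTOR_LEN]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "product_0001.feat")  # hypothetical file name
    save_descriptors(path, fake)
    restored = load_descriptors(path)
print(restored == fake)
```

One file per product keeps adding a new product cheap: extract its features once, write one file, and add one row to the product database.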
Hardware
Lighting: Matt's LEDs
Image Capture: Intel RealSense D415 camera module
Laptop: GUI, Image Capture, Image+Feature Storage & Search, Product Database
Camera Mount: metal shelf
Our product is intended for use in a retail environment, where 30K-100K individual product SKUs are easily possible, so training an ML model to detect each product would require tremendous effort. Not only would we need multiple tagged images of each product, but the matching would be extremely slow. Typical "state of the art" ML today can only match a few dozen items with any given model, while we are looking to match tens of thousands of products. Given our very controlled environment, ML seems a poor fit.
- All libraries and tools selected support both languages
- Our team has expertise in both
- Python is good for development and testing, but slower
- C++ is faster, but more cumbersome and slower to code
Comparison of SURF, SIFT, and ORB (best first):
- Number of feature points detected: SURF > ORB > SIFT
- Detection speed: ORB > SURF > SIFT
- Robustness to scaling: SIFT > SURF > ORB
- Robustness to rotation: SIFT > ORB/SURF
All in all, SIFT is the best algorithm for our project, because performance on scaled or rotated images matters more to us than detection speed.
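The reasoning behind picking SIFT can be made explicit with a small weighted score. The sketch below turns the orderings from the comparison into points (3 = best, 1 = worst, ties share the average) and then weights the criteria. The point scale and the weights are assumptions chosen purely for illustration; they are not measurements.

```python
# Points per criterion, taken from the orderings in the comparison.
# ORB and SURF tie on rotation, so each gets the average of the two lower places.
SCORES = {
    "feature_count": {"SURF": 3, "ORB": 2, "SIFT": 1},
    "speed": {"ORB": 3, "SURF": 2, "SIFT": 1},
    "scaling": {"SIFT": 3, "SURF": 2, "ORB": 1},
    "rotation": {"SIFT": 3, "ORB": 1.5, "SURF": 1.5},
}


def rank(weights):
    """Return (algorithm, weighted_total) pairs, best first."""
    totals = {}
    for criterion, points in SCORES.items():
        for algo, p in points.items():
            totals[algo] = totals.get(algo, 0.0) + weights[criterion] * p
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


equal = {criterion: 1 for criterion in SCORES}
# Hypothetical weights: scale and rotation robustness count three times as much.
robust = dict(equal, scaling=3, rotation=3)

print(rank(equal))   # [('SURF', 8.5), ('SIFT', 8.0), ('ORB', 7.5)]
print(rank(robust))  # [('SIFT', 20.0), ('SURF', 15.5), ('ORB', 12.5)]
```

With equal weights SURF comes out slightly ahead; SIFT only wins once scale and rotation robustness are weighted more heavily, which is the priority stated above.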
- Supports saving indexes to disk, which lets us keep segmenting the image database by object size.
- After indexes are built, searching is very quick (<100 ms).
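The idea of segmenting the feature database by object size and then searching only one segment can be sketched without FLANN itself. The example below swaps in a brute-force nearest-neighbour search from the standard library in place of a FLANN index, uses 2-D toy descriptors instead of 128-D SIFT descriptors, and applies the common ratio test to reject ambiguous matches. The bucket names, the `ratio` value, and all function names are hypothetical.

```python
import math


def nearest_two(query, candidates):
    """Return the two smallest (distance, product_id) pairs for one descriptor."""
    dists = sorted((math.dist(query, desc), pid) for pid, desc in candidates)
    return dists[:2]


def match_product(query_descs, size_bucket, index, ratio=0.75):
    """Vote for the best-matching product, searching only one size bucket."""
    candidates = index.get(size_bucket, [])
    votes = {}
    for q in query_descs:
        best = nearest_two(q, candidates)
        if len(best) < 2:
            continue  # not enough stored descriptors to run the ratio test
        (d1, pid1), (d2, _) = best
        if d1 < ratio * d2:  # keep only clearly unambiguous matches
            votes[pid1] = votes.get(pid1, 0) + 1
    return max(votes, key=votes.get) if votes else None


# Toy index: size bucket -> list of (product_id, descriptor).
index = {
    "small": [
        ("cereal", (0.0, 0.0)),
        ("cereal", (10.0, 10.0)),
        ("soap", (5.0, 0.0)),
    ],
    "large": [("detergent", (1.0, 1.0)), ("detergent", (9.0, 9.0))],
}
query = [(0.1, 0.0), (10.0, 9.8)]
print(match_product(query, "small", index))   # cereal
print(match_product(query, "medium", index))  # None (empty bucket)
```

Restricting the search to one bucket is what keeps each lookup small; a real index per bucket would replace the brute-force `nearest_two` step.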
- Free and easy to use.
- Matt has expertise with RDBMSes
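As a rough illustration of what the product database might hold, the sketch below defines one table and a size-based lookup. It uses Python's built-in `sqlite3` as a stand-in so it runs without a server; the project itself selected MySQL. The table name, columns, and tolerance are assumptions, and a MySQL driver would typically use `%s` placeholders instead of `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        sku          TEXT PRIMARY KEY,
        name         TEXT NOT NULL,
        width_mm     REAL,
        height_mm    REAL,
        depth_mm     REAL,
        feature_file TEXT  -- path to the stored feature file for this product
    )
    """
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("0001", "Cereal box", 190, 280, 60, "features/0001.feat"),
        ("0002", "Soap bar", 90, 60, 30, "features/0002.feat"),
        ("0003", "Cracker box", 185, 275, 55, "features/0003.feat"),
    ],
)


def candidates_by_size(conn, width_mm, height_mm, tolerance_mm=10):
    """Return products whose face dimensions are close to the measured ones."""
    rows = conn.execute(
        "SELECT sku, name, feature_file FROM products "
        "WHERE ABS(width_mm - ?) <= ? AND ABS(height_mm - ?) <= ? "
        "ORDER BY sku",
        (width_mm, tolerance_mm, height_mm, tolerance_mm),
    )
    return rows.fetchall()


for row in candidates_by_size(conn, 188, 278):
    print(row)
```

A lookup like this narrows the set of feature files that need to be searched once the depth camera has produced a size estimate.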
- Generates depth images directly.
- Relatively inexpensive.
- Widely used, so lots of tutorials and community support.
- Works with Intel RealSense camera module.
- Easy support for generating point clouds, measuring objects, etc.
- Widely used, so lots of tutorials and community support.
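Measuring an object from a depth image comes down to the pinhole camera relation: real size = size in pixels × distance / focal length in pixels. The sketch below applies that relation to a bounding box. It assumes the object's face is roughly parallel to the image plane, and the focal lengths shown are placeholder values; in practice they would come from the camera intrinsics reported by the SDK for the chosen resolution. The function names are hypothetical.

```python
def pixel_extent_to_mm(extent_px, depth_mm, focal_px):
    """Convert a length in pixels to millimetres at a given distance."""
    return extent_px * depth_mm / focal_px


def object_size_mm(bbox, depth_mm, fx, fy):
    """Estimate (width_mm, height_mm) of an object from its bounding box.

    bbox: (row_min, col_min, row_max, col_max), inclusive pixel coordinates.
    depth_mm: distance from the camera to the object's top surface.
    fx, fy: focal lengths in pixels along the image x and y axes.
    """
    row_min, col_min, row_max, col_max = bbox
    width_px = col_max - col_min + 1
    height_px = row_max - row_min + 1
    return (
        pixel_extent_to_mm(width_px, depth_mm, fx),
        pixel_extent_to_mm(height_px, depth_mm, fy),
    )


# A 300 x 150 pixel box seen at 600 mm with placeholder focal lengths of 900 px.
size = object_size_mm((100, 200, 249, 499), depth_mm=600, fx=900.0, fy=900.0)
print(size)  # (200.0, 100.0)
```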
Ties all systems together using:
- NodeJS
- Socket.IO
- Bootstrap
Kwikee (kwikee.com)
- Retail product imaging for ecommerce
- Database of product attributes
Trax (traxretail.com)
- Shelf/product imaging for retail inventory management
Aifi (aifi.io)
- Auto checkout system (cashierless stores)
Cashierless store tech in general...
- Aifi, Standard Cognition, Zippin, Trigo Vision, etc.
Simple Robotics
- Inventory robot (on-shelf inventory)
Trax:
- I, as a retailer, can improve operational efficiency by leveraging real-time shelf data. Trax helps retailers understand current out-of-stock levels for core products.
Kwikee:
- I, as a retailer, can use Kwikee to improve my products' brand stories and manage the product images I need.
Aifi:
- I, as a retailer, can easily track customers, inventory, and behavior to get real-time insights into what's working and what isn't.
- I, as a customer, can fill my cart and maybe even put items back where they don't belong.
- Automatic sorting machine for sorting and classifying small products of the pharmaceutical and confectionery industries according to form and color: https://patents.google.com/patent/US5558231A/en
- Facial recognition patents:
  - Apple: https://www.patentlyapple.com/patently-apple/2011/11/apple-wins-secret-patent-for-high-end-3d-object-recognition.html
  - "A method, device, system, and computer program for object recognition of a 3D object of a certain object class using a statistical shape model for recovering 3D shapes from a 2D representation": https://patents.google.com/patent/US20120114251A1/en
- 3D object recognition using a single camera image to identify the object through comparison with the camera coordinate system: https://patents.google.com/patent/US8379014B2/en
- Object recognition system using position sensor, image sensor, and controller: https://patents.google.com/patent/US6792147
- Vehicle object recognition system: "A method and system for detecting and tracking objects near a vehicle using a three dimensional laser rangefinder": https://patents.google.com/patent/US8260539
- US gov. vehicle object recognition patent: "The platform provides an automated tool that can integrate multi-modal sensor data including two-dimensional image data, three-dimensional image data, and motion, location, or orientation data, and create a visual representation of the integrated sensor data, in a live operational environment.": https://patents.justia.com/patent/9911340