YoloParser

Architecture logic

  1. The architecture starts with the three cameras used by the solution. The algorithm that acts on the camera feeds is explained in the backend documentation; in short, one camera records a top view of the shelf, another records the front of the shelf, and a third records the back of the person purchasing the product. This third camera is not yet integrated into our algorithm, since the current, simpler solution assumes only one buyer at a time.

  2. The frames captured by the cameras are fed to the YOLO artificial intelligence. YOLO processes the video streams in real time and detects the current state of the products, according to its training. The top camera is used to count products without distinguishing between them, while the front camera identifies the type and brand of each product.

  3. Each YOLO instance processes its video stream and emits what it has identified as JSON objects, producing one output per instant. For each YOLO there is a corresponding parser, written in Java. Each parser processes the YOLO output and applies a filter that discards insignificant changes, for example when YOLO loses a product in one frame and detects it again in the next, which indicates only a momentary detection glitch, something that commonly happens. The parser therefore parses the YOLO output, performs the filtering, and finally sends the resulting status to the event identifier through a REST API, as explained in the backend documentation. A sketch of the data format and of the parser loop follows this list.
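For reference, a Darknet-style YOLO detection output looks roughly like the JSON below. The exact field names depend on the Darknet build and the flags it was started with, so treat this shape, and the class name shown, as an assumption rather than the exact format this repo emits:

```json
{
  "frame_id": 42,
  "objects": [
    {
      "class_id": 3,
      "name": "soda_can",
      "relative_coordinates": { "center_x": 0.48, "center_y": 0.51, "width": 0.07, "height": 0.12 },
      "confidence": 0.91
    }
  ]
}
```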
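Below is a minimal sketch in Java of the parser loop for the top view, assuming the JSON shape above, a hypothetical debounce window of a few consecutive frames, and a hypothetical backend endpoint. The class name, the endpoint URL, and the threshold are all illustrative, not the repo's actual code:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch of a per-view YOLO parser: reads one JSON detection output per
 * frame, debounces single-frame glitches, and reports stable changes to
 * the event identifier over REST. All names are illustrative.
 */
public class YoloParserSketch {

    // Hypothetical values: tune to the real setup.
    private static final int STABLE_FRAMES = 5;
    private static final String EVENT_IDENTIFIER_URL = "http://localhost:8080/status"; // assumed endpoint

    private final ObjectMapper mapper = new ObjectMapper();
    private final HttpClient http = HttpClient.newHttpClient();

    private int reportedCount = -1;   // last count sent to the backend
    private int candidateCount = -1;  // count currently being debounced
    private int candidateStreak = 0;  // consecutive frames with candidateCount

    /** Handle one frame's worth of YOLO JSON output. */
    public void onFrame(String yoloJson) throws Exception {
        JsonNode root = mapper.readTree(yoloJson);
        int count = root.path("objects").size(); // top view: count products only

        // Debounce: a new count must persist for STABLE_FRAMES frames before
        // it is treated as a real change; a product that disappears for a
        // single frame and comes back is discarded as a detection glitch.
        if (count == candidateCount) {
            candidateStreak++;
        } else {
            candidateCount = count;
            candidateStreak = 1;
        }

        if (candidateStreak >= STABLE_FRAMES && candidateCount != reportedCount) {
            reportedCount = candidateCount;
            sendStatus(reportedCount);
        }
    }

    /** POST the stable status to the event identifier via REST. */
    private void sendStatus(int count) throws Exception {
        String body = "{\"view\":\"top\",\"productCount\":" + count + "}";
        HttpRequest request = HttpRequest.newBuilder(URI.create(EVENT_IDENTIFIER_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        http.send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Under the same assumptions, the front-view parser would read the name field from each entry in objects to track product type and brand, instead of only counting the array.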

Darknet docs

Inside this repo you can find the darknet_superfuturo folder. It contains a compiled, configured, GPU (CUDA) enabled Darknet build and its weights, as well as some of the training images we used. There you will also find the scripts that start the parser for each view. For more information, check the darknet_superfuturo doc.