Report paper on Overleaf
Report presentation and paper
The purpose of the project is to create a vision system capable of detecting and recognizing the paintings exhibited in the "Gallerie Estensi" museum of Modena, and of locating the people visiting the museum by identifying the room they are in. In the future, the system could be used by a robot to visit the museum autonomously, or it could be installed in a mobile application to guide and improve the visitor's experience.
The main aspects of the project's pipeline are presented in this section; the motivations behind the choices made and the details of each task are discussed in the project documentation. The proposed solution takes a single frame of a video as input and passes it through the following pipeline:
People and paintings are detected through two different approaches:
- People: given the large amount of data available on the web, people are detected with a deep learning approach. In particular, a Faster R-CNN pretrained on the COCO dataset is used. The model returns a list of detections with the corresponding labels and bounding boxes. Of the 80 COCO classes, only the class "Person" is taken into account; all the others are discarded.
- Paintings: the amount of data necessary to train a classifier is not available, therefore an image processing approach is used. It is based on two main considerations:
- The wall has a lighter color than the paintings;
- Almost all the paintings are rectangular, and most of the circular ones have a rectangular frame.

Starting from these two considerations, the model creates a mask by filtering out the pixels of the wall, whose hue lies between 80 and 255. The contours found in the mask are approximated to polygons, and only those with at least four sides are taken into account.
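The class filter applied to the COCO detections can be sketched as follows. The detection structure and field names here are illustrative assumptions, not the project's actual API:

```python
# Hypothetical detection records: each is a dict with "label", "score"
# and "box"; the real model output format may differ.
def keep_people(detections, score_thr=0.5):
    """Keep only COCO 'person' detections above a confidence threshold."""
    return [d for d in detections
            if d["label"] == "person" and d["score"] >= score_thr]

dets = [
    {"label": "person", "score": 0.92, "box": (10, 20, 110, 220)},
    {"label": "chair",  "score": 0.88, "box": (200, 40, 260, 120)},
    {"label": "person", "score": 0.30, "box": (5, 5, 40, 90)},
]
print(keep_people(dets))  # only the first, confident "person" survives
```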
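The wall-masking and polygon-approximation steps might look like the OpenCV sketch below. The scene is synthetic and the mask uses a simple brightness bound in place of the report's hue range, which depends on the actual footage:

```python
import cv2
import numpy as np

# Synthetic stand-in scene: a light wall with one dark "painting".
scene = np.full((240, 320, 3), (200, 200, 200), dtype=np.uint8)
cv2.rectangle(scene, (60, 50), (220, 180), (30, 40, 90), -1)

hsv = cv2.cvtColor(scene, cv2.COLOR_BGR2HSV)
# Mask out the bright wall: keep only low-brightness (painting) pixels.
mask = cv2.inRange(hsv, (0, 0, 0), (179, 255, 120))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for c in contours:
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) >= 4:               # keep rectangle-like contours
        candidates.append(cv2.boundingRect(c))
print(candidates)  # bounding box of the synthetic painting
```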
The bounding boxes of the detected paintings and people are compared with one another in order to discard boxes that are contained in other boxes. This check allows the model to:
- not consider people portrayed in the paintings;
- consider only the outermost border if more than one is found for a single painting.
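The inner-box check above can be sketched in a few lines; boxes are assumed to be (x1, y1, x2, y2) tuples, which is an illustrative convention:

```python
def contains(outer, inner):
    """True if box `inner` lies fully inside box `outer` (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def discard_inner_boxes(boxes):
    """Drop every box contained in another box, keeping only outermost ones."""
    return [b for b in boxes
            if not any(b != o and contains(o, b) for o in boxes)]

# A frame detected inside another frame of the same painting:
paintings = [(10, 10, 200, 150), (40, 40, 120, 100)]
print(discard_inner_boxes(paintings))  # → [(10, 10, 200, 150)]
```

The same filter handles people portrayed in paintings: a "person" box contained in a painting box is discarded.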
Each painting that passes the inner-box check is cropped from the frame with some padding and segmented using Otsu's thresholding algorithm. The segmentation makes it possible to detect the borders of the painting more precisely and, in particular, to localize the four corner points that are crucial for the perspective rectification.
The four corner points B found with the segmentation are used to compute the height H and the width W of the rectified version of the painting. Using W and H, the model builds the new rectified box B0 with perpendicular corners. It then warps the original frame using the transformation matrix obtained from the two point sets B and B0, and crops the warped image so as to select just the painting region.
To retrieve the information of a detected painting, the ORB descriptors are computed on the cropped portion of the frame and compared with those of all the paintings stored in the database. The function returns a list of the database paintings sorted by the number of strong matches found for each of them. The retrieval is considered reliable if there are at least 20 strong matches.
The peopleLocalization step combines the results of peopleDetection and paintingRetrieval. Whenever a person is detected, the model can localize them by retrieving, from the database information of a painting recognized in the same frame, the room they are in.
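The combination is essentially a room lookup, as in the toy sketch below; the database entries, field names, and ambiguity handling are illustrative assumptions:

```python
# Toy painting database: filename -> metadata (illustrative content).
paintings_db = {"021.png": {"title": "Portrait", "room": "5"}}

def localize_people(people_boxes, recognized_paintings):
    """Assign every detected person the room of a painting in the frame."""
    rooms = {paintings_db[p]["room"] for p in recognized_paintings}
    # If the recognized paintings disagree on the room, leave it unknown.
    room = rooms.pop() if len(rooms) == 1 else None
    return [{"box": b, "room": room} for b in people_boxes]

print(localize_people([(12, 30, 80, 210)], ["021.png"]))
```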
The program takes as input two parameters:
- root_path: the path of the Google Drive "Project Material" folder containing:
- /videos folder containing all the videos
- /paintings_db folder containing the painting database
- data.csv file with the information of the painting database
- map.png image of the museum map
- model: the model to use for people detection; two options are currently available, 'COCO' or 'PEDANT'
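A possible entry point mirroring these two parameters is sketched below with argparse; the option names follow the report, but the parser itself is an illustrative assumption about how the program is invoked:

```python
import argparse

# Hypothetical CLI for the two parameters described above.
parser = argparse.ArgumentParser(description="Museum painting detection")
parser.add_argument("root_path",
                    help="path of the 'Project Material' folder")
parser.add_argument("model", choices=["COCO", "PEDANT"],
                    help="model to use for people detection")

# Simulated invocation (normally the arguments come from the command line).
args = parser.parse_args(["Project Material", "COCO"])
print(args.root_path, args.model)
```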