ImageGen

The human cognitive-visual system is capable of abstracting visual concepts from multiple elements in a scene. For instance, a photo of people can be categorised under the theme "work" or "vacation" according to visual attributes such as clothing and objects in the scene. From the point of view of computer vision and pattern recognition, however, such photos could be identified as belonging to the same category. Abstract visual characteristics extracted by computer vision methods are therefore often insufficient and need to be complemented with semantic information. This project investigates semantic information complementary to abstract visual characteristics. In particular, features obtained from convolutional neural networks are used as abstract visual representations, complemented by categorical textual information from object recognition methods or manual annotations. The goals are, first, to understand how the representation improves when these characteristics are combined and, second, to extend the methods to translate visual characteristics into textual characteristics and vice versa. Possible applications include scene description, detection of visual sub-categories, anomaly detection, and others.
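As a rough illustration of the idea of combining abstract visual features with categorical textual information, the sketch below concatenates CNN features with one-hot encoded category labels. The backbone (VGG16) and all function names are assumptions for illustration only; they are not necessarily the ones used in this project.

```python
# Illustrative sketch only: combine CNN visual features with categorical
# (textual) information by concatenation. VGG16 and all names are assumptions.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# CNN backbone used as a fixed visual feature extractor (assumed: VGG16).
backbone = VGG16(weights="imagenet", include_top=False, pooling="avg")

def visual_features(images):
    """Extract abstract visual features from a batch of 224x224 RGB images."""
    return backbone.predict(preprocess_input(images.astype("float32")))

def categorical_features(labels, num_classes):
    """One-hot encode category labels coming from object recognition or annotations."""
    return np.eye(num_classes)[labels]

def combined_representation(images, labels, num_classes):
    """Concatenate visual and categorical features into a single representation."""
    return np.concatenate(
        [visual_features(images), categorical_features(labels, num_classes)],
        axis=1,
    )
```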

The final report on the project is available in Portuguese.


To run the code, the following files are needed:

  • CIFAR-10 dataset
  • ImageNet dataset (adjust the path variable; see the sketch after this list)
  • yolo_model.h5
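
A minimal sketch of how these files might be loaded is shown below. The path variable name and loading code are assumptions for illustration; the actual scripts in this repository may differ.

```python
# Illustrative sketch only: the real path variable and loading code may differ.
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import load_model

IMAGENET_PATH = "/path/to/imagenet"  # adjust to your local ImageNet directory

# CIFAR-10 is downloaded automatically by Keras on first use.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Pre-trained YOLO weights in Keras format (yolo_model.h5).
yolo_model = load_model("yolo_model.h5")
```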