
Classification and textual description generation of object-based images.

Scenes Interpreter Tool

An application that generates a textual description (in English) interpreting an image based on the objects it contains. It proceeds in several successive steps, each relying on machine learning models trained on various datasets: classification (with a novel method, scenes enrichment), identification and definition of visual relationships, and description generation with Natural Language Processing.

General Info

Master's degree graduation project in artificial intelligence, carried out from February to September 2020. The goal was an introduction to scientific research by proposing an approach that interprets scenes through their high-level features: the objects they contain. Several approaches were explored for this task. The final proposed approach consists of three major steps: scene classification (into a standard class, using a novel method called scenes enrichment, and into additional categories), identification and definition of visual relationships (the most important ones between the scene's objects), and description generation (with a relationships graph and NLP).
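To make the data flow between the three steps concrete, here is a toy sketch in Python. It is not the thesis's implementation: the function names, the lookup tables, and the sentence templates are all hypothetical stand-ins for the trained models, and it only shows how detected objects could pass through classification, relationship identification, and description generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One edge of a (toy) relationships graph: subject -predicate-> object."""
    subject: str
    predicate: str
    obj: str


def classify_scene(objects):
    """Step 1 stand-in: a hard-coded rule instead of a trained classifier."""
    kitchen_objects = {"oven", "sink", "refrigerator"}  # hypothetical rule
    return "kitchen" if kitchen_objects & set(objects) else "unknown"


def find_relationships(objects):
    """Step 2 stand-in: a fixed lookup table instead of trained models."""
    known = {("cup", "table"): "on", ("person", "chair"): "sitting on"}
    return [
        Relationship(subject, predicate, obj)
        for (subject, obj), predicate in known.items()
        if subject in objects and obj in objects
    ]


def describe(scene_class, relationships):
    """Step 3 stand-in: fixed sentence templates instead of real NLP."""
    sentences = [f"This is a {scene_class} scene."]
    sentences += [f"A {r.subject} is {r.predicate} a {r.obj}." for r in relationships]
    return " ".join(sentences)


def interpret(objects):
    """Chain the three steps on a list of detected object labels."""
    return describe(classify_scene(objects), find_relationships(objects))


if __name__ == "__main__":
    print(interpret(["cup", "table", "sink"]))
```

For how the actual approach works, Chapter 2 of the Master's thesis remains the reference.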


Scene interpretations

Some correct scene interpretations by the proposed approach. For more examples, see Appendix B (page 122) in the Master's thesis.

[Images: Scene interpretation examples (1) to (4)]


Screenshots of the developed application. A video (with comments in French) is also available on YouTube. For more details, see Chapter 3 (from page 92) in the Master's thesis.

[Images: Application - Data input; Application - Classification output; Application - Interpretation output]

Project content

Note that the Master's thesis is in French.


Back end

Front end

  • Used technologies: HTML, CSS, JavaScript.

How it works

To understand how the proposed approach works, see Chapter 2 (page 31) in the Master's thesis.

Complementary Google Drive

Because of the project's large size (8.5 GB), mainly due to the trained models, it is split into two complementary parts:

Application use

The application needs complementary files stored in the Drive in order to run. Once the repository is cloned, copy the missing files into it, respecting the "Copy from" and "Copy to" paths in the following table (for each cell, concatenate the path in the column header with the path in the cell):

Note that the application can run with only some of the unary relationship definers, or with none of them (models in ./application_data/unary_classifier_models/). Next, make sure all requirements are installed. Finally, start the Django server and open the application via ./src/index.html.
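The copy step can be scripted instead of done by hand. The following is a minimal sketch, assuming the Drive content has already been downloaded to a local folder. The helper name `copy_missing_files` and the mapping passed to it are hypothetical: the real "Copy from" and "Copy to" pairs must be taken from the table.

```python
import shutil
from pathlib import Path


def copy_missing_files(mapping, drive_root, repo_root):
    """Copy Drive files or folders into the cloned repository.

    mapping: dict of {path relative to drive_root: path relative to repo_root}.
    Targets that already exist are left untouched.
    Raises FileNotFoundError if a source path is missing.
    Returns the list of destination paths that were created.
    """
    copied = []
    for src_rel, dst_rel in mapping.items():
        src = Path(drive_root) / src_rel
        dst = Path(repo_root) / dst_rel
        if dst.exists():
            continue  # already in place, do not overwrite
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst)  # whole folder, e.g. a model directory
        else:
            shutil.copy2(src, dst)  # single file, metadata preserved
        copied.append(dst)
    return copied
```

Calling it a second time with the same mapping copies nothing, so it is safe to rerun after adding more entries.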

Some remarks related to the application execution:

  • The object detection model ./src/pfe_app/scenes_tool/functions/scenes_tool_data/object_detection/tf_model/object_detection/ (from TensorFlow Model Zoo: Object detection) was compiled on Windows 10, so it may need to be recompiled on other operating systems.
  • The Django server's URL (for sending/receiving data between the backend and frontend) is defined on the first line of ./src/website_data/output_data.js. To change it, edit that line.
  • To make development easier, requests to the Django server (CORS) are allowed from all origins, which is a serious security risk outside of local development. To change this behavior, edit ./src/pfe_app/pfe_app/settings.py.
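As an illustration of the last remark, here is a sketch of what a more restrictive CORS configuration could look like. It assumes that CORS is handled by the django-cors-headers package, which is not confirmed here, and the origin shown is a placeholder for wherever the frontend is actually served from.

```python
# Hypothetical fragment for ./src/pfe_app/pfe_app/settings.py.
# Assumption: the project uses the django-cors-headers package.

# Stop accepting cross-origin requests from every origin.
# (Before django-cors-headers 3.5 this setting was named CORS_ORIGIN_ALLOW_ALL.)
CORS_ALLOW_ALL_ORIGINS = False

# Only these origins may call the Django server.
# (Before django-cors-headers 3.5 this setting was named CORS_ORIGIN_WHITELIST.)
CORS_ALLOWED_ORIGINS = [
    "http://localhost:8080",  # placeholder: replace with the frontend's real origin
]
```

If the frontend is opened directly from disk rather than served over HTTP, the browser sends a different origin value, so the allowed list would need to be adapted accordingly.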

Credits and License

This project was carried out at the Laboratory of Research in Artificial Intelligence (LRIA), BioInformatics and Robotics (BIR) group, affiliated with the University of Science and Technology Houari Boumediene (USTHB), by:

Supervised and proposed by:

This project is distributed under the MIT license. For more details, see LICENSE.md.