M.Sc. in Computer Science (Bioinformatics and Modeling track - BIM)
Second semester research project - Sorbonne Université, 2021/2022
Supervisor: Denis Sheynikhovich, Associate Professor - Silver Sight Team, Institut de la vision, Sorbonne Université / INSERM / CNRS
Authors: Khánh Nam NGUYỄN and David Lambert
-
Ab initio implementation of an explainable artificial intelligence (XAI) method based on discrepancy maps and receptive field computation in convolutional neural networks, from Zhou et al., 2015
-
Application to AlexNet and to a variant trained by the Silver Sight team on images from a behavioral neuroscience experiment on human orientation in an artificial environment (the Avatar experiment by Becu et al., using the Institut de la Vision's Streetlab platform)
The project report and the slides used during the project defense are included in the pdf folder.
Throughout the code and accompanying documents, mentions of "AlexNet", "AxN", "alexnet", or "axn" refer to the pre-trained, "vanilla" version of AlexNet made available by PyTorch via their Hub platform and downloaded from said platform; "AvatarNet", "AvN", "avatarnet" and "avn" refer to the network trained by the Avatar team before the authors were onboarded on the present research project.
Each network uses a specific dataset, with specific image sizes, and a specific preprocessing function that must be applied before passing any image through the network.
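As a rough illustration only (not the repository's code; the preprocessing used for AvatarNet differs and is the one defined in this repository's scripts), the vanilla AlexNet and a standard ImageNet preprocessing pipeline can be obtained as follows:

```python
# Illustrative sketch: loading the "vanilla" AlexNet from the PyTorch Hub together
# with a standard ImageNet preprocessing pipeline. AvatarNet uses its own
# preprocessing, defined in this repository's scripts.
import torch
from torchvision import transforms

# Older torchvision versions take pretrained=True; newer ones take weights="DEFAULT".
alexnet = torch.hub.load("pytorch/vision", "alexnet", pretrained=True)
alexnet.eval()

imagenet_preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```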
Following the "tracer bullet" principle outlined in The Pragmatic Programmer (Thomas & Hunt, 1999), this code repository contains a proof-of-concept implementation of the methods outlined in Zhou et al, 2015.
The code included is the authors' honest attempt at implementing these methods from scratch, without referring to previously implemented versions of Zhou's or similar methods, and without any prior knowledge of the PyTorch platform.
As a consequence, it is the authors' opinion that, while functional and successful, several aspects of the implementation disqualify it from being production-ready, in particular its execution speed. Tentative solutions to the issues identified by the authors have been included in the accompanying documentation (see TODO.md).
The implementation is described in the Pipeline section below.
The authors would like to stress the fact that the research project, the code repository and the accompanying documents do not deal with dataset curation and/or network training, both tasks having been carried out by members of the Avatar team before the authors joined them for this research project. A brief account of the training process has nevertheless been graciously provided by Bilel Abderrahmane Benziane and included in the report.
activation_logging.py logs the top 10 most activating images for all units of all ReLU layers of the chosen network in txt files.
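As a rough illustration only (the function name, data structures and top-k bookkeeping below are assumptions, not the repository's code), this kind of logging can be implemented with PyTorch forward hooks registered on every ReLU module:

```python
# Minimal sketch (not the repository's activation_logging.py) of how per-unit
# top-activating images can be tracked with forward hooks on every ReLU layer.
import heapq
import torch
import torch.nn as nn

def attach_relu_hooks(model, top_k=10):
    records = {}                 # {layer_name: {unit_index: [(activation, image_path), ...]}}
    current = {"path": None}     # set to the current image path before each forward pass

    def make_hook(name):
        def hook(module, inputs, output):
            act = output.detach().squeeze(0)
            # one maximum per unit, whether the layer is convolutional or fully connected
            per_unit = act.flatten(1).amax(dim=1) if act.dim() > 1 else act
            layer = records.setdefault(name, {})
            for unit, value in enumerate(per_unit.tolist()):
                heap = layer.setdefault(unit, [])
                heapq.heappush(heap, (value, current["path"]))
                if len(heap) > top_k:
                    heapq.heappop(heap)   # keep only the top_k strongest activations
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):
            module.register_forward_hook(make_hook(name))
    return records, current
```

In such a scheme, current["path"] would be updated before each image is passed through the model, and records would be written out to per-layer txt files once the whole dataset has been processed.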
main_script.py wraps the rest of the pipeline: the parameters for the receptive field computation (which model to use, which units of which layers to target, which stride to use in the occlusion process) are defined there, then passed to the scripts listed below (a sketch of the underlying occlusion-and-discrepancy computation follows the list):
- occlusion.py parses the relevant log file and creates the occluded images from the top 10 images, with which the discrepancy map will be computed;
- discrepancy.py computes the discrepancy maps;
- top10.py assembles the top 10 images into one to make the results more readable;
- rec_field.py computes the receptive field from the discrepancy maps.
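For reference, a minimal sketch of the occlusion-and-discrepancy idea from Zhou et al., 2015 (patch size, stride, grey value and function names are illustrative assumptions, not the repository's occlusion.py / discrepancy.py):

```python
# Illustrative sketch: slide an occluding grey patch over the image and record how
# much the target unit's activation changes at each patch position.
import torch

def discrepancy_map(model, image, layer, unit, patch_size=11, stride=3, grey=0.5):
    """image: (3, H, W) tensor, already preprocessed.
    layer: module whose output holds the target unit; unit: channel index."""
    captured = {}
    handle = layer.register_forward_hook(
        lambda m, i, o: captured.update(act=o.detach()[0, unit].max().item()))

    def unit_activation(img):
        with torch.no_grad():
            model(img.unsqueeze(0))
        return captured["act"]

    baseline = unit_activation(image)
    _, H, W = image.shape
    rows = (H - patch_size) // stride + 1
    cols = (W - patch_size) // stride + 1
    dmap = torch.zeros(rows, cols)
    for r in range(rows):
        for c in range(cols):
            occluded = image.clone()
            y, x = r * stride, c * stride
            occluded[:, y:y + patch_size, x:x + patch_size] = grey
            dmap[r, c] = abs(baseline - unit_activation(occluded))
    handle.remove()
    return dmap
```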
helpers.py contains the helper functions used in the other scripts to make the code more modular. In particular:
- top_10 parses a given log file and returns the top 10 images for a specific unit of a specific layer;
- stack stacks a list of images together, either in a row or in a column;
- occlusion creates the occluded images from a specific image;
- network_forward_pass passes a given image through a given model.
An illustrative sketch of one such helper is given below.
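For instance, under the assumption that images are handled as (C, H, W) tensors of identical sizes, a stack-style helper boils down to a single concatenation (the actual signature in helpers.py may differ):

```python
# Illustrative sketch of a stack-style helper (assumed signature): concatenate
# equally-sized image tensors into a single row or column.
import torch

def stack_images(images, direction="row"):
    """images: list of (C, H, W) tensors with identical shapes."""
    dim = 2 if direction == "row" else 1   # width-wise for a row, height-wise for a column
    return torch.cat(images, dim=dim)

# e.g. ten 3x224x224 images stacked as a row give a single 3x224x2240 tensor.
```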
The following instructions have been tested using Ubuntu inside Windows WSL; tests in other environments are pending.
- create a directory for the project, and open a terminal with the project directory as the current working directory.
- create a virtual environment to isolate the project's execution and avoid version conflicts, using Python 3.8 or above, for example using venv (see documentation). The following command creates an environment called env_silver-system:
python3.8 -m venv env_silver-system
- activate the environment:
source env_silver-system/bin/activate
- once the environment has been activated, install the Python dependencies described in the attached requirements.txt file (for example by calling pip install -r requirements.txt).
You can then run the scripts in the ./src subdirectory:
python ./src/activation_logging.py DATASET [NETWORK]
will go through the files in DATASET (see the Datasets section below) and log the top 10 most activating images for all units of all ReLU layers of NETWORK. The NETWORK argument is optional: the script uses AlexNet by default, and avn specifies the use of AvatarNet:
python ./src/activation_logging.py coco-unlabeled2017
python ./src/activation_logging.py avatar_dataset avn
python ./src/main_script.py
wraps the rest of the receptive field computation pipeline, and will execute it using either the axn or the avn log from the logs directory, targeting the layers and units specified in its body.
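Purely as an illustration (the real variable names, layer identifiers and values are those defined in main_script.py itself), the parameters set in its body are of the following kind:

```python
# Hypothetical example of the parameters defined in the body of main_script.py;
# the names and values below are placeholders, not the actual code.
network = "axn"                         # "axn" for AlexNet, "avn" for AvatarNet
targets = {"features.4": [0, 17, 63]}   # layer -> list of unit (channel) indices
occlusion_stride = 3                    # stride of the sliding occluding patch
```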
Since the execution of activation_logging.py is, at the time of the present release, quite slow, logs for axn and avn have already been included in the present repository, under the logs directory.
AvatarNet should be placed in an avatar directory at the same level of the directory tree as src and logs; specifically, the file that the scripts expect to find there is named alexnet_places_old.pt.
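For reference, a .pt checkpoint of this kind is typically loaded along the following lines (whether alexnet_places_old.pt stores a full pickled model or a state_dict depends on how the Avatar team saved it; the repository's scripts handle the actual case):

```python
# Illustrative only: typical loading of a .pt checkpoint from the avatar directory.
import torch

checkpoint = torch.load("./avatar/alexnet_places_old.pt", map_location="cpu")
if isinstance(checkpoint, dict):             # assumption: a state_dict was saved
    from torchvision.models import alexnet
    model = alexnet(num_classes=2)           # hypothetical: two classes ("geo" / "lmk")
    model.load_state_dict(checkpoint)
else:                                        # assumption: a full model object was saved
    model = checkpoint
model.eval()
```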
Directory structure for the datasets is detailed below.
They should be subfolders of the ./datasets folder (paths such as ./datasets/{DATASET}). Each of the two networks uses two datasets: a very small one, to test and iterate quickly while implementing the elementary components and functionalities, and a large one, on which the actual results (see the accompanying slides and report) are obtained.
A validation subset of the Avatar dataset is used to compute AvatarNet's confusion matrix and accuracy, to crosscheck against the figures provided by the team that trained the network and make sure preprocessing worked as planned. Please note that neither AvatarNet nor the Avatar dataset has, to date, been released to the public by the Avatar team.
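As a rough sketch of that crosscheck (not the repository's actual evaluation code; the data loader and the class ordering are assumptions):

```python
# Illustrative sketch: tally a 2x2 confusion matrix and the accuracy of AvatarNet
# on the validation subset. Assumes a loader yielding (preprocessed images, labels).
import torch

def evaluate(model, loader, num_classes=2):
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)  # rows: true, cols: predicted
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            predictions = model(images).argmax(dim=1)
            for t, p in zip(labels.tolist(), predictions.tolist()):
                confusion[t, p] += 1
    accuracy = confusion.diag().sum().item() / confusion.sum().item()
    return confusion, accuracy
```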
Below are details about each dataset or set of images:
- Entire unlabeled COCO (Common Objects in COntext) dataset
  Download link
  Structure: none (flat folder containing the images, no subfolder)
- Subset of the dataset above
  Structure: none (flat folder containing the images, no subfolder)
- Contents of "DataAfterWellSamplingCleaned.zip"
  53k images: 26k in the "geo" class and 27k in the "lmk" class
  Structure: two subfolders "geo" and "lmk"
- Subset of the above
  100 images: 50 in the "geo" class and 50 in the "lmk" class
  Structure: two subfolders "geo" and "lmk"
- Contents of the validation subset included in "DataAfterWellSamplingCleanedTrainTestVal.zip", used for the confusion matrix and accuracy computations
  1k images: 500 for each class
  Structure: two subfolders "geo" and "lmk"
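As an illustration, assuming the dataset folder names used in the commands above for the two large datasets (the names of the smaller subsets below are placeholders, not actual folder names), the expected layout resembles:

```
./datasets/
    coco-unlabeled2017/            flat folder, images directly inside
    {SMALL_COCO_SUBSET}/           placeholder name; flat folder
    avatar_dataset/
        geo/
        lmk/
    {SMALL_AVATAR_SUBSET}/         placeholder name; geo/ and lmk/ subfolders
    {AVATAR_VALIDATION_SUBSET}/    placeholder name; geo/ and lmk/ subfolders
```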
For a detailed to-do list, please refer to the attached TODO.md file.
This implementation is licensed under the GNU GPLv3 license.