This repository contains the code and final report of my bachelor's thesis at the IT University of Copenhagen. The thesis was supervised by Stella Grasshof.
The goal of this project was to train and deploy single-frame and video classification models to provide accurate indoor localisation predictions with a room-level granularity. The project entailed the collection and annotation of a novel video dataset tailored for indoor localisation and the rigorous training and evaluation of various modern neural network architectures.
An example of live inference with a trained model can be seen on the right. The model is trained to predict the room in which a video clip was recorded. It is deployed on the edge using PlayTorch, a React Native bridge to the PyTorch Mobile SDK.
In an increasingly urbanised and digitalised world, indoor localisation is becoming a necessity for a wide variety of applications, ranging from personal navigation to augmented reality. However, despite extensive research efforts, indoor localisation remains a challenging task and no single solution is widely adopted. Motivated by the success of deep learning in numerous computer vision tasks, this study explores the feasibility of deep learning for accurate room-level localisation in indoor spaces. Various neural network architectures are trained and evaluated on a novel video dataset tailored for indoor localisation. The findings reveal that deep learning approaches can provide reasonable localisation results, even when trained on a small dataset. The approach is currently limited by its inability to distinguish between visually similar and adjacent areas, as well as biases within the training data. Despite these shortcomings, the results are encouraging and inspire optimism about the method's practical viability.
You can try out a selection of trained models on your mobile phone! They are deployed using PlayTorch. To try them out yourself, follow these steps:
- Download the PlayTorch App in the App Store (iOS) or Play Store (Android)
- Open the App and scan the QR code on the right
- Go to the Institut for Medier, Erkendelse og Formidling
You are all set. Walk around the indoor space and observe the model's predictions.
The backbone of this project is written in Python. The project runs on any Python 3.10.x release. Make sure that you have the correct Python version by running `python --version`. If you are using a different version, you can use `pyenv` to install the correct version.
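The version requirement can also be checked from within Python. This is a minimal sketch and not part of the repository; the helper name `is_supported` is hypothetical.

```python
import sys

# Major and minor version named in this README.
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version is some Python 3.10.x release."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    found = sys.version.split()[0]
    if is_supported():
        print("Python version OK:", found)
    else:
        print("Expected Python 3.10.x, found", found)
```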
All dependencies are managed with Poetry. Assuming that you have `poetry` installed, you can install all dependencies by running:
poetry install
This command will create a virtual environment for you and install all relevant dependencies into it. You can activate the virtual environment by running:
poetry shell
Alternatively, you can run all commands from your regular shell session by prefixing the command with `poetry run`, e.g. to run the training script, you would type:
poetry run python src/train.py ...
If you wish to use another dependency manager, you can find a list of all dependencies in `pyproject.toml`.
Because of the large data size, this repository does not contain the raw data. Instead, it contains zip files with the processed frames and videos that can be used to train the models.
Before running the project, you will have to extract the data. To extract all data, navigate into the directory `src/data` and unzip `images.zip` and `videos.zip`:
cd data
unzip images.zip && rm -rf images.zip
unzip videos.zip && rm -rf videos.zip
Note that you need to extract the data before running any of the scripts, because the data class depends on the data being extracted locally.
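If `unzip` is not available on your system, the same extraction can be done with Python's standard library. The following is a minimal sketch, not code from the repository: the function name `extract_archives` and its parameters are hypothetical, and the directory you pass should be whichever one holds the two archives in your checkout.

```python
import zipfile
from pathlib import Path


def extract_archives(data_dir, names=("images.zip", "videos.zip"), delete=False):
    """Extract each named archive found in data_dir into data_dir itself.

    Returns the names of the archives that were actually extracted.
    """
    data_dir = Path(data_dir)
    extracted = []
    for name in names:
        archive = data_dir / name
        if not archive.is_file():
            continue  # skip archives that are missing or already removed
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        if delete:
            archive.unlink()  # mirrors the `rm` step of the shell commands
        extracted.append(name)
    return extracted
```

By default the archives are kept; pass `delete=True` to remove them after extraction, as the shell commands do.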
The project offers three main entry points:
- `train.py`: Train a single model with chosen training hyperparameters
- `eval.py`: Evaluate a model's performance and efficiency on the test split
- `infer.py`: Run live inference on an example video clip
The job files `train.job` and `eval.job` are used to run the experiments on the SLURM cluster of the HPC at the IT University of Copenhagen.
The `train.py` script is the central script for training different models with different hyperparameters. For example, to train ResNet18 on all data using default hyperparameters and logging to W&B:
python src/train.py -M resnet18
You can see the identifiers for all models within this project in the file `defaults.py`. Find out more about all hyperparameters that you can tweak by running:
$ python src/train.py -h
usage: train.py [-h] -M MODEL [-V VERSION] [--wandb-log | --no-wandb-log] [--wandb-name WANDB_NAME]
[--wandb-group WANDB_GROUP] [--wandb-tags WANDB_TAGS [WANDB_TAGS ...]]
[--epochs EPOCHS] [--device DEVICE] [--batch-size BATCH_SIZE] [--lr LR]
For more detailed output run the command yourself.
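The usage line above suggests a standard `argparse` interface. The following is a minimal sketch of a parser that accepts the flags shown, not the actual contents of `train.py`: the long option names `--model` and `--version`, the argument types, and every default value are assumptions for illustration, and the real script may define further options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags visible in the usage output."""
    parser = argparse.ArgumentParser(prog="train.py")
    # Long names --model and --version are assumed; only -M and -V are shown.
    parser.add_argument("-M", "--model", required=True)
    parser.add_argument("-V", "--version")
    # BooleanOptionalAction generates the --wandb-log / --no-wandb-log pair.
    parser.add_argument(
        "--wandb-log", action=argparse.BooleanOptionalAction, default=True
    )
    parser.add_argument("--wandb-name")
    parser.add_argument("--wandb-group")
    parser.add_argument("--wandb-tags", nargs="+")
    # Placeholder defaults below are hypothetical, not the project's values.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--device", default="cpu")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=1e-3)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-M", "resnet18", "--no-wandb-log"])
    print(args.model, args.wandb_log)
```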
The `eval.py` script loads a trained model, as specified by the model identifier and version number, from the public W&B repository. It then evaluates the model on the test split and logs the results to W&B.
Unless you have trained a model yourself, you do not need to run this script.
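As a rough illustration of what evaluating a room classifier on a test split can involve, the sketch below computes overall and per-room accuracy from lists of true and predicted labels. It is an assumption about the kind of metric involved, not the code in `eval.py`; the function name `accuracy_report` and the room labels in the usage note are hypothetical.

```python
from collections import defaultdict


def accuracy_report(y_true, y_pred):
    """Return (overall accuracy, {label: per-class accuracy})."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    counts = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for truth, pred in zip(y_true, y_pred):
        counts[truth][1] += 1
        if truth == pred:
            counts[truth][0] += 1
    overall = sum(correct for correct, _ in counts.values()) / len(y_true)
    per_class = {label: correct / total for label, (correct, total) in counts.items()}
    return overall, per_class
```

For example, `accuracy_report(["kitchen", "hall"], ["kitchen", "kitchen"])` reports the overall score together with one score per room, which makes it easier to spot rooms that are confused with visually similar or adjacent ones.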
The `infer.py` script loads a trained model, as specified by the model identifier and version number, from the public W&B repository. It then selects a random or specified video clip from the test split and runs live predictions of the model on the video clip. The top prediction and confidence score are overlaid on the frames and displayed as a video.
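A top prediction and confidence score are commonly derived by applying a softmax to the model's raw outputs and taking the most probable class. The sketch below shows that step with the standard library only. It assumes the model emits one logit per room and that the confidence is the softmax probability; neither detail is confirmed by this README, and the function name `top_prediction` is hypothetical.

```python
import math


def top_prediction(logits, labels):
    """Return (label, confidence) for the highest-scoring class."""
    if not logits or len(logits) != len(labels):
        raise ValueError("logits and labels must be non-empty and equally long")
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Calling `top_prediction(frame_logits, room_names)` once per frame yields the text that would be drawn onto that frame; `frame_logits` and `room_names` are placeholder names.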
Because of data size limitations on GitHub, only a single video clip is public in this repository (see `data/raw/230313_04/video.mov`). To run inference on this video clip for `v0` of ResNet18, run:
python src/infer.py -M resnet18 -V v0 --clip 230313_04
These are the available arguments to the script:
$ python src/infer.py -h
usage: infer.py [-h] -M MODEL [-V VERSION] [--gradcam | --no-gradcam]
[--split {train,test}] [--clip CLIP]
For more detailed output, run the command yourself. Note that the Grad-CAM overlay (`--gradcam`) is only available for the ResNet18 model.
There are a number of Jupyter notebooks in the directory `notebooks`, which were used to gather statistics and results and to generate visualisations for the final report. Each should be self-explanatory when followed block by block, so this is only a short list of the included notebooks:
- `eda.ipynb` contains some verifications and basic data analysis on the gathered dataset
- `optimise-mobile.ipynb` contains the process of optimising a model for mobile deployment
- `results.ipynb` contains a series of evaluation techniques for trained models that have been logged to W&B