This is a replication of the work published here, done by the RPI Intelligent Systems Lab.
The goal of this research is to produce a practical system for tracking the 2D position of the user's gaze on a mobile device screen. It uses an appearance-based method that employs a CNN operating on images of the user's eyes and face taken by the device's front-facing camera. Nominally, it can achieve an overall accuracy of about 2 cm, as measured on the GazeCapture validation dataset.
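For intuition, here is a minimal sketch of what an appearance-based, multi-input CNN of this kind can look like. The input sizes and layer choices are assumptions made purely for illustration; the architectures this project actually uses are defined in `network.py`.

```python
# A toy multi-input CNN in the spirit of appearance-based gaze estimation.
# All input shapes and layer sizes below are assumptions for illustration;
# the project's real architectures are defined in network.py.
import tensorflow as tf
from tensorflow.keras import Model, layers

def conv_branch(name, size):
    # Small convolutional feature extractor for one image input.
    inp = layers.Input(shape=(size, size, 3), name=name)
    x = layers.Conv2D(32, 5, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return inp, x

# Separate branches for each eye crop and the face crop.
left_in, left_feat = conv_branch("left_eye", 64)
right_in, right_feat = conv_branch("right_eye", 64)
face_in, face_feat = conv_branch("face", 128)

# Fuse the features and regress a 2D gaze point on the screen.
merged = layers.concatenate([left_feat, right_feat, face_feat])
merged = layers.Dense(128, activation="relu")(merged)
gaze_point = layers.Dense(2, name="gaze_point")(merged)

model = Model(inputs=[left_in, right_in, face_in], outputs=gaze_point)
model.compile(optimizer="adam", loss="mse")
model.summary()
```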
The code available here serves two basic functions. The first is constructing and training the CNN model. The second is running a simple demo using an already-trained model. The demo works by starting a simple server that accepts images from a client and sends back the estimated gaze point. There is also a demonstration Android app that acts as a client for this server; we intend to make it available in the future.
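Because the server simply receives images and returns gaze estimates, a client can be quite small. The sketch below is hypothetical: the address, route, request format, and response schema are all assumptions, and the real protocol is defined by the demo server code.

```python
# Hypothetical client for the demo server. The address, route, request
# format, and response schema are all assumptions made for illustration;
# the real protocol is defined by the demo server code.
import requests

SERVER_URL = "http://localhost:8000/gaze"  # assumed address and endpoint

def estimate_gaze(image_path):
    # Send one camera frame and read back the predicted on-screen gaze point.
    with open(image_path, "rb") as f:
        response = requests.post(SERVER_URL, files={"image": f})
    response.raise_for_status()
    point = response.json()  # assumed to look like {"x": ..., "y": ...}
    return point["x"], point["y"]

if __name__ == "__main__":
    print(estimate_gaze("frame.jpg"))
```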
Training the model requires TensorFlow, Keras, and Rhodopsin. Practically, it also requires a GPU, preferably one with at least 8 GB of VRAM.
Training the model requires the GazeCapture dataset, which must be converted to TFRecords format before it can be used for training. Once the dataset has been downloaded, an included script performs this conversion:
~$ ./process_gazecap.py dataset_dir output_dir
The first argument is the path to the root directory of the GazeCapture dataset that you downloaded. The second argument is the path to the output directory where you want the TFRecords files to be located.
This script creates three TFRecords files in the output directory: one for the training data, one for the testing data, and one for the validation data.
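If you want a quick sanity check that the conversion produced usable files, they can be read back with the standard `tf.data` API. The file name below is hypothetical, and the sketch assumes TensorFlow 2.x eager execution; the actual record contents are defined by `process_gazecap.py`.

```python
# Quick sanity check on a converted TFRecords file. The file name below is
# hypothetical, and this assumes TensorFlow 2.x eager execution; the actual
# record contents are defined by process_gazecap.py.
import tensorflow as tf

def count_records(path):
    # Count the serialized examples without parsing them, just to confirm
    # that the conversion produced a non-empty file.
    return sum(1 for _ in tf.data.TFRecordDataset(path))

if __name__ == "__main__":
    print(count_records("output_dir/train.tfrecords"))
```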
For convenience, this script can be run in the same Docker container as the actual training. (See below.)
By far, the easiest way to train the model is to use the pre-built Docker container. There is a script included that will automatically pull this container and open an interactive shell:
~$ ./start_train_container.sh
Note that this requires both Docker and nvidia-docker to be installed on your local machine.
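Under the hood, this amounts to something roughly like the following command, where the image name is a placeholder and the exact flags may differ from what `start_train_container.sh` actually runs:

~$ nvidia-docker run -it -v "$(pwd)":/isl_gazecapture <container_image> /bin/bash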
The script automatically bind-mounts the repository directory to the `isl_gazecapture` directory inside the container. Once inside this directory, training can be initiated as follows:
~$ ./train_gazecap.py train_dataset test_dataset
The first argument is the path to the TFRecords file containing the training dataset. Likewise, the second argument is the path to the file containing the testing dataset.
Many of the attributes that pertain to training are set in constants defined at the top of the `train_gazecap.py` file. These can be modified at will, and have comments documenting their functions.
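As a purely hypothetical illustration, the constants in question look something like this; the actual names, values, and comments are whatever `train_gazecap.py` defines:

```python
# Hypothetical training constants, standing in for those at the top of
# train_gazecap.py. The real names and values live in that file.
BATCH_SIZE = 64        # Number of examples per training batch.
LEARNING_RATE = 0.001  # Initial learning rate for the optimizer.
NUM_EPOCHS = 30        # Number of passes over the training dataset.
```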
Additionally, settings that are common to both the training procedure and the demo server are located in `itracker/common/config.py`. Most likely, the only parameter here that you might want to modify is `NET_ARCH`. This parameter points to the class of the network to train. (Different classes are defined in `network.py` for different network architectures.) It can be changed to any of the classes specified in `network.py` in order to experiment with alternative architectures.
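For instance, experimenting with a different architecture might amount to a one-line change along these lines; the import path and class name here are made up for illustration, and the available classes are whatever `network.py` actually defines.

```python
# itracker/common/config.py (illustrative excerpt only)
from itracker.common import network  # assumed import path

# Point NET_ARCH at the network class you want to train; "LargeVggNetwork"
# is a hypothetical name standing in for a class defined in network.py.
NET_ARCH = network.LargeVggNetwork
```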
You can start training from a pre-trained model if you wish. This is done by passing the HDF5 file containing the saved model weights to the training script with the `-m` parameter. Note that the architecture of a saved model cannot be detected automatically, so you must ensure that the current value of `NET_ARCH` matches the architecture of the saved model.
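For example, resuming from saved weights might look like this (the weights file name is just an illustration):

~$ ./train_gazecap.py train_dataset test_dataset -m pretrained_weights.h5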
You may wish to evaluate the accuracy of the model on the GazeCapture validation dataset. This dataset is generated automatically by the conversion script, but it is not used during the training procedure.
In order to validate a saved model, specify the location of the validation dataset using the `-v` option. (Note that you will also have to specify the saved model with the `-m` option for this to work.)
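For example, a validation run might look like the following, where the file names are illustrative and the exact set of required arguments is determined by `train_gazecap.py`:

~$ ./train_gazecap.py train_dataset test_dataset -m saved_model.h5 -v validation_dataset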
TODO (djpetti): Write this section.