Python-based program for training an agent for text localization using reinforcement learning with ChainerRL

text-localization-agent

The code to train the agent.

Attention!

This project currently contains a memory leak, which means that during long training runs it might use up all your memory and make the server slow down or crash!

Prerequisites

You need Python 3 (preferably 3.6) installed, as well as the requirements from requirements.txt:

$ pip install -r requirements.txt 

Furthermore, you need to install the text-localization-environment by following its Installation instructions.

Usage

Training an agent requires two files:

  1. A text file where each line contains the path to one image in the training dataset
  2. A NumPy file (.npy) that contains the bounding boxes associated with each image. For n images, this file contains a list with n entries, where each entry is a list of bounding boxes in the format ((x_topleft, y_topleft), (x_bottomright, y_bottomright))
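
As a minimal sketch of the two input files described above, the following writes a tiny dataset and then checks that both files agree. The helper names (write_dataset_files, check_dataset_files), the example image paths, and the choice of an object array for the box file are assumptions made for illustration; they are not taken from this project or from the dataset generator.

```python
import os
import tempfile

import numpy as np


def write_dataset_files(image_paths, boxes_per_image, imagefile, boxfile):
    """Write the image list and the bounding-box file (hypothetical helper)."""
    if len(image_paths) != len(boxes_per_image):
        raise ValueError("need exactly one list of boxes per image")
    with open(imagefile, "w") as f:
        for path in image_paths:
            f.write(path + "\n")
    # Assumption: an object array with one entry per image, so that images
    # may carry different numbers of boxes.
    boxes = np.empty(len(boxes_per_image), dtype=object)
    for i, image_boxes in enumerate(boxes_per_image):
        boxes[i] = list(image_boxes)
    np.save(boxfile, boxes, allow_pickle=True)


def check_dataset_files(imagefile, boxfile):
    """Return the number of images if both files are consistent."""
    with open(imagefile) as f:
        paths = [line.strip() for line in f if line.strip()]
    boxes = np.load(boxfile, allow_pickle=True)
    if len(paths) != len(boxes):
        raise ValueError(
            "%d image paths but %d box entries" % (len(paths), len(boxes))
        )
    for path, image_boxes in zip(paths, boxes):
        for (x1, y1), (x2, y2) in image_boxes:
            # Top-left must lie above and to the left of bottom-right.
            if not (x1 < x2 and y1 < y2):
                raise ValueError("degenerate box for %s" % path)
    return len(paths)


# Tiny example with made-up paths and coordinates.
workdir = tempfile.mkdtemp()
imagefile = os.path.join(workdir, "image_locations.txt")
boxfile = os.path.join(workdir, "bounding_boxes.npy")
write_dataset_files(
    ["images/0.png", "images/1.png"],
    [
        [((10, 20), (110, 60))],
        [((5, 5), (50, 30)), ((60, 40), (120, 80))],
    ],
    imagefile,
    boxfile,
)
print(check_dataset_files(imagefile, boxfile))
```

Whether the training script accepts an object array stored this way has not been verified here. For real training data, the dataset generator remains the reference.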

Datasets generated by the dataset generator fulfill these requirements. With these two files, you can start training by running the train_agent.py script. Here is an overview of the available options:

Option name                     | Short name | Explanation                                                    | Default value
--steps                         | -s         | Number of steps to train the agent                             | 2000
--gpu                           |            | ID of the GPU to be used; -1 if the CPU should be used instead | -1
--imagefile                     | -i         | Path to the file containing the image locations                | 'image_locations.txt'
--boxfile                       | -b         | Path to the bounding boxes                                     | 'bounding_boxes.npy'
--tensorboard/--no-tensorboard  |            | Whether or not to use TensorBoard logging                      | False
--help                          |            | Display these options                                          |
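
When training runs are launched from another script, the options above can be assembled programmatically. The sketch below only builds the argument list. build_train_command is a hypothetical helper that is not part of this project; the script name and flags are the ones listed in this section.

```python
import shlex


def build_train_command(imagefile, boxfile, steps=None, gpu=None, tensorboard=None):
    """Build the argument list for train_agent.py (hypothetical helper).

    Options left as None are omitted so the script's defaults apply.
    """
    argv = ["python", "train_agent.py", "--imagefile", imagefile, "--boxfile", boxfile]
    if steps is not None:
        argv += ["--steps", str(steps)]
    if gpu is not None:
        argv += ["--gpu", str(gpu)]
    if tensorboard is not None:
        argv.append("--tensorboard" if tensorboard else "--no-tensorboard")
    return argv


command = build_train_command(
    "image_locations.txt", "bounding_boxes.npy", steps=5000, gpu=0, tensorboard=True
)
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the project directory to start the training.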

TensorBoard

To have the program generate log files suitable for visualization in TensorBoard, you need to:

  • Install tensorflow
    $ pip install tensorflow
    (If you use Python 3.7 and the installation fails, use pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.12.0-py3-none-any.whl instead. See here for the reason.)
  • Run the text-localization-agent program with the --tensorboard flag
    $ python train_agent.py --tensorboard --imagefile … --boxfile …
  • Start TensorBoard pointing to the tensorboard/ directory inside the text-localization-agent project
    $ tensorboard --logdir=<path to text-localization-agent>/tensorboard/
    …
    TensorBoard 1.12.0 at <link to TensorBoard UI> (Press CTRL+C to quit)
  • Open the TensorBoard UI via the link that is provided when the tensorboard program is started (usually: http://localhost:6006)

Training on the chair's servers

To run the training on one of the chair's servers you need to:

  • Clone the necessary repositories
  • Create a new virtual environment. Everything requires Python 3.6 or newer. The server's default Python might be older; in that case, pass the correct Python version to virtualenv via the -p parameter, for example
    $ virtualenv -p python3.6 <envname>
    (If there is no Python 3.6/3.7 installed you are out of luck because we don't have sudo access)
  • Activate the environment via
    $ source <envname>/bin/activate
  • Install the required packages (see section "Prerequisites"). Don't forget cupy, tb_chainer and tensorflow!
  • Prepare the training data (either generate it using the dataset-generator or transfer existing data to the server)
  • To keep the training running after you disconnect from the server, you might want to use a terminal multiplexer such as tmux or screen
  • Set the CUDA_PATH and LD_LIBRARY_PATH variables if they are not already set. The command should be something like
    $ export CUDA_PATH=/usr/local/cuda
    $ export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
  • The ResNet-152 caffemodel isn't downloaded automatically. Download it (see link) and save it where necessary; the error raised when you try to create a TextLocEnv tells you where.
  • Start training!

These instructions are for starting from scratch. If, for example, a suitable virtual environment already exists, you don't need to create a new one.
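
Before starting a long run, the checklist above can be partly automated. The following is a rough sketch: preflight_problems is a hypothetical helper that is not part of this project. It only checks the Python version, the packages named above, and the two environment variables.

```python
import importlib.util
import os
import sys

# Package and variable names taken from the checklist above.
REQUIRED_PACKAGES = ("cupy", "tb_chainer", "tensorflow")
REQUIRED_ENV_VARS = ("CUDA_PATH", "LD_LIBRARY_PATH")


def preflight_problems(packages=REQUIRED_PACKAGES, env_vars=REQUIRED_ENV_VARS, environ=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    environ = os.environ if environ is None else environ
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or newer is required")
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("package not installed: " + name)
    for var in env_vars:
        if not environ.get(var):
            problems.append("environment variable not set: " + var)
    return problems


for problem in preflight_problems():
    print(problem)
```

This does not verify that the GPU is usable or that the ResNet-152 caffemodel is in place; it only catches the most common setup omissions early.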

Evaluating

  • To evaluate a previously trained agent on a dataset, you may use the evaluate method available as a click CLI when executing:
    $ python evaluate_agent.py
    (Run python evaluate_agent.py --help to see the required parameters for the CLI)
  • If you provide the --save flag in the CLI above, it creates .npy files which can be read by the evaluate_from_files CLI afterwards:
    $ python evaluate_from_files.py
    (Run python evaluate_from_files.py --help to see the required parameters for the CLI)
  • The evaluate_from_files CLI lets you define the IoU threshold used when calculating the evaluation metrics. In addition to the mean average precision (mAP), it also outputs the precision and recall values.
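
To illustrate how an IoU threshold affects precision and recall, here is a minimal sketch for boxes in the ((x_topleft, y_topleft), (x_bottomright, y_bottomright)) format. It uses a simple greedy matching and is not the implementation used by evaluate_from_files; the function names iou and precision_recall are chosen for this example only.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(predicted, ground_truth, threshold=0.5):
    """Greedily match each prediction to its best unmatched ground-truth box.

    A prediction counts as a true positive if the best IoU reaches the threshold.
    """
    matched = set()
    true_positives = 0
    for pred in predicted:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            value = iou(pred, gt)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_idx is not None and best_iou >= threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


predicted = [((0, 0), (10, 10)), ((100, 100), (110, 110))]
ground_truth = [((0, 0), (10, 10)), ((50, 50), (60, 60))]
print(precision_recall(predicted, ground_truth, threshold=0.5))
```

Raising the threshold makes matches stricter, so precision and recall can only stay the same or drop for a fixed set of predictions under this matching scheme.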

Creating image sequences/animations for visualization purposes

  • To create an image sequence of an already trained agent acting on a specific image, use:
    $ python generate_image_sequence.py
    (Run python generate_image_sequence.py --help to see the required parameters for the CLI and have a look into the generate_image_sequence.py file for instructions on creating a video out of the generated single frames using ffmpeg)