-
This project is implemented in Pytorch (>1.8). Thus please install Pytorch first.
-
ctcdecode==0.4 [parlance/ctcdecode],for beam search decode.
-
[Optional] sclite [kaldi-asr/kaldi], install kaldi tool to get sclite for evaluation. After installation, create a soft link toward the sclite:
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
We also provide a python version evaluation tool for convenience, but sclite can provide more detailed statistics. -
[Optional] SeanNaren/warp-ctc At the beginning of this research, we adopt warp-ctc for supervision, and we recently find that pytorch version CTC can reach similar results.
-
Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments based on phoenix-2014.v3.tar.gz.
-
After finishing dataset download, extract it to ./dataset/phoenix, it is suggested to make a soft link toward downloaded dataset.
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
For CSL dataset please link it this way
ln -s PATH_TO_DATASET/CSL ./dataset/csl
-
The original image sequence is 210x260, we resize it to 256x256 for augmentation. Run the following command to generate gloss dict and resize image sequence.
cd ./preprocess python dataset_preprocess.py --process-image --multiprocessing
-
For CSL preprocessing please run this command below
cd ./preprocess
python dataset_preprocess_csl.py --process-image --multiprocessing
To evaluate the pretrained model, run the command below:
python main.py --load-weights saved_model.pt --phase test
The priorities of configuration files are: command line > config file > default values of argparse. To train the SLR model on phoenix14, run the command below:
python main.py --work-dir PATH_TO_SAVE_RESULTS --config PATH_TO_CONFIG_FILE --device AVAILABLE_GPUS
We also provide feature extraction function to extract frame-wise features for other research purpose, which can be achieved by:
python main.py --load-weights PATH_TO_PRETRAINED_MODEL --phase features