This project accompanies the paper "Local Information Assisted Attention-free Decoder for Audio Captioning", published in IEEE Signal Processing Letters.
The architecture of the proposed P-LocalAFT model is provided in the model directory, and the modules it depends on (i.e., PANNs, AFT, and the SpecAugment operation) are provided in the modules directory.
Examples of predicted captions, together with the corresponding audio files, are provided in the examples directory.
This project is released under the CC BY-NC-ND 4.0 license.