
To mitigate the problems of database sparsity and inter-class imbalance, this study proposes a speech emotion recognition (SER) model that uses StarGANv2-VC for utterance-level data augmentation and combines handcrafted feature embeddings with high-level features extracted by the transformer-based pre-trained model wav2vec 2.0.
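The fusion step is not detailed in this README, so the following is only a minimal sketch of one common way to combine the two feature types. It assumes that frame-level high-level features (for example, wav2vec 2.0 outputs) are mean-pooled over time and then concatenated with a handcrafted feature vector. The function names `mean_pool` and `fuse_features` and the toy values are hypothetical and are not taken from the project code; the actual model may fuse features differently.

```python
from statistics import fmean
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level features over time into one utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    return [fmean(frame[i] for frame in frames) for i in range(dim)]


def fuse_features(
    handcrafted: Sequence[float],
    frames: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate a handcrafted vector with mean-pooled frame features.

    Assumption: simple concatenation; this is an illustration, not the
    fusion method used in the paper.
    """
    return list(handcrafted) + mean_pool(frames)


if __name__ == "__main__":
    # Toy input: 1 handcrafted value and 2 frames of 2-dimensional features.
    fused = fuse_features([0.5], [[1.0, 2.0], [3.0, 4.0]])
    print(len(fused), fused)
```

In this sketch the fused vector has length equal to the handcrafted dimension plus the frame feature dimension, and it could then be passed to any downstream classifier.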

This repository contains the main code for AugmentationSERwithFeatureFusion.

Once our paper is accepted, we will update the code here.