Implementation of: Self-supervised Audio-and-Language Pre-training with Extremely Low-Resource Parallel Data
The model architecture is defined in models.py and modules.py
The overall pre-training flow is implemented in main.py and m2p_runner.py
Downstream task training is implemented in m2p_finetune.py
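As a rough sketch of how these entry points fit together (the `--config` flag and the YAML file names below are assumptions, not the scripts' documented interface; check the argument parser in each script for the real options):

```python
# Hypothetical launch sketch: the --config flag and the config file names are
# assumptions; inspect main.py and m2p_finetune.py for the actual interface.
import subprocess

# Pre-training (driven by main.py together with m2p_runner.py)
subprocess.run(["python", "main.py", "--config", "config/pretrain.yaml"], check=True)

# Downstream-task fine-tuning
subprocess.run(["python", "m2p_finetune.py", "--config", "config/finetune.yaml"], check=True)
```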
Data preprocessing scripts are in preprocess/
The tokenizer for text inputs is in tokenizer/
Configs for pre-training and fine-tuning are in config/
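For orientation, a config can be inspected before launching a run. The sketch below assumes YAML configs and uses a placeholder file name, so adjust it to whatever actually ships in config/:

```python
# Minimal sketch: peek at a pre-training config (pretrain.yaml is a
# placeholder name; list config/ to find the real files).
import yaml

with open("config/pretrain.yaml") as f:
    cfg = yaml.safe_load(f)

# Print the top-level sections (e.g. model, data, optimizer) and their types
for key, value in cfg.items():
    print(key, type(value).__name__)
```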
The code will be cleaned up and the README will be refined as soon as possible.