Implementation of: Self-supervised Audio-and-Language Pre-training with Extremely Low-Resource Parallel Data

The model architecture is defined in models.py and modules.py.
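
For orientation, below is a rough, illustrative sketch of a generic audio-and-language encoder with cross-modal fusion; the class name, hidden size, and layer counts are assumptions for the example, not the exact modules defined in models.py / modules.py.

```python
# Illustrative only: a generic audio-and-language encoder with cross-modal fusion.
# Names and hyperparameters are placeholders, not the classes in models.py / modules.py.
import torch
import torch.nn as nn

class AudioTextEncoder(nn.Module):
    def __init__(self, vocab_size=30522, audio_dim=80, hidden=768, layers=4, heads=12):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, hidden)    # token ids -> vectors
        self.audio_proj = nn.Linear(audio_dim, hidden)        # acoustic frames -> vectors
        enc_layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(enc_layer, layers)
        self.audio_encoder = nn.TransformerEncoder(enc_layer, layers)
        self.cross_encoder = nn.TransformerEncoder(enc_layer, layers)   # joint fusion

    def forward(self, token_ids, audio_feats):
        t = self.text_encoder(self.text_embed(token_ids))
        a = self.audio_encoder(self.audio_proj(audio_feats))
        return self.cross_encoder(torch.cat([t, a], dim=1))   # fused sequence

model = AudioTextEncoder()
tokens = torch.randint(0, 30522, (2, 16))    # (batch, text length)
frames = torch.randn(2, 100, 80)             # (batch, audio frames, mel bins)
fused = model(tokens, frames)                # (2, 116, 768)
```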

The overall pre-training flow is implemented in main.py and m2p_runner.py.
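
As a rough illustration of the kind of self-supervised masked-prediction objective such a pre-training flow is built around, a BERT-style random masking step could look like the sketch below; the 15% ratio and the mask-token id are conventional placeholders, not values taken from this repository's configs or runner.

```python
# Illustrative masking utility in the style of masked language/acoustic modeling.
# mask_prob=0.15 and mask_id=103 are conventional placeholders, not repo settings.
import torch

def random_mask(token_ids, mask_id=103, mask_prob=0.15, ignore_index=-100):
    """Mask a random subset of tokens; return (masked inputs, prediction targets)."""
    mask = torch.rand(token_ids.shape) < mask_prob
    inputs = token_ids.clone()
    inputs[mask] = mask_id                        # replace selected tokens with [MASK]
    targets = torch.full_like(token_ids, ignore_index)
    targets[mask] = token_ids[mask]               # loss is computed only at masked slots
    return inputs, targets

ids = torch.randint(1000, 2000, (2, 16))
masked_inputs, targets = random_mask(ids)
```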

Training for downstream tasks is in m2p_finetune.py.
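
Fine-tuning typically amounts to restoring the pre-trained encoder weights and attaching a small task-specific head; a minimal sketch under that assumption follows (the checkpoint path, stand-in encoder, and 2-way head are hypothetical, not what m2p_finetune.py actually builds).

```python
# Illustrative fine-tuning setup, not the actual logic in m2p_finetune.py.
import torch
import torch.nn as nn

# Stand-in for the pre-trained cross-modal encoder.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True), num_layers=4)
state = torch.load("checkpoints/pretrained.pt", map_location="cpu")   # hypothetical path
encoder.load_state_dict(state, strict=False)    # tolerate pre-training-only parameters
head = nn.Linear(768, 2)                        # e.g. a binary audio-text matching head
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-5)
```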

Data preprocessing scripts are in preprocess/
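
A typical preprocessing step for this kind of model is log-mel filterbank extraction; a minimal sketch with torchaudio is shown below (the file path and the 80-bin / 25 ms / 10 ms settings are placeholders, and the actual scripts in preprocess/ may use different features).

```python
# Illustrative feature extraction: log-mel filterbanks with torchaudio.
# Path and feature settings are placeholders, not necessarily what preprocess/ uses.
import torchaudio

waveform, sample_rate = torchaudio.load("example.wav")   # hypothetical input file
fbank = torchaudio.compliance.kaldi.fbank(
    waveform,
    sample_frequency=sample_rate,
    num_mel_bins=80,      # feature dimension per frame
    frame_length=25.0,    # window size in milliseconds
    frame_shift=10.0,     # hop size in milliseconds
)
print(fbank.shape)        # (num_frames, 80)
```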

The tokenizer for text inputs is in tokenizer/.
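
Purely to illustrate what tokenized text inputs look like, here is a sketch using a standard BERT-style WordPiece tokenizer as a stand-in; the vocabulary and tokenizer files shipped under tokenizer/ may well differ.

```python
# Illustrative only: converting a caption into token ids with a WordPiece tokenizer.
# Uses the public bert-base-uncased vocabulary as a stand-in for the files in tokenizer/.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("a dog is barking in the distance", return_tensors="pt")
print(encoded["input_ids"])   # token ids fed to the text branch of the model
```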

Configs for pre-training and fine-tuning are in config/
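
The training scripts read their options from these config files at start-up; a minimal sketch of loading one is below (the filename, YAML format, and keys shown are assumptions for illustration, so check the files under config/ for the actual options).

```python
# Illustrative: loading a YAML config. The path and keys are hypothetical;
# the real option names live in the files under config/.
import yaml

with open("config/pretrain.yaml") as f:                    # hypothetical filename
    cfg = yaml.safe_load(f)
print(cfg.get("learning_rate"), cfg.get("batch_size"))     # example hyperparameter keys
```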

The code will be cleaned up and the README will be refined as soon as possible.