fortminors/tgz


Instructions

Environment

pip install -r requirements.txt

T2.1

Model choice: ResNet18 trained on train.part1 with BCE loss

cd <directory/with/train.py>

Train

- python train.py --train_dataset_path <path/to/train1/train> --val_dataset_path <path/to/val/val>

Test

- python test.py --dataset_path <path/to/val/val>
- Metrics achieved on dataset train.part2: Accuracy = 0.99, Precision = 0.99, Recall = 0.99
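The three metrics above can be derived from binary labels and the model's predicted probabilities. This is a minimal sketch in plain Python, assuming the positive class is labeled 1 and a decision threshold of 0.5. The helper name `binary_metrics` is hypothetical and is not taken from test.py.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute accuracy, precision and recall for binary labels.

    Hypothetical helper: assumes labels are 0/1 and y_prob holds
    predicted probabilities for the positive class.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")

    # Threshold probabilities into hard 0/1 predictions.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard against empty denominators instead of dividing by zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}
```

For example, `binary_metrics([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.6])` gives an accuracy of 0.6 and a precision and recall of 2/3 each.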

T2.2 Baseline

Model choice: custom autoencoder trained on train.part1 with L1 loss

cd <directory/with/train.py>

Train

- python train.py --train_dataset_path <path/to/train1/train> --val_dataset_path <path/to/val/val>

Test

- python test.py --dataset_path <path/to/val/val>
- Metrics achieved on dataset train.part2: MSE = 0.236
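The baseline is trained with an L1 loss but reported with MSE, and the two weight large errors differently. This is a minimal sketch in plain Python, assuming the prediction and the target have been flattened into equal-length sequences of floats. The helper names `l1_loss` and `mse` are hypothetical and are not taken from train.py or test.py.

```python
def _check(pred, target):
    # Both sequences must be non-empty and of equal length.
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")


def l1_loss(pred, target):
    """Mean absolute error: every unit of error counts the same."""
    _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target):
    """Mean squared error: large errors are penalized quadratically."""
    _check(pred, target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

With `pred = [1.0, 2.0, 4.0]` and `target = [1.0, 2.0, 2.0]`, the single error of 2 yields an L1 loss of 2/3 but an MSE of 4/3, so a model tuned for L1 is not necessarily optimal under MSE.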

Possible improvements

- SwinUNet transformer for image denoising
- Swin Transformer for image restoration
- Convert the mel-spectrograms back to audio arrays using the known construction parameters (such as the sampling rate), then apply a waveform-domain model such as Speech denoising WaveNet to remove the noise