TODO: 30th April

  • Setup baseline training and inference code in notebook
  • Move code from notebook to config + runner
  • Target baseline validation f1_score - 69%
  • Setup debugging where appropriate

13th May

  • Generate melspecs for faster iteration, 5 sec samples
  • Setup SED model with densenet121, resnest50d, efficientnet-b0
  • Use 5 second clips
  • Add Gaussian, Pink, Band noise and BAD dataset as background noise
  • Random power for signal
  • Use modified mixup - change mixup to use both labels
  • Use secondary labels with 0.3 as target
  • Setup evaluation --> if bird in segment with threshold greater than T, increase prior for that bird for complete file
  • Submit on LB -->
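
Sketch of the label handling above (secondary labels at 0.3, mixup that keeps both labels). This is only one reading of "use both labels": the mixed target is taken as the element-wise maximum of the two targets instead of the usual lam-weighted blend. The function names, `alpha=0.4` and the array shapes are assumptions for illustration, not values from the training code.

```python
import numpy as np

SECONDARY_TARGET = 0.3  # soft target for secondary labels, as in the notes


def make_target(primary, secondary, num_classes):
    """Multi-label target: 1.0 for the primary class, 0.3 for secondary ones."""
    target = np.zeros(num_classes, dtype=np.float32)
    for idx in secondary:
        target[idx] = SECONDARY_TARGET
    target[primary] = 1.0  # primary wins if a class is listed in both
    return target


def mixup_keep_both_labels(x1, y1, x2, y2, rng, alpha=0.4):
    """Blend two inputs but keep both label sets at full strength.

    Standard mixup would use lam * y1 + (1 - lam) * y2. Here the target is
    the element-wise maximum, so both birds stay labelled as present.
    alpha is a hypothetical choice, not taken from the notes.
    """
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = np.maximum(y1, y2)
    return x, y
```

Usage would be along the lines of `x, y = mixup_keep_both_labels(spec_a, y_a, spec_b, y_b, np.random.default_rng(0))`, with `spec_a` and `spec_b` being two melspecs of the same shape.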

17th May

  • effb0 baseline scores - 0.71
  • res50_effb0+mix+rex different stratified folds - 0.71x
  • Try simple kfold and average scores -- DONE - 0.73 for res26
  • Restructure evaluation script, provision for location wise priors - Done
  • res26d 5-kfold, effb0 5-kfold, res50 5-kfold - Done
  • Post processing techniques - In progress
  • Better evaluation metrics on soundscapes - Todo
  • Add noise from soundscapes to training - Todo
  • Find out echo augmentation - Todo
  • Band pass filter for post processing, high frequency cutoff? - Todo
  • log mean-max pooling from Jan Schlüter - Todo - P1
  • Combining channels with different weights - most files are mono, so not useful
  • 30 second training - should be coupled with 0.5 weight for secondary labels
  • Try pretraining on AudioSet? - Todo
  • Training time improvements for faster iteration - Todo
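
Sketch of one candidate post-processing step, the file-level prior idea from the evaluation notes: if a bird crosses threshold T in any segment, raise its scores for the whole file. The multiplicative boost, the default values and the function name are assumptions. The notes only say "increase prior", not how.

```python
import numpy as np


def boost_file_priors(segment_probs, detect_threshold=0.5, boost=1.5):
    """Raise scores for classes confidently detected anywhere in the file.

    segment_probs: array of shape (num_segments, num_classes), values in [0, 1].
    detect_threshold: the threshold T from the notes (value is hypothetical).
    boost: multiplicative factor applied to detected classes (hypothetical).
    """
    probs = np.asarray(segment_probs, dtype=np.float64)
    # A class counts as present in the file if any segment exceeds T.
    present = probs.max(axis=0) > detect_threshold
    boosted = probs.copy()
    boosted[:, present] = np.clip(boosted[:, present] * boost, 0.0, 1.0)
    return boosted
```

With this form, a weak detection of a bird in one segment gets pulled up only when the same bird was detected confidently elsewhere in the file. Classes never above T are left unchanged.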

19th May

  • So far what has worked - used training augmentations from vlomme
  • Combined res26 and effb0 preds
  • Moving from vlomme post processing to logit model helped a bit, maybe try GBM
  • Backbone discriminative experiments -
  • Different backbones -
  • External datasets
  • Pretraining ?
  • Adding nocall from current soundscapes as background noise, also changing background noises
  • Restarts for current trained data
  • More resolution ?
  • Separate threshold for each site - Done
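
Sketch of how the combined res26 + effb0 predictions and the per-site thresholds could fit together: average the per-model probabilities, then binarise each segment with the threshold of its recording site. Plain averaging, the site codes, the fallback threshold and the function name are assumptions for illustration.

```python
import numpy as np


def ensemble_and_threshold(model_probs, sites, site_thresholds, default_threshold=0.5):
    """Average per-model probabilities, then apply a per-site threshold.

    model_probs: list of arrays, each of shape (num_segments, num_classes).
    sites: one site code per segment (hypothetical codes in the usage below).
    site_thresholds: dict mapping site code -> threshold.
    """
    mean_probs = np.mean(np.stack(model_probs, axis=0), axis=0)
    # One threshold per segment, falling back to the default for unknown sites.
    thresholds = np.array([site_thresholds.get(s, default_threshold) for s in sites])
    return mean_probs > thresholds[:, None]
```

For example, `ensemble_and_threshold([res26_probs, effb0_probs], sites, {"site_a": 0.4, "site_b": 0.6})` returns a boolean (num_segments, num_classes) array of predicted birds.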

External datasets and pretraining seem to be the strongest levers as of now. Current options for external data are:

  • additional bird recordings from Xeno-canto - could be used to pretrain for better finetuning
  • soundscapes - add soundscapes from previous competitions and current one
  • AudioSet pretraining - has helped in PANN-based models