joseph-zhong/LipReading

TODOs as of 11/29

Opened this issue · 2 comments

  • Seeding (Zhaofeng)
  • Datasets should probably share char2idx as we want unseen chars to be generated as well (Joseph)
  • Sort the data points by increasing video length
  • Fix any dataset problem (e.g. micro not loading) (Joseph)
  • Check [INAUDIBLE] token #13 (Joseph)
  • Write something that puts dataloader/model/training/eval/logging/etc. together. Something like the current test_better_model.py but more sophisticated? Perhaps with tensorboard support (Joseph)
  • Finish eval function with loss and CER output (Zhaofeng)
  • (optional) Plug in a WER implementation into eval function (Yutong)
    • Maybe just write one ourselves
  • Joint CTC+attention model (Zhaofeng)
  • CNN frame processing (Joseph)
  • Experiments. See #18. (Yutong & Joseph)
  • Inference (beam search) (Yutong)
  • Implement exact-forward-approximate-backward?
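
For the CER/WER items above, if we do end up writing one ourselves, a rough sketch might look like the following. It is plain Python with no dependencies, and none of it exists in the repo yet: the names `edit_distance`, `cer` and `wer` are hypothetical, and the handling of an empty reference (0.0 if the hypothesis is also empty, 1.0 otherwise) is just an assumed convention we would need to agree on.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (drop r from ref)
                curr[j - 1] + 1,     # insertion (add h to ref)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def _error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length."""
    if len(ref_tokens) == 0:
        # Assumed convention: an empty reference scores 0.0 only if the
        # hypothesis is empty too, otherwise 1.0.
        return 0.0 if len(hyp_tokens) == 0 else 1.0
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    """Character error rate between two strings."""
    return _error_rate(ref, hyp)


def wer(ref, hyp):
    """Word error rate between two strings, splitting on whitespace."""
    return _error_rate(ref.split(), hyp.split())
```

For example, `cer("abcd", "abed")` gives 0.25 (one substitution over four reference characters). Note that both rates can exceed 1.0 when the hypothesis is much longer than the reference, so the eval function should not assume they are bounded.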

Fuck me.....

vdalv commented

^🙂 Something to do with this?

This looks like an interesting project. Best of luck, guys.