pytorch/captum

Integrated Gradients for Frame-Level Classification Tasks

david-gimeno opened this issue

🚀 Feature

Support Integrated Gradients not only for single-label classification tasks, but also for tasks based on frame-level classification, such as speech recognition, active speaker detection, etc.

Additional context

In the case of automatic lipreading, the input is a tensor of shape (T, H, W, C), where T, H, W, and C are the time dimension (number of frames), the image height, the image width, and the number of image channels, respectively. I would like to apply Integrated Gradients when the model's target is a sequence, i.e., one integer per frame. Each integer represents the text token for that frame, and together the tokens make up the final transcription.
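To make the request concrete, here is one common workaround (not an existing Captum feature): reduce the per-frame targets to the single scalar that Integrated Gradients needs by summing the log-probability of each frame's target token, then integrate the gradient of that sum along the path from a baseline to the input. The sketch is framework-free NumPy so the gradient can be written by hand. The per-frame linear softmax classifier, the all-zeros baseline, and the names `frame_scores`, `frame_scores_grad`, and `integrated_gradients` are hypothetical stand-ins for a real lipreading model, not Captum API.

```python
import numpy as np


def frame_scores(x, weight, bias, targets):
    """Sum over frames of log p(targets[t] | frame t) for a toy per-frame linear classifier.

    Hypothetical model: x has shape (T, H, W, C), weight (K, H*W*C), bias (K,),
    targets (T,) holds one token index per frame.
    """
    T = x.shape[0]
    feats = x.reshape(T, -1)                          # (T, D): one flattened frame per row
    logits = feats @ weight.T + bias                  # (T, K): per-frame class scores
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return log_probs[np.arange(T), targets].sum()     # scalar: sequence-level score


def frame_scores_grad(x, weight, bias, targets):
    """Analytic gradient of frame_scores with respect to x (same shape as x)."""
    T = x.shape[0]
    feats = x.reshape(T, -1)
    logits = feats @ weight.T + bias
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)         # per-frame softmax
    onehot = np.zeros_like(probs)
    onehot[np.arange(T), targets] = 1.0
    # d/dlogits of sum_t log_softmax(logits_t)[y_t] is (onehot - probs); chain through weight.
    return ((onehot - probs) @ weight).reshape(x.shape)


def integrated_gradients(x, baseline, weight, bias, targets, n_steps=200):
    """Integrated Gradients of the summed frame score, using a midpoint Riemann sum."""
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    total = np.zeros_like(x)
    for a in alphas:
        total += frame_scores_grad(baseline + a * (x - baseline), weight, bias, targets)
    return (x - baseline) * total / n_steps


# Usage on random data with the (T, H, W, C) layout from the issue.
rng = np.random.default_rng(0)
T, H, W, C, K = 4, 3, 3, 1, 5                         # frames, height, width, channels, vocab size
x = rng.uniform(size=(T, H, W, C))
weight = 0.1 * rng.standard_normal((K, H * W * C))
bias = np.zeros(K)
targets = np.array([0, 2, 4, 1])                      # one token index per frame
baseline = np.zeros_like(x)

attr = integrated_gradients(x, baseline, weight, bias, targets)
per_frame = attr.reshape(T, -1).sum(axis=1)           # attribution mass per frame
delta = frame_scores(x, weight, bias, targets) - frame_scores(baseline, weight, bias, targets)
print(per_frame, attr.sum() - delta)                  # completeness gap should be near zero
```

Summing log-probabilities keeps the completeness property of Integrated Gradients (attributions add up to the change in the sequence score), and attributing a single frame's token only requires keeping that frame's term in the sum. In PyTorch, the same reduction could presumably be written as a wrapper forward function that returns one scalar per example and passed to Captum's `IntegratedGradients` with `target` left unset.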