GuyTevet/MotionCLIP

Training for action recognition

Radu1999 opened this issue · 0 comments

The paper says that for BABEL action recognition, training was performed using the labels rather than the raw text, but in the code I found self.clip_label_text = "text_raw_labels". "text_raw_labels" seems to load the raw text for each frame rather than just the category. Could you help me understand this? Thanks!