How to extract the patch-level visual feature?

Question

Closed this issue a year ago · 0 comments

The patch-level features extracted by the code you provided have shape [T, patch_nums, C]. On inspection, they appear to be just the frame-level features of shape [T, C] repeated patch_nums times along the second dimension, so every patch in a frame carries the same vector.
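For anyone who wants to reproduce the observation above, here is a minimal sketch of the check, assuming the features are available as NumPy arrays. The helper name `patches_are_repeated`, the tolerance, and the toy shapes are hypothetical and not part of the repository.

```python
import numpy as np


def patches_are_repeated(patch_feats, frame_feats=None, atol=1e-6):
    """Return True if every patch vector in a frame is identical.

    patch_feats: array of shape [T, patch_nums, C]
    frame_feats: optional array of shape [T, C]; if given, the shared
                 patch vector must also match the frame-level feature.
    """
    patch_feats = np.asarray(patch_feats)
    if patch_feats.ndim != 3:
        raise ValueError("patch_feats must have shape [T, patch_nums, C]")

    # Compare every patch against the first patch of the same frame.
    first_patch = patch_feats[:, :1, :]  # shape [T, 1, C]
    same_across_patches = bool(np.allclose(patch_feats, first_patch, atol=atol))
    if frame_feats is None:
        return same_across_patches

    frame_feats = np.asarray(frame_feats)
    matches_frame = bool(np.allclose(first_patch[:, 0, :], frame_feats, atol=atol))
    return same_across_patches and matches_frame


# Toy data with hypothetical sizes, only to exercise the check.
rng = np.random.default_rng(0)
T, patch_nums, C = 4, 50, 512
frame = rng.standard_normal((T, C))
repeated = np.repeat(frame[:, None, :], patch_nums, axis=1)  # frame feature copied per patch
distinct = rng.standard_normal((T, patch_nums, C))           # genuinely different patches

repeated_result = patches_are_repeated(repeated, frame)
distinct_result = patches_are_repeated(distinct)
```

With real data, the two arrays would be whatever the frame-level and patch-level extraction scripts saved for the same video; if the check returns True, the patch dimension carries no additional information.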

The function **ImageClIP_Patch_feat_extract**(dir_fps_path, dst_clip_path) in feat_script/extract_clip_feat/extract_patch-level_feat.py uses the same image encoder as **ImageClIP_feat_extract**(dir_fps_path, dst_clip_path), which may be why the two outputs contain the same values.