How to extract the patch-level visual feature?
Closed this issue · 0 comments
Bravo5542 commented
The patch-level features extracted by the code you provided have shape [T, patch_nums, C]. However, they appear to simply repeat the frame-level features of shape [T, C] patch_nums times along the second dimension, so every patch in a frame carries the same vector.
The function **ImageClIP_Patch_feat_extract**(dir_fps_path, dst_clip_path) in feat_script/extract_clip_feat/extract_patch-level_feat.py uses the same image encoder as **ImageClIP_feat_extract**(dir_fps_path, dst_clip_path).
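The repetition described above can be checked with a few lines of NumPy. This is only a minimal sketch: the helper name `patches_repeat_frame`, the tolerance, and the synthetic arrays are made up for illustration and are not part of the repository. With real data, the two arguments would be the saved patch-level and frame-level features for the same video.

```python
import numpy as np


def patches_repeat_frame(patch_feat, frame_feat, atol=1e-6):
    """Return True if every patch vector equals its frame's feature vector.

    patch_feat: array of shape [T, patch_nums, C]
    frame_feat: array of shape [T, C]
    (Hypothetical helper, not part of the repository.)
    """
    patch_feat = np.asarray(patch_feat)
    frame_feat = np.asarray(frame_feat)
    if patch_feat.ndim != 3:
        raise ValueError("patch_feat must have shape [T, patch_nums, C]")
    if frame_feat.shape != (patch_feat.shape[0], patch_feat.shape[2]):
        raise ValueError("frame_feat must have shape [T, C] matching patch_feat")
    # Broadcast [T, 1, C] against [T, patch_nums, C] and compare element-wise.
    return bool(np.allclose(patch_feat, frame_feat[:, None, :], atol=atol))


# Synthetic stand-ins for the extracted features (sizes are arbitrary).
rng = np.random.default_rng(0)
T, patch_nums, C = 4, 50, 512
frame = rng.standard_normal((T, C)).astype(np.float32)

# Case 1: frame-level features tiled along the patch dimension.
repeated = np.repeat(frame[:, None, :], patch_nums, axis=1)
# Case 2: a distinct vector for every patch.
distinct = rng.standard_normal((T, patch_nums, C)).astype(np.float32)

print(patches_repeat_frame(repeated, frame))  # True
print(patches_repeat_frame(distinct, frame))  # False
```

If the check returns True on the saved features, the patch dimension carries no extra spatial information beyond the frame-level vector.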