yossigandelsman/clip_text_span
Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
Jupyter Notebook · MIT license
Issues
About text_descriptions
#11 opened by dbsdmlgus50 - 1
A question on token decomposition
#10 opened by X-funbean - 1
TEXTSPAN Algorithm
#9 opened by toffeecat - 0
How to hook only last few layers?
#8 opened by tangli-udel - 1
Did you compare zero-shot segmentation performance between the original CLIP and your proposed image-token decomposition?
#6 opened by Yang-bug-star - 4
Do you evaluate the direct effect of different layers on zero-shot classification accuracy on the test set, or on the same validation set used to calculate the mean?
#4 opened by Yang-bug-star - 1
How do you unroll the direct effect to find a second-order effect, and how do you remove such a second-order effect? Could you explain in more detail?
#7 opened by Yang-bug-star - 1
Here, does 'base' refer to randomly ablating 10 heads, or to the original OpenCLIP?
#5 opened by Yang-bug-star - 4
Questions on the notation in Equations 5 and 6
#3 opened by chu7zpah - 1
arXiv PDF is not available
#1 opened by SakurajimaMaiii