PKU-YuanGroup/LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Python · MIT
Issues
- 1
AttributeError: 'NoneType' object has no attribute 'astype' in Depth processor
#68 opened by SoyeonHH - 1
ValueError: Input image size (112*1036) doesn't match model ([112, 1036]*[112, 1036]).
#67 opened by JeffRody - 1
Some questions about the dataset
#57 opened by XiaoZong0 - 2
- 0
embedding arithmetic
#65 opened by bakachan19 - 7
Combination of multiple modalities
#38 opened by anthony-mendil - 0
How to calculate similarity of Video to audio?
#64 opened by Coooderr - 0
token masking and contrastive learning
#63 opened by ooochen-30 - 0
Can not find datasets for LanguageBind_Image?
#62 opened by superwood - 0
- 0
Embedding similarity
#60 opened by akBear23 - 0
Any support for languages other than English?
#59 opened by ragesh2000 - 0
Method of running evaluation on MSR-VTT dataset
#58 opened by sartaki - 0
Video-Language Pre-training hours
#56 opened by msw6468 - 0
Are some of these models interchangeable?
#55 opened by felmoreno1726 - 0
Pretraining on video dataset without lora.
#54 opened by shihuai - 4
Clarification questions about the framework
#50 opened by felmoreno1726 - 0
- 0
- 1
Fine-tuning LLM + LanguageBind?
#42 opened by Crystalxd - 2
- 1
GPU resources
#47 opened by letaozhang - 0
NameError: name 'get_audio_anno' is not defined
#52 opened by noah003 - 2
where is LanguageBind_Image
#46 opened by hd201708010401 - 0
Question about video-text training
#49 opened by Tunanzzz - 1
- 5
Inconsistent running results of inference.py
#45 opened by Jade999 - 0
confusion about VIDAL-10M video-text data
#44 opened by wli333 - 1
- 1
- 1
Audio-Language Alignment data for reproduction
#36 opened by memoiry - 1
- 2
Can you share the NYU-D dataset you used for evaluation, e.g. how to split the dataset?
#29 opened by bf-yang - 0
finetuning on a classification task
#35 opened by Sravanthgithub - 1
Vision encoder version
#34 opened by JosephPai - 4
- 1
Congrats on Acceptance !!!
#33 opened by SenmiaoORZ - 0
batch inference
#31 opened by doyikim1 - 2
- 1
Does video feature extraction support a dynamic number of frames? Would results degrade compared with 8 frames?
#27 opened by 1093842024 - 1
- 1
- 7
Add flash attention 2
#19 opened by pphuc25 - 2
VIT-H model release
#22 opened by tikboaHIT - 1
- 1
where is the LanguageBind_Audio_FT in huggingface?
#24 opened by kou35 - 1
about LanguageBind_Video_merge
#23 opened by kou35 - 4
Hashtags and prompts?
#21 opened by Kamino666 - 1
For feature extraction and alignment, which output parameter should be used?
#20 opened by xiaohaochen0308 - 1