cross-modal-pretraining

There are 2 repositories under the cross-modal-pretraining topic.

  • DAMO-NLP-SG/Video-LLaMA

    [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

    Language: Python
  • JacobYuan7/RLIP

    [NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Graph Generation.

    Language: Python