Pinned Repositories
adv-inf
Adversarial Inference for Multi-Sentence Video Descriptions (CVPR 2019)
Ask-Anything
[VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
cider
Python code for CIDEr - Consensus-based Image Caption Evaluation
coco-caption
densevid_eval
Evaluation code for Dense-Captioning Events in Videos
localized-skd
Localized Symbolic Knowledge Distillation for Visual Commonsense Models (NeurIPS 2023)
lsmdc-baseline
lsmdc-fillin
Identity-Aware Multi-Sentence Video Description
visual-comet
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
jamespark3922's Repositories
jamespark3922/visual-comet
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
jamespark3922/adv-inf
Adversarial Inference for Multi-Sentence Video Descriptions (CVPR 2019)
jamespark3922/lsmdc-baseline
jamespark3922/lsmdc-fillin
Identity-Aware Multi-Sentence Video Description
jamespark3922/localized-skd
Localized Symbolic Knowledge Distillation for Visual Commonsense Models (NeurIPS 2023)
jamespark3922/densevid_eval
Evaluation code for Dense-Captioning Events in Videos
jamespark3922/Ask-Anything
[VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
jamespark3922/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
jamespark3922/cider
Python code for CIDEr - Consensus-based Image Caption Evaluation
jamespark3922/coco-caption
jamespark3922/Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
jamespark3922/movie_eval
jamespark3922/nlg-eval
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
jamespark3922/RETRO-pytorch
Implementation of RETRO, DeepMind's retrieval-based attention net, in PyTorch
jamespark3922/self-critical.pytorch
Unofficial PyTorch implementation for Self-critical Sequence Training for Image Captioning
jamespark3922/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
jamespark3922/Video-ChatGPT
"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
jamespark3922/video-lang-contrast-set
jamespark3922/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding