hcwei13's Stars
oobabooga/text-generation-webui
A Gradio web UI for Large Language Models with support for multiple inference backends.
OpenLMLab/MOSS
An open-source tool-augmented conversational language model from Fudan University
OpenBMB/XAgent
An Autonomous LLM Agent for Complex Task Solving
01-ai/Yi
A series of large language models trained from scratch by developers @01-ai
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
salesforce/CodeGen
CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
xlang-ai/OpenAgents
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
amazon-science/mm-cot
Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (more updates to come)
Breakthrough/PySceneDetect
:movie_camera: Python and OpenCV-based scene cut/transition detection program & library.
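For PySceneDetect, a minimal usage sketch of content-aware scene cut detection; this assumes PySceneDetect 0.6 or later and a hypothetical local input file `video.mp4`:

```python
# Minimal sketch: content-aware scene cut detection with PySceneDetect.
# Assumes PySceneDetect >= 0.6 and a hypothetical local file "video.mp4".
from scenedetect import detect, ContentDetector

# detect() returns a list of (start, end) FrameTimecode pairs, one per scene.
scenes = detect("video.mp4", ContentDetector())
for i, (start, end) in enumerate(scenes, start=1):
    print(f"Scene {i}: {start.get_timecode()} - {end.get_timecode()}")
```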
princeton-vl/RAFT
DSXiangLi/DecryptPrompt
A summary of Prompt & LLM papers, open-source data & models, and AIGC applications
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: "ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities"
piergiaj/pytorch-i3d
mbzuai-oryx/groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
PKU-YuanGroup/LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
DAMO-DI-ML/NeurIPS2023-One-Fits-All
The official code for "One Fits All: Power General Time Series Analysis by Pretrained LM (NeurIPS 2023 Spotlight)"
Luodian/RelateAnything
The Relate Anything Model takes an image as input and uses SAM to identify the corresponding mask within the image.
persimmon-ai-labs/adept-inference
Inference code for Persimmon-8B
LLaVA-VL/LLaVA-Interactive-Demo
baaivision/CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
snap-research/MMVID
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
LeapLabTHU/Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
dhg-wei/DeCap
[ICLR 2023] DeCap: Decoding CLIP Latents for Zero-shot Captioning
Jingkang50/FunQA
FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, and beyond.
gydpku/PPTC
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion
PengZai/ARIC
Aesthetically Relevant Image Captioning