LongXinKou's Stars
RenShuhuai-Andy/TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Hon-Wong/Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
IDEA-Research/Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
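For orientation, a minimal sketch of SAM 2's image-prediction flow, assuming the checkpoint path and config name below and a placeholder image; see the repo's example notebooks for canonical usage.

```python
# Minimal SAM 2 image inference sketch; paths below are assumed, not verified.
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # assumed local checkpoint path
model_cfg = "sam2_hiera_l.yaml"                 # assumed model config name

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)

# A single foreground point prompt at (x, y); label 1 marks a positive click.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
)
print(masks.shape, scores)
```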
magic-research/PLLaVA
Official repository for the paper PLLaVA
LLaVA-VL/LLaVA-NeXT
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
OpenGVLab/InternVideo
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
robocasa/robocasa
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
THUDM/CogVLM2
GPT-4V-level open-source multimodal model based on Llama3-8B
H-Freax/Awesome-Video-Robotic-Papers
This repository compiles a list of papers on applying video techniques to robotics.
dvlab-research/LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
thunlp/LEGENT
Open Platform for Embodied Agents
ario-dataset/ario-tools
Data collection, saving, and publishing code for the ARIO dataset. Collects multi-sensor rostopic data and stores it in a specific structure.
Shengqiang-Zhang/LoHo-Ravens
Official code for the long-horizon language-conditioned robotic manipulation benchmark LoHoRavens.
PKU-YuanGroup/Video-LLaVA
[EMNLP 2024] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
EmbodiedGPT/EmbodiedGPT_Pytorch
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
VRU-NExT/VideoQA
TommyZihao/zihaowordcloud
Simple tutorials and examples of wordcloud-python
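In the spirit of those tutorials, a minimal wordcloud-python sketch; the input text and output filename are illustrative.

```python
# Generate a word cloud from raw text and write it to a PNG file.
from wordcloud import WordCloud

text = "video understanding robotics multimodal language model " * 10
wc = WordCloud(width=800, height=400, background_color="white").generate(text)
wc.to_file("wordcloud.png")  # renders the cloud and saves it as an image
```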
haosulab/ManiSkill
SAPIEN Manipulation Skill Framework, a GPU-parallelized robotics simulator and benchmark
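A minimal sketch of stepping a ManiSkill task through its Gymnasium interface, assuming the PickCube-v1 task id and a random action; consult the repo for supported observation and control modes.

```python
# Random-action rollout step in an assumed ManiSkill task.
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (importing registers the ManiSkill environments)

env = gym.make("PickCube-v1", obs_mode="state")  # task id assumed for illustration
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```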
caotians1/BabyAIPlusPlus
BabyAI++: Towards Grounded Language Learning beyond Memorization, ICLR BeTR-RL 2020
Farama-Foundation/Minigrid
Simple and easily configurable grid world environments for reinforcement learning
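A minimal sketch of a random-policy rollout in a Minigrid environment via the Gymnasium API; the environment id is illustrative.

```python
# Short random rollout in a small MiniGrid world.
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers the MiniGrid-* environments)

env = gym.make("MiniGrid-Empty-5x5-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()  # random policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```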
DirtyHarryLYL/LLM-in-Vision
Recent LLM-based computer vision and related works. Comments and contributions welcome!
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
Genesis-Embodied-AI/RoboGen
A generative and self-guided robotic agent that endlessly proposes and masters new skills.
amusi/CVPR2024-Papers-with-Code
A collection of CVPR 2024 papers and open-source projects
google/or-tools
Google's Operations Research tools.
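For a taste of the API, a minimal CP-SAT sketch using OR-Tools' Python bindings; the toy two-variable model is illustrative.

```python
# Maximize x + 2y subject to x + y <= 10 with CP-SAT.
from ortools.sat.python import cp_model

model = cp_model.CpModel()
x = model.NewIntVar(0, 10, "x")
y = model.NewIntVar(0, 10, "y")
model.Add(x + y <= 10)       # one linear constraint
model.Maximize(x + 2 * y)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print("x =", solver.Value(x), "y =", solver.Value(y))
```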