LongXinKou's Stars
RenShuhuai-Andy/TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Hon-Wong/Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
IDEA-Research/Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
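For orientation, a minimal sketch of SAM 2's image-prediction flow, assuming the checkpoint path and config name below and a placeholder image; see the repo's example notebooks for canonical usage.

```python
# Minimal SAM 2 image inference sketch; paths below are assumed, not verified.
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # assumed local checkpoint path
model_cfg = "sam2_hiera_l.yaml"                 # assumed model config name

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)

# A single foreground point prompt at (x, y); label 1 marks a positive click.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
)
print(masks.shape, scores)
```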
magic-research/PLLaVA
Official repository for the paper PLLaVA
LLaVA-VL/LLaVA-NeXT
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
OpenGVLab/InternVideo
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
robocasa/robocasa
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
THUDM/CogVLM2
GPT-4V-level open-source multimodal model based on Llama3-8B
H-Freax/Awesome-Video-Robotic-Papers
This repository compiles a list of papers on applying video techniques to robotics.
dvlab-research/LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
thunlp/LEGENT
Open Platform for Embodied Agents
ario-dataset/ario-tools
Data collection, saving, and publishing code for the ARIO dataset. Collects multi-sensor rostopic data and stores it in a specific structure.
Shengqiang-Zhang/LoHo-Ravens
Official code for the long-horizon language-conditioned robotic manipulation benchmark LoHoRavens.
PKU-YuanGroup/Video-LLaVA
[EMNLP 2024] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
EmbodiedGPT/EmbodiedGPT_Pytorch
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
VRU-NExT/VideoQA
TommyZihao/zihaowordcloud
Simple tutorials and examples of wordcloud-python
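In the spirit of those tutorials, a minimal wordcloud-python sketch; the input text and output filename are illustrative.

```python
# Generate a word cloud from raw text and write it to a PNG file.
from wordcloud import WordCloud

text = "video understanding robotics multimodal language model " * 10
wc = WordCloud(width=800, height=400, background_color="white").generate(text)
wc.to_file("wordcloud.png")  # renders the cloud and saves it as an image
```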
haosulab/ManiSkill
SAPIEN Manipulation Skill Framework, a GPU-parallelized robotics simulator and benchmark
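A minimal sketch of stepping a ManiSkill task through its Gymnasium interface, assuming the PickCube-v1 task id and a random action; consult the repo for supported observation and control modes.

```python
# Random-action rollout step in an assumed ManiSkill task.
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (importing registers the ManiSkill environments)

env = gym.make("PickCube-v1", obs_mode="state")  # task id assumed for illustration
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```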
caotians1/BabyAIPlusPlus
BabyAI++: Towards Grounded Language Learning beyond Memorization, ICLR BeTR-RL 2020
Farama-Foundation/Minigrid
Simple and easily configurable grid world environments for reinforcement learning
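A minimal sketch of a random-policy rollout in a Minigrid environment via the Gymnasium API; the environment id is illustrative.

```python
# Short random rollout in a small MiniGrid world.
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers the MiniGrid-* environments)

env = gym.make("MiniGrid-Empty-5x5-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()  # random policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```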
DirtyHarryLYL/LLM-in-Vision
Recent LLM-based computer vision and related works. Comments and contributions welcome!
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
Genesis-Embodied-AI/RoboGen
A generative and self-guided robotic agent that endlessly proposes and masters new skills.
amusi/CVPR2024-Papers-with-Code
A collection of CVPR 2024 papers and open-source projects
google/or-tools
Google's Operations Research tools.
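For a taste of the API, a minimal CP-SAT sketch using OR-Tools' Python bindings; the toy two-variable model is illustrative.

```python
# Maximize x + 2y subject to x + y <= 10 with CP-SAT.
from ortools.sat.python import cp_model

model = cp_model.CpModel()
x = model.NewIntVar(0, 10, "x")
y = model.NewIntVar(0, 10, "y")
model.Add(x + y <= 10)       # one linear constraint
model.Maximize(x + 2 * y)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print("x =", solver.Value(x), "y =", solver.Value(y))
```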