takenpeanut
Second-year Ph.D. student at PKU, currently focusing on RL & large models
Peking University, Beijing, China
takenpeanut's Stars
deepseek-ai/DeepSeek-V3
PKU-YuanGroup/Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
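Label Studio exports annotations as JSON with a task → annotations → result → value nesting. A minimal sketch of reading classification labels from such an export, assuming a simple "choices" labeling config (the exact schema varies with your project setup):

```python
import json

# Hypothetical minimal export; real Label Studio exports carry more fields,
# but classification results follow this task -> annotations -> result shape.
export = json.loads("""
[
  {
    "data": {"text": "a cat on a mat"},
    "annotations": [
      {"result": [{"type": "choices",
                   "value": {"choices": ["animal"]}}]}
    ]
  }
]
""")

# Flatten every chosen label across all tasks and annotators.
labels = [
    choice
    for task in export
    for ann in task["annotations"]
    for item in ann["result"]
    for choice in item["value"]["choices"]
]
# labels == ['animal']
```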
GengzeZhou/NavGPT-2
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
JeremyLinky/YouTube-VLN
[ICCV'23] Learning Vision-and-Language Navigation from YouTube Videos
peteanderson80/Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
jacobkrantz/VLN-CE
Vision-and-Language Navigation in Continuous Environments using Habitat
Dantong88/LLARVA
LLaVA-VL/LLaVA-NeXT
mu-cai/TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
RenShuhuai-Andy/TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
joez17/VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
Vision-CAIR/LongVU
RUCAIBox/POPE
The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
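POPE casts hallucination evaluation as binary yes/no questions about object presence. A minimal sketch of the usual scoring (accuracy, precision, recall, F1, yes-ratio), assuming normalized "yes"/"no" answers; this simplifies the paper's protocol and is not the official script:

```python
def pope_scores(preds, labels):
    """preds/labels: lists of 'yes'/'no' strings; 'yes' = object present."""
    pairs = list(zip(preds, labels))
    tp = sum(p == "yes" and l == "yes" for p, l in pairs)
    fp = sum(p == "yes" and l == "no" for p, l in pairs)
    tn = sum(p == "no" and l == "no" for p, l in pairs)
    fn = sum(p == "no" and l == "yes" for p, l in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # High yes-ratio on balanced questions signals a hallucination bias.
        "yes_ratio": (tp + fp) / len(pairs),
    }
```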
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
open-compass/MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
ScanNet/ScanNet
facebookresearch/open-eqa
OpenEQA: Embodied Question Answering in the Era of Foundation Models
feizc/Cleaned-Webvid
Strategies for producing a cleaned version of the WebVid-10M dataset
rohitrango/automatic-watermark-detection
Project for Digital Image Processing
NJU-PCALab/OpenVid-1M
boomb0om/watermark-detection
Model for watermark classification implemented with PyTorch
OpenGVLab/unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
hkchengrex/XMem
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
z-x-yang/Segment-and-Track-Anything
An open-source project for tracking and segmenting arbitrary objects in videos, either automatically or interactively. It combines the Segment Anything Model (SAM) for key-frame segmentation with Associating Objects with Transformers (AOT) for efficient tracking and propagation.
nltk/nltk
NLTK Source
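A quick taste of NLTK's tokenizers. This sketch uses the rule-based `TreebankWordTokenizer`, chosen here because it needs no `nltk.download()` of corpora or models:

```python
from nltk.tokenize import TreebankWordTokenizer

# Rule-based Penn Treebank tokenization: splits off trailing punctuation,
# no data download required.
tokenizer = TreebankWordTokenizer()
tokens = tokenizer.tokenize("NLTK splits text into tokens.")
# tokens == ['NLTK', 'splits', 'text', 'into', 'tokens', '.']
```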
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
xinyu1205/recognize-anything
Strong open-source foundation models for image recognition.
explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
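A minimal spaCy sketch. `spacy.blank("en")` builds a tokenizer-only pipeline, so no trained model (e.g. `en_core_web_sm`) needs to be downloaded for this example:

```python
import spacy

# Blank English pipeline: just the rule-based tokenizer, no statistical
# components and no model download.
nlp = spacy.blank("en")
doc = nlp("spaCy provides fast tokenization.")
tokens = [token.text for token in doc]
# tokens == ['spaCy', 'provides', 'fast', 'tokenization', '.']
```

For tagging, parsing, or NER you would instead load a trained pipeline such as `en_core_web_sm` after downloading it.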