huangshiyu13
Shiyu Huang (黄世宇): Deep RL, Multi-agent RL, CV, NLP, AGI. https://github.com/OpenRL-Lab/openrl
Zhipu AI, Beijing, China
huangshiyu13's Stars
state-spaces/mamba
Mamba SSM architecture
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Kwai-Kolors/Kolors
Kolors Team
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
introlab/rtabmap
RTAB-Map library and standalone application
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs (video LLMs).
DachunKai/EvTexture
[ICML 2024] EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
GAIR-NLP/anole
Anole: An Open, Autoregressive, Native Multimodal Model for Interleaved Image-Text Generation
Vchitect/VBench
[CVPR 2024 Highlight] VBench: Evaluating Video Generation Models
AIGText/Glyph-ByT5
[ECCV 2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"
BradyFU/Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
RenShuhuai-Andy/TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
PKU-YuanGroup/ChronoMagic-Bench
[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
66Lau/NEXTE_Sentry_Nav
The navigation system of the "Sentry" robot for the Next-E team in RoboMaster 2023
HFAiLab/ffrecord
FireFlyer Record file format, with a writer and reader for DL training samples.
bytedance/Shot2Story
A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions.
jizhang-cmu/autonomy_stack_go2
Full Autonomy Stack for Unitree Go2
THUDM/LVBench
LVBench: An Extreme Long Video Understanding Benchmark
OpenGVLab/EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
bigai-nlco/LSTP-Chat
A Video Chat Agent with Temporal Prior
WentseChen/Soft-QMIX
Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization
THU-BPM/LLMArena
Code for the paper "LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments", accepted at ACL 2024
huangshiyu13/glm-4v-plus_API_usage
How to use the GLM-4V-Plus API
OpenRL-Lab/VideoHub
VideoHub API