huangshiyu13
Shiyu Huang (黄世宇): Deep RL, Multi-agent RL, CV, NLP, AGI. https://github.com/OpenRL-Lab/openrl
Zhipu AI, Beijing, China
huangshiyu13's Stars
state-spaces/mamba
Mamba SSM architecture
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Kwai-Kolors/Kolors
Kolors Team
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
introlab/rtabmap
RTAB-Map library and standalone application
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs (video LLMs).
DachunKai/EvTexture
[ICML 2024] EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
GAIR-NLP/anole
Anole: An Open, Autoregressive, Native Multimodal Model for Interleaved Image-Text Generation
Vchitect/VBench
[CVPR 2024 Highlight] VBench: Evaluating Video Generation Models
AIGText/Glyph-ByT5
[ECCV 2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"
BradyFU/Video-MME
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
RenShuhuai-Andy/TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
showlab/videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
PKU-YuanGroup/ChronoMagic-Bench
[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
66Lau/NEXTE_Sentry_Nav
The navigation system of the "Sentry" robot for the Next-E team in RoboMaster 2023
HFAiLab/ffrecord
FireFlyer Record file format, with a writer and reader for DL training samples.
bytedance/Shot2Story
A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions.
jizhang-cmu/autonomy_stack_go2
Full Autonomy Stack for Unitree Go2
THUDM/LVBench
LVBench: An Extreme Long Video Understanding Benchmark
OpenGVLab/EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
bigai-nlco/LSTP-Chat
A Video Chat Agent with Temporal Prior
WentseChen/Soft-QMIX
Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization
THU-BPM/LLMArena
Code for the paper "LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments", accepted at ACL 2024
huangshiyu13/glm-4v-plus_API_usage
How to use the GLM-4V-Plus API
OpenRL-Lab/VideoHub
VideoHub API