hlchen23
A second-year Ph.D. student at Tsinghua University. My research interests focus on multimodal learning and LLMs. chenhl23@mails.tsinghua.edu.cn
THU
hlchen23's Stars
Dongping-Chen/ISG
Official code repository for Interleaved Scene Graph.
TencentQQGYLab/ELLA
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
mit-han-lab/vila-u
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Millions of Context Tokens
allenai/unified-io-2
SkyworkAI/Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
dvirsamuel/PDM
Code for our paper: "Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval".
TimeMarker-LLM/TimeMarker
A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
layer6ai-labs/xpool
https://layer6ai-labs.github.io/xpool/
xuguohai/X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
hrtang22/MUSE
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval"
WHB139426/Grounded-Video-LLM
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
soraw-ai/Awesome-Text-to-Video-Generation
A curated list of Text-to-Video and Image-to-Video works
guyyariv/TempoTokens
This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
microsoft/i-Code
ChenHsing/SimDA
[CVPR 2024] SimDA: Simple Diffusion Adapter for Efficient Video Generation
dvlab-research/Video-P2P
Video-P2P: Video Editing with Cross-attention Control
huangmozhi9527/GMMFormer
[AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
hlchen23/VERIFIED
Official repository of NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding" http://arxiv.org/abs/2410.08593
asuc-octo/berkeleytime
UC Berkeley enrollment info
enkeejunior1/Diffusion-Pullback
Official implementation of "Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry" (NeurIPS 2023)
diffusion-hyperfeatures/diffusion_hyperfeatures
Official PyTorch Implementation for Diffusion Hyperfeatures, NeurIPS 2023
google-research/readout_guidance
Official PyTorch Implementation for Readout Guidance, CVPR 2024
Carmenw1203/DanceCamAnimator-Official
[ACM MM 2024] Official PyTorch implementation of "DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis"
Dai-Wenxun/MotionLCM
[ECCV 2024] Official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model"
diffusion-motion-transfer/diffusion-motion-transfer
Official PyTorch implementation of "Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer"
dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy
ali-vilab/VGen
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
uncbiag/Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
lhanchao777/LVLM-Hallucinations-Survey
The first released survey paper on hallucinations in large vision-language models (LVLMs); this repository collects relevant references to keep the survey continuously updated.