sysuyy's Stars
eloialonso/diamond
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
etched-ai/open-oasis
Inference script for Oasis 500M
dqxiu/ICL_PaperList
Paper List for In-context Learning 🌷
ali-vilab/VGen
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
PKU-YuanGroup/ChronoMagic-Bench
[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
mseitzer/pytorch-fid
Compute FID scores with PyTorch.
genmoai/mochi
The best OSS video generation models
microsoft/LongRoPE
LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens.
rhymes-ai/Allegro
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
inFaaa/Autoregressive-Models-in-Vision-Survey
The paper collections for the autoregressive visual models.
Leminhbinh0209/FinetuneVAE-SD
Fine-tune VAE of Stable Diffusion model
google/break-a-scene
Official implementation for "Break-A-Scene: Extracting Multiple Concepts from a Single Image" [SIGGRAPH Asia 2023]
minerllabs/basalt-2022-behavioural-cloning-baseline
Simple behavioural cloning baseline solution for BASALT 2022
nahyeonkaty/textboost
TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder
360CVGroup/FancyVideo
This is the official reproduction of FancyVideo.
MineDojo/MineDojo
Building Open-Ended Embodied Agents with Internet-Scale Knowledge
facebookresearch/Ego4d
Ego4d dataset repository. Download the dataset, visualize it, extract features, and see example usage of the dataset
BolinLai/LEGO
[ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning".
doubleZ0108/Digital-Media-Technology-PKU
Fundamentals of Digital Media Technology (04713901) | Peking University ECE Course Materials
OpenGVLab/EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
TencentARC/SEED-Voken
SEED-Voken: A Series of Powerful Visual Tokenizers
baaivision/Emu3
Next-Token Prediction is All You Need
huangb23/VTimeLLM
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
NVIDIA/aistore
AIStore: scalable storage for AI applications
jquesnelle/yarn
YaRN: Efficient Context Window Extension of Large Language Models
Stability-AI/StableCascade
Official Code for Stable Cascade
pytorch/torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
HyeonHo99/Video-Motion-Customization
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (CVPR 2024)
SkalskiP/top-cvpr-2024-papers
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]