Pinned Repositories
EMO
Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
generative-models
Generative Models by Stability AI
HarmonyView
Official PyTorch implementation of "HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D"
i2vgen-xl
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
I2VGen-XL-colab
ICONIP2019
Code and dataset for ICONIP2019
InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
LangSplat
Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting"
Kai0226's Repositories
Kai0226/EMO
Kai0226/Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Kai0226/generative-models
Generative Models by Stability AI
Kai0226/HarmonyView
Official PyTorch implementation of "HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D"
Kai0226/i2vgen-xl
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
Kai0226/I2VGen-XL-colab
Kai0226/ICONIP2019
Code and dataset for ICONIP2019
Kai0226/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Kai0226/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
Kai0226/LangSplat
Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting"
Kai0226/LLM
Kai0226/MetaTransformer
Meta-Transformer for Unified Multimodal Learning
Kai0226/PHD
Multi-modality generative foundation models, Parameter efficient fine-tuning, Large language models, Contrastive Language–Image Pre-training, Text-video pre-training
Kai0226/stable-diffusion-webui
Stable Diffusion web UI
Kai0226/UoIDLHealthcare
Deep Learning for Healthcare Specialization
Kai0226/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Kai0226/lazypredict
Lazy Predict helps build many basic models with little code and helps show which models work better without any parameter tuning
Kai0226/LLaVA-Med
Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.
Kai0226/MagicTime
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Kai0226/MiDaS
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
Kai0226/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Kai0226/normal-depth-diffusion
Kai0226/richdreamer
Kai0226/sd-webui-text2video
Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies
Kai0226/TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Kai0226/Text-To-Video-Finetuning
Finetune ModelScope's Text To Video model using Diffusers 🧨
Kai0226/video-generation-survey
A reading list of video generation
Kai0226/videocomposer
Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
Kai0226/VideoDirectorGPT
Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Kai0226/WHAM