ksyint's Stars
yisol/IDM-VTON
[ECCV2024] IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild
SenHe/Flow-Style-VTON
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
boheumd/MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
fabiogra/moseca
A Streamlit web app for music source separation & karaoke
changzy00/pytorch-attention
🦖Pytorch implementation of popular Attention Mechanisms, Vision Transformers, MLP-Like models and CNNs.🔥🔥🔥
mamba-org/mamba
The Fast Cross-Platform Package Manager
OpenGVLab/VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
LeapLabTHU/DAT
Repository of Vision Transformer with Deformable Attention (CVPR2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
valeoai/PointBeV
Official implementation of PointBeV: A Sparse Approach to BeV Predictions
Haiyang-W/UniTR
[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"
BAAI-DCAI/SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models."
gazkune/SpatialLM
cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
remyxai/VQASynth
Compose multimodal datasets 🎹
OpenDriveLab/ViDAR
[CVPR 2024 Highlight] Visual Point Cloud Forecasting
prs-eth/Marigold
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
BobaZooba/xllm
🦖 X—LLM: Cutting Edge & Easy LLM Finetuning
Lavreniuk/EVP
[ECCV 2024] EVP model for metric depth estimation from a single image and referring segmentation
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
OpenRobotLab/PointLLM
[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds
Lewis-Stuart-11/3DGS-to-PC
3DGS-to-PC: Convert a 3D gaussian splatting scene into a dense point cloud with advanced customisation options and high-accuracy rendered point colours
superhero-7/AltDiffusion
Source code for paper: "AltDiffusion: A multilingual Text-to-Image diffusion model"
SusungHong/SEG-SDXL
The implementation of the paper "Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention" (NeurIPS'24)
zhu-xlab/FGMAE
Feature guided masked Autoencoder for self-supervised learning in remote sensing
postech-ami/MultiTalk
[INTERSPEECH'24] Official repository for "MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset"
google-research-datasets/cvss
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
tiankongzhang/NSA
BakerBunker/FreeV
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
OlaWod/FreeVC
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion