ksyint's Stars

yisol/IDM-VTON
[ECCV2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Language: Python · Stars: 3.9k · Forks: 616
SenHe/Flow-Style-VTON
Language: Python · Stars: 286 · Forks: 47
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Language: Python · Stars: 4.3k · Forks: 314
boheumd/MA-LMM
(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Language: Python · Stars: 244 · Forks: 27
fabiogra/moseca
A Streamlit web app for music source separation & karaoke
Language: Python · Stars: 268 · Forks: 31
changzy00/pytorch-attention
🦖 PyTorch implementation of popular Attention Mechanisms, Vision Transformers, MLP-Like models and CNNs.🔥🔥🔥
Language: Python · Stars: 401 · Forks: 36
mamba-org/mamba
The Fast Cross-Platform Package Manager
Language: C++ · Stars: 6.9k · Forks: 359
OpenGVLab/VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
Language: Python · Stars: 840 · Forks: 60
LeapLabTHU/DAT
Repository of Vision Transformer with Deformable Attention (CVPR2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Language: Python · Stars: 795 · Forks: 72
valeoai/PointBeV
Official implementation of PointBeV: A Sparse Approach to BeV Predictions
Language: Python · Stars: 95 · Forks: 7
Haiyang-W/UniTR
[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"
Language: Python · Stars: 284 · Forks: 16
BAAI-DCAI/SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".
Language: Python · Stars: 163 · Forks: 11
gazkune/SpatialLM
Language: Python · Stars: 2
cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
Language: Python · Stars: 107 · Forks: 8
remyxai/VQASynth
Compose multimodal datasets 🎹
Language: Python · Stars: 211 · Forks: 13
OpenDriveLab/ViDAR
[CVPR 2024 Highlight] Visual Point Cloud Forecasting
Language: Python · Stars: 279 · Forks: 18
prs-eth/Marigold
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Language: Python · Stars: 2.4k · Forks: 132
BobaZooba/xllm
🦖 X—LLM: Cutting Edge & Easy LLM Finetuning
Language: Python · Stars: 381 · Forks: 21
Lavreniuk/EVP
[ECCV 2024] EVP model for metric depth estimation from a single image and referring segmentation
Language: Jupyter Notebook · Stars: 78 · Forks: 6
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Language: Python · Stars: 2k · Forks: 158
OpenRobotLab/PointLLM
[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds
Language: Python · Stars: 649 · Forks: 32
Lewis-Stuart-11/3DGS-to-PC
3DGS-to-PC: Convert a 3D gaussian splatting scene into a dense point cloud with advanced customisation options and high-accuracy rendered point colours
Language: Python · Stars: 45 · Forks: 3
superhero-7/AltDiffusion
Source code for paper: "AltDiffusion: A multilingual Text-to-Image diffusion model"
Language: Python · Stars: 35 · Forks: 2
SusungHong/SEG-SDXL
The implementation of the paper "Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention" (NeurIPS'24)
Language: Jupyter Notebook · Stars: 103 · Forks: 3
zhu-xlab/FGMAE
Feature-guided masked autoencoder for self-supervised learning in remote sensing
Language: Jupyter Notebook · Stars: 19 · Forks: 2
postech-ami/MultiTalk
[INTERSPEECH'24] Official repository for "MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset"
Language: Python · Stars: 74 · Forks: 8
google-research-datasets/cvss
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
Stars: 183 · Forks: 14
tiankongzhang/NSA
Language: Python · Stars: 7
BakerBunker/FreeV
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Language: Python · Stars: 78 · Forks: 7
OlaWod/FreeVC
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Language: Python · Stars: 602 · Forks: 111