byby666's Stars
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Kedreamix/Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬
lllyasviel/Omost
Your image is almost there!
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
mhamilton723/FeatUp
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024
wysoczanska/clip_dinoiser
Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.
prs-eth/Marigold
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
vvictoryuki/AnimateZero
Official PyTorch implementation for the paper "AnimateZero: Video Diffusion Models are Zero-Shot Image Animators"
ChenyangSi/FreeU
FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)
harlanhong/ICCV2023-MCNET
The official code of our ICCV 2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation
paulpanwang/POPE
Welcome to the project repository for POPE (Promptable Pose Estimation), a state-of-the-art technique for 6-DoF pose estimation of any object in any scene using a single reference.
THUDM/CogVLM
A state-of-the-art-level open visual language model | multimodal pretrained model
bcmi/DCI-VTON-Virtual-Try-On
[ACM Multimedia 2023] Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow.
cure-lab/PnPInversion
[ICLR2024] Official repo for paper "PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code"
sled-group/CycleNet
Official Code for NeurIPS 2023 Paper: CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
caiyuanhao1998/Retinexformer
"Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement" (ICCV 2023) & (NTIRE 2024 Challenge)
tjiiv-cprg/EPro-PnP
[CVPR 2022 Oral, Best Student Paper] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
shangbuhuan13/SO-Pose
This repository contains the code for the ICCV 2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation
Zheng-Chong/FashionMatrix
Fashion Matrix is dedicated to bridging various visual and language models and continuously refining its capabilities as a comprehensive fashion AI assistant. This project will continue to add new features and optimizations.
cuiziteng/Illumination-Adaptive-Transformer
🌕 [BMVC 2022] You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction. SOTA for low-light enhancement; at 0.004 seconds per image, try it for pre-processing.
JianghaiSCU/R2RNet
Official code of "R2RNet: Low-light Image Enhancement Via Real-low to Real-normal Network".
qiuyu96/CoDeF
[CVPR 2024 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
SizheAn/PanoHead
Code Repository for CVPR 2023 Paper "PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360 degree"
williamyang1991/StyleGANEX
[ICCV 2023] StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces
jianzongwu/Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
Meta-Portrait/MetaPortrait
[CVPR 2023] MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
facebookresearch/ijepa
Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding