liuguoyou's Stars
QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
InstantID/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
HumanAIGC/OutfitAnyone
Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person
AILab-CVC/YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
tryolabs/norfair
Lightweight Python library for adding real-time multi-object tracking to any detector.
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
open-mmlab/PIA
[CVPR 2024] PIA, your Personalized Image Animator. Animate your images with a text prompt, combining with DreamBooth, to achieve stunning videos.
Ucas-HaoranWei/Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
DavidZhangdw/Visual-Tracking-Development
Visual Object Tracking
apple/ml-mobileclip
This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024).
oneTaken/Awesome-Denoise
One-paper-one-short-contribution-summary of all the latest image/burst/video denoising papers with code & citations published in top conferences and journals.
xinghaochen/TinySAM
Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"
mindspore-lab/mindone
one for all, Optimal generator with No Exception
xushilin1/RAP-SAM
zzh-tech/InterpAny-Clearer
Clearer anytime frame interpolation & Manipulated interpolation of anything
XavierCHEN34/LivePhoto
Official implementations for paper: LivePhoto: Real Image Animation with Text-guided Motion Control
aim-uofa/AutoStory
luosiallen/Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
mulab-mir/song-describer-dataset
The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings.
liuxubo717/SimPFs
Code for "Simple Pooling Front-ends for Efficient Audio Classification", ICASSP 2023
haoyi-duan/DG-SCT
NeurIPS'2023 official implementation code
xyongLu/SBCFormer
[PyTorch Impl.] SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers (WACV 2024, official code)
tomchen-ctj/OST
[CVPR'24] OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
tany0699/FMViT
Jason-Qiu/MMSum_model
[CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
J911/MISO-VFI
Official implementation of "A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow"
Sosdatasets/SoS_Dataset
SCZwangxiao/RTQ-MM2023
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
shantistewart/Emo-CLIM
Emo-CLIM: Emotion-Aligned Contrastive Learning Between Images and Music [ICASSP 2024]
saxenarohit/select_summ