catherine-qian's Stars
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
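CLIP's inference step — picking the caption whose embedding best matches an image embedding — can be sketched in plain Python. This is a toy illustration, not the repo's API: the vectors below are made-up numbers standing in for real CLIP features, and the helper names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (CLIP compares unit-norm embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_relevant_text(image_emb, text_embs, captions):
    """Return the caption whose embedding has the highest cosine
    similarity with the image embedding."""
    img = l2_normalize(image_emb)
    scores = []
    for emb in text_embs:
        txt = l2_normalize(emb)
        scores.append(sum(i * t for i, t in zip(img, txt)))  # cosine similarity
    best = max(range(len(scores)), key=scores.__getitem__)
    return captions[best]

# Toy example: the image embedding points roughly along the first text embedding.
image = [0.9, 0.1, 0.0]
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(most_relevant_text(image, texts, ["a photo of a dog", "a photo of a cat"]))
# prints "a photo of a dog"
```

In the real library the embeddings come from `model.encode_image` and `model.encode_text`; only the similarity-and-argmax step is shown here.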
orpatashnik/StyleCLIP
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
facebookresearch/pytorchvideo
A deep learning library for video understanding research.
HobbitLong/SupContrast
PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)
sthalles/SimCLR
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
facebookresearch/TimeSformer
The official PyTorch implementation of the paper "Is Space-Time Attention All You Need for Video Understanding?"
dandelin/ViLT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
facebookresearch/VMZ
VMZ: Model Zoo for Video Modeling
zczcwh/PoseFormer
Official implementation of the paper "3D Human Pose Estimation with Spatial and Temporal Transformers"
DavidDiazGuerra/gpuRIR
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
hche11/VGGSound
VGGSound: A Large-scale Audio-Visual Dataset
naplab/Conv-TasNet
Conv-TasNet: a fully-convolutional time-domain network for single-channel speech separation
zhengshou/scnn
Segment-CNN: A Framework for Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
lorenlugosch/end-to-end-SLU
PyTorch code for end-to-end spoken language understanding (SLU) with ASR-based transfer learning
v-iashin/BMT
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
SpikeKing/triplet-loss-mnist
Triplet loss function implementation
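The triplet loss the repo above implements pulls an anchor toward a positive example and pushes it away from a negative one: L = max(d(anchor, positive) - d(anchor, negative) + margin, 0). A minimal plain-Python sketch with toy vectors (illustrative, not taken from the repo):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Penalize anchors that sit closer to the negative than to the positive
    (by at least the margin); zero loss once the triplet is well separated."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

# Well-separated triplet: positive is near the anchor, negative is far away.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [3.0, 4.0]
print(triplet_loss(anchor, positive, negative))
# prints 0.0  (0.1 - 5.0 + 1.0 is negative, clamped to zero)
```

In PyTorch the same loss is available as `torch.nn.TripletMarginLoss`; the point here is just the hinge structure of the objective.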
smeetrs/deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
adobe-research/DeepAFx
Third-party audio effects plugins as differentiable layers within deep neural networks.
facebookresearch/AVID-CMA
Audio Visual Instance Discrimination with Cross-Modal Agreement
facebookresearch/selavi
Implementation of "Labelling unlabelled videos from scratch with multi-modal self-supervision", which learns clusters from multi-modal data in a self-supervised way
Sindhu-Hegde/pseudo-visual-speech-denoising
Official code for the paper "Visual Speech Enhancement Without A Real Visual Stream" published at WACV 2021
HumamAlwassel/XDC
Self-Supervised Learning by Cross-Modal Audio-Video Clustering (NeurIPS 2020)
YapengTian/AVVP-ECCV20
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing (ECCV 2020, Spotlight)
hche11/Localizing-Visual-Sounds-the-Hard-Way
Localizing Visual Sounds the Hard Way
SheldonTsui/PseudoBinaural_CVPR2021
Codebase for the paper "Visually Informed Binaural Audio Generation without Binaural Audios" (CVPR 2021)
TaoRuijie/Speaker-Recognition-Demo
A ResNet-based speaker recognition and verification demo
Yu-Wu/Modaily-Aware-Audio-Visual-Video-Parsing
Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
vdean/audio-curiosity
YapengTian/CCOL-CVPR21
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
zhang-yw/avn