catherine-qian's Stars
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
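CLIP's inference step — picking the caption whose embedding best matches an image embedding — can be sketched in plain Python. This is a toy illustration, not the repo's API: the vectors below are made-up numbers standing in for real CLIP features, and the helper names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (CLIP compares unit-norm embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_relevant_text(image_emb, text_embs, captions):
    """Return the caption whose embedding has the highest cosine
    similarity with the image embedding."""
    img = l2_normalize(image_emb)
    scores = []
    for emb in text_embs:
        txt = l2_normalize(emb)
        scores.append(sum(i * t for i, t in zip(img, txt)))  # cosine similarity
    best = max(range(len(scores)), key=scores.__getitem__)
    return captions[best]

# Toy example: the image embedding points roughly along the first text embedding.
image = [0.9, 0.1, 0.0]
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(most_relevant_text(image, texts, ["a photo of a dog", "a photo of a cat"]))
# prints "a photo of a dog"
```

In the real library the embeddings come from `model.encode_image` and `model.encode_text`; only the similarity-and-argmax step is shown here.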
orpatashnik/StyleCLIP
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
facebookresearch/pytorchvideo
A deep learning library for video understanding research.
HobbitLong/SupContrast
PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)
sthalles/SimCLR
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
facebookresearch/TimeSformer
The official PyTorch implementation of the paper "Is Space-Time Attention All You Need for Video Understanding?"
dandelin/ViLT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
facebookresearch/VMZ
VMZ: Model Zoo for Video Modeling
zczcwh/PoseFormer
Official implementation of the paper "3D Human Pose Estimation with Spatial and Temporal Transformers"
DavidDiazGuerra/gpuRIR
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
hche11/VGGSound
VGGSound: A Large-scale Audio-Visual Dataset
naplab/Conv-TasNet
Conv-TasNet: a fully-convolutional time-domain network for single-channel speech separation
zhengshou/scnn
Segment-CNN: A Framework for Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
lorenlugosch/end-to-end-SLU
PyTorch code for end-to-end spoken language understanding (SLU) with ASR-based transfer learning
v-iashin/BMT
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
SpikeKing/triplet-loss-mnist
Triplet loss function implementation
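The triplet loss the repo above implements pulls an anchor toward a positive example and pushes it away from a negative one: L = max(d(anchor, positive) - d(anchor, negative) + margin, 0). A minimal plain-Python sketch with toy vectors (illustrative, not taken from the repo):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Penalize anchors that sit closer to the negative than to the positive
    (by at least the margin); zero loss once the triplet is well separated."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

# Well-separated triplet: positive is near the anchor, negative is far away.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [3.0, 4.0]
print(triplet_loss(anchor, positive, negative))
# prints 0.0  (0.1 - 5.0 + 1.0 is negative, clamped to zero)
```

In PyTorch the same loss is available as `torch.nn.TripletMarginLoss`; the point here is just the hinge structure of the objective.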
smeetrs/deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
adobe-research/DeepAFx
Third-party audio effects plugins as differentiable layers within deep neural networks.
facebookresearch/AVID-CMA
Audio Visual Instance Discrimination with Cross-Modal Agreement
facebookresearch/selavi
Implementation of "Labelling unlabelled videos from scratch with multi-modal self-supervision", which learns clusters from multi-modal data in a self-supervised way
Sindhu-Hegde/pseudo-visual-speech-denoising
Official code for the paper "Visual Speech Enhancement Without A Real Visual Stream" published at WACV 2021
HumamAlwassel/XDC
Self-Supervised Learning by Cross-Modal Audio-Video Clustering (NeurIPS 2020)
YapengTian/AVVP-ECCV20
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing (ECCV 2020, Spotlight)
hche11/Localizing-Visual-Sounds-the-Hard-Way
Localizing Visual Sounds the Hard Way
SheldonTsui/PseudoBinaural_CVPR2021
Codebase for the paper "Visually Informed Binaural Audio Generation without Binaural Audios" (CVPR 2021)
TaoRuijie/Speaker-Recognition-Demo
A ResNet-based speaker recognition and verification demo
Yu-Wu/Modaily-Aware-Audio-Visual-Video-Parsing
Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
vdean/audio-curiosity
YapengTian/CCOL-CVPR21
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
zhang-yw/avn