catherine-qian

catherine-qian's Stars

facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Language:Python8.3k 153 5421.1k
NVIDIA/waveglow
A Flow-based Generative Network for Speech Synthesis
Language:Python2.3k 77 257530
s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Language:Python2.3k 46 398485
LCAV/pyroomacoustics
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
Language:Python1.4k 42 230432
YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Language:Jupyter Notebook1.1k 17 136214
maum-ai/voicefilter
Unofficial PyTorch implementation of Google AI's VoiceFilter system
Language:Python1.1k 35 26226
NVlabs/Dancing2Music
Language:Python532 43 2585
amirbar/speech2gesture
Code for training the models from the paper "Learning Individual Styles of Conversational Gestures"
Language:Python373 27 2544
facebookresearch/meshtalk
Code for MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
Language:Python368 12 5156
Edresson/VoiceSplit
VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram
Language:Python222 8 1132
galgreshler/Catch-A-Waveform
Official PyTorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)
Language:Python187 4 735
facebookresearch/BinauralSpeechSynthesis
N/A
Language:Python165 20 319
facebookresearch/2.5D-Visual-Sound
2.5D visual sound
Language:Python110 9 621
facebookresearch/EasyComDataset
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated multi-sensor egocentric world view.
106 10 77
BUTSpeechFIT/speakerbeam
Language:Jupyter Notebook100 6 318
facebookresearch/FAIR-Play
2.5D visual sound dataset
91 9 413
ChenDelong1999/VirtualConductor
🎶 Music-Driven Conducting Motion Generation (IEEE ICME'21 Best Demo)
Language:Python90 4 612
thuiar/MIntRec
MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)
Language:Python77 2 1412
hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge
Language:Python54 10 412
PeihaoChen/regnet
Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned Sound (VAS) dataset.
Language:Python50 1 1012
marmot-xy/CMBS
Cross-modal background suppression for audio-visual event localization
Language:Python34 1 56
khdlr/SoundingEarth
Self-supervised Audiovisual Representation Learning for Remote Sensing Data
Language:Python30 3 24
mshukor/TFood
[CVPRW22] Official Implementation of T-Food: "Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval". Accepted at CVPR22's MULA Workshop.
Language:HTML29 3 96
facebookresearch/learning-audio-visual-dereverberation
Code for the paper Learning Audio-Visual Dereverberation
Language:Python26 8 35
andimarafioti/audioContextEncoder
A context encoder for audio inpainting
Language:Jupyter Notebook25 8 172
andimarafioti/GACELA
Generative adversarial context encoder for audio inpainting
Language:Jupyter Notebook24 5 33
asudahkzj/Wnet
Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks
Language:Python21 1 10
ISmallFish/Libri-adhoc40
A dataset collected from synchronized ad-hoc microphone arrays
16 2 22
l3das/L3DAS23
Official repository supporting the L3DAS23 IEEE ICASSP Grand Challenge
Language:Python16 1 24
SAGNIKMJR/move2hear-active-AV-separation
Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)
Language:Python13 3 60