catherine-qian's Stars
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
NVIDIA/waveglow
A Flow-based Generative Network for Speech Synthesis
s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
LCAV/pyroomacoustics
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
maum-ai/voicefilter
Unofficial PyTorch implementation of Google AI's VoiceFilter system
NVlabs/Dancing2Music
amirbar/speech2gesture
Code for training the models from the paper "Learning Individual Styles of Conversational Gestures"
facebookresearch/meshtalk
Code for MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
Edresson/VoiceSplit
VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram
galgreshler/Catch-A-Waveform
Official PyTorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)
facebookresearch/BinauralSpeechSynthesis
N/A
facebookresearch/2.5D-Visual-Sound
2.5D visual sound
facebookresearch/EasyComDataset
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated multi-sensor egocentric world view.
BUTSpeechFIT/speakerbeam
facebookresearch/FAIR-Play
2.5D visual sound dataset
ChenDelong1999/VirtualConductor
🎶 Music-Driven Conducting Motion Generation (IEEE ICME'21 Best Demo)
thuiar/MIntRec
MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)
hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge
PeihaoChen/regnet
Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned Sound (VAS) dataset.
marmot-xy/CMBS
Cross-modal background suppression for audio-visual event localization
khdlr/SoundingEarth
Self-supervised Audiovisual Representation Learning for Remote Sensing Data
mshukor/TFood
[CVPRW22] Official Implementation of T-Food: "Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval". Accepted at CVPR22's MULA Workshop.
facebookresearch/learning-audio-visual-dereverberation
Code for the paper "Learning Audio-Visual Dereverberation"
andimarafioti/audioContextEncoder
A context encoder for audio inpainting
andimarafioti/GACELA
Generative adversarial context encoder for audio inpainting
asudahkzj/Wnet
Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks
ISmallFish/Libri-adhoc40
A dataset collected from synchronized ad-hoc microphone arrays
l3das/L3DAS23
Official repository supporting the L3DAS23 IEEE ICASSP Grand Challenge
SAGNIKMJR/move2hear-active-AV-separation
Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)