water1905's Stars
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
mozilla/TTS
Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
facebookresearch/ConvNeXt
Code release for ConvNeXt model
pengzhiliang/MAE-pytorch
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
pytorch/audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
michuanhaohao/reid-strong-baseline
Bag of Tricks and A Strong Baseline for Deep Person Re-identification
CoinCheung/pytorch-loss
label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
bytedance/music_source_separation
haoheliu/voicefixer
General Speech Restoration
KinWaiCheuk/nnAudio
Audio processing using PyTorch 1D convolutional networks
ildoonet/pytorch-gradual-warmup-lr
Gradually-Warmup Learning Rate Scheduler for PyTorch
justinsalamon/audio_to_midi_melodia
Extract the melody from an audio file and export to MIDI
wenet-e2e/WenetSpeech
A 10000+ hours dataset for Chinese speech recognition
andreasveit/densenet-pytorch
A PyTorch Implementation for Densely Connected Convolutional Networks (DenseNets)
macosforge/alac
The Apple Lossless Audio Codec (ALAC) is a lossless audio codec developed by Apple and deployed on all of its platforms and devices.
meinardmueller/libfmp
libfmp - Python package for teaching and learning Fundamentals of Music Processing (FMP)
mimbres/neural-audio-fp
facebookresearch/BinauralSpeechSynthesis
N/A
DTennant/reid_baseline_with_syncbn
Reimplementation of Bag of Tricks and A Strong Baseline for Deep Person Re-identification
wq2012/VoiceIdentityBook
《声纹技术:从核心算法到工程实践》 (Voice Identity Techniques: From Core Algorithms to Engineering Practice)
SoundScapeRenderer/ssr
Main source code repository for the SoundScape Renderer
Apm5/ImageNet_ResNet_Tensorflow2.0
Train ResNet on ImageNet in TensorFlow 2.0; complete training code for ResNet on ImageNet
seongmin-kye/meta-SR
PyTorch implementation of Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs (Interspeech, 2020)
polarch/Array-Response-Simulator
A set of routines that simulate array responses for sensors with arbitrary geometry and directional characteristics.
zafarrafii/CQHC-Python
Constant-Q harmonic coefficients (CQHCs), a timbre feature designed for music signals.
JensAhrens/soundfieldsynthesis
Matlab code for the book "Analytic Methods of Sound Field Synthesis"
AME430/Towards-Training-Explainable-Singing-Quality-Assessment-Network-with-Augmented-Data
Code for the paper "Towards Training Explainable Singing Quality Assessment Network with Augmented Data"
seongmin-kye/CAP
Cross attentive pooling for speaker verification (IEEE SLT, 2021)
shanwangshan/Low-latency_deep_clustering_for_speech_separation