torchaudio
There are 76 repositories under the torchaudio topic.
2noise/ChatTTS
A generative speech model for daily dialogue.
DrewThomasson/VoxNovel
VoxNovel: generate audiobooks giving each character a different voice actor.
ujiaqi/MusicRecommend
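:star: Undergraduate thesis project: design and development of a content-based music recommendation system. The model training code is built with the PyTorch framework, and the front end and back end are built with Django.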
KentoNishi/torch-pitch-shift
Pitch-shift audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.
nipponjo/tts-arabic-pytorch
TTS models for Arabic (Tacotron2, FastPitch)
xucailiang/cascade
Cascade is a production-ready, high-performance, and low-latency audio stream processing library designed for Voice Activity Detection (VAD). Built upon the excellent Silero VAD model, Cascade significantly reduces VAD processing latency while maintaining high accuracy through its 1:1:1 binding architecture and asynchronous streaming technology.
evshiron/rocm_lab
DEPRECATED!
KentoNishi/torch-time-stretch
Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.
SekiroRong/KAN-AutoEncoder
KAE: KAN-based AutoEncoder (AE, VAE, VQ-VAE, RVQ, etc.)
torchsmoke/Python3-Wheels
Wheels for Python 3
PINTO0309/pytorch4raspberrypi
Cross-compilation of PyTorch armv7l (32bit) for RaspberryPi OS
overcrash66/OpenTranslator
Open Translator: a speech-to-speech and speech-to-text translator with voice cloning and other cool features
BakingBrains/Sound_Classification
Sound classification on Urban Sound Dataset
aminul-huq/Speech-Command-Classification
Speech command classification on the Speech Commands v0.02 dataset using PyTorch and torchaudio. Three models are trained: one on raw signal waveforms, one on MFCC features, and one on MelSpectrogram features.
eonu/torch-fsdd
A utility for wrapping the Free Spoken Digit Dataset into PyTorch-ready data set splits.
LukeSutor/programmatic-pitch
High fidelity music synthesis using diffusion and UnivNet.
CrispenGari/animal-sound-classification
This is a simple artificial neural network model that uses deep learning and torchaudio to classify cat and dog sounds.
nipponjo/tts-german-pytorch
TTS (FastPitch) for German (Thorsten voice / emotional)
CrispenGari/emotionAI
(😞 😨 😄 😮 😍 😠 😐 🤮) This is a simple DL API that classifies human emotions from audio and text.
glefundes/misophonia-bot
🤖 Telegram bot powered by deep learning. Automatically assesses the safety of audio files and voice messages for people suffering from misophonia.
igorshmukler/kokoro-ruslan
Kokoro Language Model Training Script for Russian (Ruslan Corpus)
pradeepbatchu/speechtotext
Speech to Text with Wav2Vec2 using torchaudio
BaoNguyen6742/uv-install-torch
Tutorial on installing torch/PyTorch with CUDA using uv
LukeSutor/guitar_source_separation
The unmix model, trained on a custom-built dataset to separate guitar playing from audio samples.
LumenPallidium/audio_generation
Experiments in neural networks for audio generation.
mehdihosseinimoghadam/Signal-Processing
Signal Processing with Python and Librosa
vectominist/Switchboard-WSJ-Utils
Utilities for preprocessing the Switchboard and WSJ corpora in Python3
JoelDeonDsouza/Auto_CNN
This repo implements a deep learning pipeline for classifying environmental sounds from the ESC-50 dataset.
avrtt/MoE-speech-recognition
Mixture of experts architecture for speech-to-text and language identification, built in PyTorch
CrispenGari/torch-audio
🎶🎼 This repository contains notebooks used to train audio classification models in PyTorch with torchaudio.
Efenstor/PyTorch-ROCm-gfx1010
Instructions on how to build PyTorch on Debian 12 with support for the AMD gfx1010 architecture
manhph2211/DSP101
Building a speaker identification & verification pipeline for Vietnamese voices :sleepy:
NevroHelios/CrossEmotion
MELD-IFEED-Benchmark
thekartikeyamishra/VoiceCloner
The Voice Cloner is a Python-based project that leverages Tacotron 2 and WaveGlow models for text-to-speech (TTS) synthesis and basic voice cloning. This project supports 22 official Indian languages, including Sanskrit, making it versatile for multilingual text input.
yangarbiter/torchaudio-benchmark
TorchAudio: Building Blocks for Audio and Speech Processing