entn-at
Ph.D. EE (UNSW Sydney). ML, speaker recognition, speech recognition, speech synthesis, forensic voice comparison
Portland, Oregon
Pinned Repositories
acc-tacotron2
Implementation of Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency.
alpaca-lora
Code for reproducing the Stanford Alpaca InstructLLaMA result on consumer hardware
ATST-SED
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
common-voice-kaldi
Multi-accent ASR for Mozilla Common Voice using Kaldi
DiffProsody
DurIAN-1
Implementation of "DurIAN: Duration Informed Attention Network For Multimodal Synthesis".
nonparaSeq2seqVC_code
Implementation of non-parallel sequence-to-sequence voice conversion (VC)
tf-kaldi-speaker
An integration of Kaldi and TensorFlow to train a neural-network-based speaker verification system.
Wav2Vec-TTS
whisper
entn-at's Repositories
entn-at/Wav2Vec-TTS
entn-at/ATST-SED
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
entn-at/OpenVoice
Instant voice cloning
entn-at/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
entn-at/agc
Audiogen Codec
entn-at/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
entn-at/BaySMM
entn-at/DCA-PLDA
Discriminative Condition-Aware PLDA
entn-at/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
entn-at/flutter_onnx
ONNX runtime plugin for Flutter
entn-at/flutter_sherpa_onnx
Flutter plugin wrapping the Sherpa-ONNX runtime
entn-at/FRA-RIR
entn-at/gazelle-train
Joint speech-language model - respond directly to audio!
entn-at/hilcodec
entn-at/languagecodec
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
entn-at/last
A JAX library for building lattice-based speech transducer models
entn-at/pyannote-audio_overlapped-speech-detection_cpp
C++ version of pyannote audio overlapped speech detection pipeline
entn-at/pyannote-onnx
entn-at/rustfst
Rust library for Weighted Finite State Transducers as described by Mohri and Allauzen
entn-at/stable-ts
Timestamping Spoken Words
entn-at/Toroidal-PSDA
A probabilistic scoring backend for length-normalized embeddings.
entn-at/Transformer-TTS-V2
entn-at/Triton-Puzzles
Puzzles for learning Triton
entn-at/tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
entn-at/utut
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
entn-at/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
entn-at/VoiceFlow-TTS
entn-at/voxangeles
VoxAngeles Corpus
entn-at/WhisperKit
Swift native on-device speech recognition with Whisper for Apple Silicon
entn-at/whisperkittools
Python tools for WhisperKit: Model conversion, optimization and evaluation