entn-at

Ph.D. EE (UNSW Sydney). ML, speaker recognition, speech recognition, speech synthesis, forensic voice comparison

Portland, Oregon

Pinned Repositories

acc-tacotron2
Implementation of Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency.
Language:Python
alpaca-lora
Code for reproducing the Stanford Alpaca InstructLLaMA result on consumer hardware
Language:Jupyter Notebook
ATST-SED
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
Language:Jupyter Notebook
common-voice-kaldi
Multi-accent ASR for Mozilla Common Voice using Kaldi
Language:Shell
DiffProsody
Language:Python
DurIAN-1
Implementation of "DurIAN: Duration Informed Attention Network For Multimodal Synthesis".
Language:Python
nonparaSeq2seqVC_code
Implementation code of non-parallel sequence-to-sequence VC
Language:Python
tf-kaldi-speaker
An integration of Kaldi and Tensorflow to train a neural network based speaker verification system.
Language:Python
Wav2Vec-TTS
Language:Python
whisper
Language:Python

entn-at's Repositories

entn-at/Wav2Vec-TTS
Language:Python
entn-at/ATST-SED
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
Language:Jupyter Notebook
entn-at/OpenVoice
Instant voice cloning
Language:Python
entn-at/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
entn-at/agc
Audiogen Codec
Language:Python
entn-at/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language:Python
entn-at/BaySMM
Language:Python
entn-at/DCA-PLDA
Discriminative Condition-Aware PLDA
Language:Python
entn-at/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Language:Python
entn-at/flutter_onnx
ONNX runtime plugin for Flutter
Language:C++
entn-at/flutter_sherpa_onnx
Flutter plugin wrapping the Sherpa-ONNX runtime
Language:Dart
entn-at/FRA-RIR
Language:Python
entn-at/gazelle-train
Joint speech-language model - respond directly to audio!
entn-at/hilcodec
entn-at/languagecodec
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
entn-at/last
A JAX library for building lattice-based speech transducer models
Language:Python
entn-at/pyannote-audio_overlapped-speech-detection_cpp
C++ version of pyannote audio overlapped speech detection pipeline
Language:Python
entn-at/pyannote-onnx
Language:C++
entn-at/rustfst
Rust library for Weighted Finite State Transducers as described by Mohri and Allauzen
Language:Rust
entn-at/stable-ts
Timestamping Spoken Words
Language:Python
entn-at/Toroidal-PSDA
A probabilistic scoring backend for length-normalized embeddings.
Language:Python
entn-at/Transformer-TTS-V2
entn-at/Triton-Puzzles
Puzzles for learning Triton
entn-at/tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
Language:Python
entn-at/utut
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
entn-at/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
entn-at/VoiceFlow-TTS
Language:Python
entn-at/voxangeles
VoxAngeles Corpus
entn-at/WhisperKit
Swift native on-device speech recognition with Whisper for Apple Silicon
entn-at/whisperkittools
Python tools for WhisperKit: Model conversion, optimization and evaluation