ajd12342
PhD CS, UT Austin. Prev. B.Tech CS, IIT Bombay. Working on ASR and NLP. I love writing clean, documented code.
UT Austin
Austin, Texas
ajd12342's Stars
xai-org/grok-1
Grok open release
koalaman/shellcheck
ShellCheck, a static analysis tool for shell scripts
2noise/ChatTTS
A generative speech model for daily dialogue.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
dottxt-ai/outlines
Structured Text Generation
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
huggingface/parler-tts
Inference and training library for high-quality TTS models.
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
THUNLP-MT/MT-Reading-List
A machine translation reading list maintained by Tsinghua Natural Language Processing Group
rsennrich/subword-nmt
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
mut-ex/gligen-gui
An intuitive GUI for GLIGEN that uses ComfyUI in the backend
AkariAsai/self-rag
This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
haoheliu/voicefixer
General Speech Restoration
SpeechColab/GigaSpeech
Large, modern dataset for speech recognition
liusongxiang/Large-Audio-Models
Keep track of big models in the audio domain, including speech, singing, music, etc.
JarodMica/audiobook_maker
huggingface/dataspeech
jishengpeng/TextrolSpeech
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
gmltmd789/UnitSpeech
An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"
IDRnD/VoxTube
The VoxTube dataset official repository
vectominist/spin
Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"
raj-sutariya/indic-num2words
Python library for converting numbers to words for all Indian Languages.
skit-ai/slu-prosody
Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 2023.
amazon-science/synthesizrr
Synthesizing realistic and diverse text-datasets from augmented LLMs
kaistmm/voxsim_trainer
[INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset
GussailRaat/Devanagari-Hindi-Language-in-pdfLatex
Easy Steps to write Devanagari (Hindi) Language in pdfLatex