Pinned Repositories
3D-convolutional-speaker-recognition
🔈 Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
btp-tts
Android app for converting Telugu text to speech (TTS).
correction-text
A program that opens a text file and corrects some punctuation mistakes.
Hindi-Spell-Check-Using-Language-Modelling
This project provides spell-check help for Urdu-to-Hindi transliteration. The spelling errors in our case mostly consist of errors in matras.
kaldi-active-grammar
Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
kaldi-nlp
Kaldi Speech Recognition Toolkit for NLP tasks
pos_blstm
Chinese POS tagger
transformer-cnn-emotion-recognition
Speech Emotion Classification with novel Parallel CNN-Transformer model built with PyTorch, plus thorough explanations of CNNs, Transformers, and everything in between
rohithkodali's Repositories
rohithkodali/transformer-cnn-emotion-recognition
Speech Emotion Classification with novel Parallel CNN-Transformer model built with PyTorch, plus thorough explanations of CNNs, Transformers, and everything in between
rohithkodali/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
rohithkodali/ConsistencyVC-voive-conversion
Uses a jointly trained speaker encoder with a consistency loss to achieve cross-lingual and expressive voice conversion
rohithkodali/conv-emotion
This repo contains implementations of different architectures for emotion recognition in conversations
rohithkodali/ddsp
DDSP: Differentiable Digital Signal Processing
rohithkodali/Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
rohithkodali/FastSAM
Fast Segment Anything
rohithkodali/langchain
⚡ Building applications with LLMs through composability ⚡
rohithkodali/langdetect
Language detection algorithm that can be extended to support any number of languages
rohithkodali/LookOnceToHear
A novel human-interaction method for real-time speech extraction on headphones.
rohithkodali/melgan-neurips
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
rohithkodali/MLnotebook
Understanding Deep Learning - Simon J.D. Prince
rohithkodali/Nepali-Ai-Anchor
Nepali AI Anchor Using LSTM & Pix2Pix [Itonics Hackathon 2019]
rohithkodali/PhonoQ
PhonoQ is a deep learning model used to compute phonetic-based features related to duration, rate, rhythm*, and goodness of pronunciation* of 18 phonological classes
rohithkodali/pifuhd
High-Resolution 3D Human Digitization from A Single Image.
rohithkodali/proneval
Koel Labs innovates real-time pronunciation feedback for language learners! This repo contains the ML training, evaluation, and data processing code
rohithkodali/Real-time-wake-word-detection
Spoken wake-word detection for conversational avatar
rohithkodali/recurrent-interface-network-pytorch
Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in PyTorch
rohithkodali/Resemblyzer
A Python package to analyze and compare voices with deep learning
rohithkodali/self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
rohithkodali/StyleTTS
Official Implementation of StyleTTS
rohithkodali/supervoice-dataset
60k hours of phoneme-aligned audio from audio books
rohithkodali/TOI
Toi news
rohithkodali/ULCA-asr-dataset-corpus
rohithkodali/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E, WIP
rohithkodali/voice-activity-detection
Voice Activity Detection (VAD) using deep learning.
rohithkodali/VoskIdentification
A test example of using a model for voice identification with the "Vosk" speech recognition library: https://alphacephei.com/vosk/
rohithkodali/Whisper-Hindi-ASR-model-IIT-Bombay-Intership
The Whisper Hindi ASR (Automatic Speech Recognition) model uses the KathBath dataset, a comprehensive collection of Hindi speech samples. Trained on this dataset, the Whisper model transcribes spoken Hindi into text.
rohithkodali/whisper-to-normal-speech-conversion
Whisper-to-Normal Speech Conversion Using Generative Adversarial Networks
rohithkodali/you-only-hear-once