freds0
NLP researcher and Ph.D. student at UFG, focusing on speech synthesis and recognition with deep learning; also a professor at UFMT.
UFMT
Cuiabá, Mato Grosso, Brazil
Pinned Repositories
BSpeech-MOS-Prediction
A model for predicting the Mean Opinion Score (MOS) that combines embeddings from supervised and self-supervised learning models with embeddings from speaker verification models.
capybara_dataset
A dataset of capybara images for training object detection models.
CML-TTS-Dataset
CML-TTS: A Multilingual Dataset for Speech Synthesis
CML-TTS-Toolkit
CML-TTS Conversion Tools
data_augmentation_for_asr
A set of audio augmentation techniques for inserting noise into datasets used for Automatic Speech Recognition (ASR).
fault_detection_power_transmission_lines
Fault detection in power transmission lines using the TensorFlow Object Detection API.
kabooks
KABooks is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From audiobooks, KABooks generates a dataset of segmented audio with aligned text.
katube
KATube is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a list of YouTube playlists or channels, KATube generates a dataset of audio with corresponding text.
PTL-AI_Furnas_Dataset
PTL-AI Furnas Dataset: A Public Dataset for Fault Detection in Power Transmission Lines Using Aerial Images
useful_audio_scripts
A collection of useful audio scripts.
freds0's Repositories
freds0/CML-TTS-Dataset
CML-TTS: A Multilingual Dataset for Speech Synthesis
freds0/katube
KATube is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a list of YouTube playlists or channels, KATube generates a dataset of audio with corresponding text.
freds0/BSpeech-MOS-Prediction
A model for predicting the Mean Opinion Score (MOS) that combines embeddings from supervised and self-supervised learning models with embeddings from speaker verification models.
freds0/useful_audio_scripts
A collection of useful audio scripts.
freds0/capybara_dataset
A dataset of capybara images for training object detection models.
freds0/BRSpeech-Dataset
BRSpeech: A Portuguese Dataset for Speech Synthesis
freds0/speaker_clustering
freds0/CleanSpecNet
freds0/CML-TTS-Toolkit
CML-TTS Conversion Tools
freds0/overlapping_voices_detector
freds0/tacotron2
Tacotron 2: PyTorch implementation with faster-than-real-time inference, adapted for Brazilian Portuguese.
freds0/CleanUNet2
freds0/hifi-gan2
freds0/free-svc
freds0/CleanUNet
Official PyTorch Implementation of CleanUNet (ICASSP 2022)
freds0/coqui-TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
freds0/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
freds0/ermis_demo
freds0/freds0
freds0/freds0.github.io
A simple GitHub Pages template for academic personal websites.
freds0/FullSubNet-plus
The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
freds0/MeloTTS
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
freds0/Multilingual-PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
freds0/RVC-Demo
freds0/Train_Hifigan_XTTS
An implementation for training the HiFi-GAN component of the XTTSv2 model using Coqui/TTS.
freds0/UTMOS
freds0/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
freds0/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
freds0/xtts-webui
Web UI for using and fine-tuning XTTS.
freds0/XTTSv2-Finetuning-for-New-Languages