freds0

Researcher in NLP and Ph.D. student at UFG, focusing on speech synthesis and speech recognition with deep learning. Also a professor at UFMT.

UFMT
Cuiabá, Mato Grosso, Brazil

Pinned Repositories

BSpeech-MOS-Prediction
A model that predicts MOS (Mean Opinion Score) by combining embeddings from supervised and self-supervised learning models with embeddings from speaker verification models.
Language: Python
capybara_dataset
A dataset of capybara images for training object detection models.
Language: Python
CML-TTS-Dataset
CML-TTS: A Multilingual Dataset for Speech Synthesis
Language: HTML
CML-TTS-Toolkit
CML-TTS Conversion Tools
Language: Python
data_augmentation_for_asr
A set of audio augmentation techniques for inserting noise into datasets used for Automatic Speech Recognition (ASR).
Language: Python
fault_detection_power_transmission_lines
Fault detection in power transmission lines using the TensorFlow Object Detection API.
Language: Python
kabooks
KABooks is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From audiobooks, KABooks generates a dataset of segmented audio clips with aligned texts.
Language: Python
katube
KATube is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a list of YouTube playlists or channels, KATube generates a dataset of audio clips and their corresponding texts.
Language: Python
PTL-AI_Furnas_Dataset
PTL-AI Furnas Dataset: A Public Dataset for Fault Detection in Power Transmission Lines Using Aerial Images
useful_audio_scripts
A collection of useful scripts for audio processing.
Language: Python

freds0's Repositories

freds0/CML-TTS-Dataset
CML-TTS: A Multilingual Dataset for Speech Synthesis
Language: HTML
freds0/katube
KATube is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a list of YouTube playlists or channels, KATube generates a dataset of audio clips and their corresponding texts.
Language: Python
freds0/BSpeech-MOS-Prediction
A model that predicts MOS (Mean Opinion Score) by combining embeddings from supervised and self-supervised learning models with embeddings from speaker verification models.
Language: Python
freds0/useful_audio_scripts
A collection of useful scripts for audio processing.
Language: Python
freds0/capybara_dataset
A dataset of capybara images for training object detection models.
Language: Python
freds0/BRSpeech-Dataset
BRSpeech: A Portuguese Dataset for Speech Synthesis
Language: CSS
freds0/speaker_clustering
Language: Python
freds0/CleanSpecNet
Language: Python
freds0/CML-TTS-Toolkit
CML-TTS Conversion Tools
Language: Python
freds0/overlapping_voices_detector
Language: Python
freds0/tacotron2
Tacotron 2: a PyTorch implementation with faster-than-real-time inference, adapted for Brazilian Portuguese.
Language: Jupyter Notebook
freds0/CleanUNet2
Language: Python
freds0/hifi-gan2
Language: Python
freds0/free-svc
Language: Python
freds0/CleanUNet
Official PyTorch Implementation of CleanUNet (ICASSP 2022)
Language: Python
freds0/coqui-TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language: Python
freds0/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
freds0/ermis_demo
Language: Python
freds0/freds0
freds0/freds0.github.io
A simple Github Pages template for academic personal websites.
Language: CSS
freds0/FullSubNet-plus
The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
Language: Python
freds0/MeloTTS
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
Language: Python
freds0/Multilingual-PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Language: Python
freds0/RVC-Demo
Language: Python
freds0/Train_Hifigan_XTTS
An implementation for training the HiFi-GAN component of the XTTSv2 model using Coqui/TTS.
Language: Python
freds0/UTMOS
Language: Python
freds0/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
Language: Python
freds0/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Language: Jupyter Notebook
freds0/xtts-webui
Web UI for using and fine-tuning XTTS
freds0/XTTSv2-Finetuning-for-New-Languages