DanielLin94144
Ph.D. @ NTU Speech Processing and Machine Learning Laboratory. Deep Learning for Speech Processing.
National Taiwan University, Taiwan
DanielLin94144's Stars
2noise/ChatTTS
A generative speech model for daily dialogue.
bentoml/OpenLLM
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
mistralai/mistral-inference
Official inference library for Mistral models
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Tiiiger/bert_score
BERT score for text generation
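The core of BERTScore is a greedy cosine-similarity matching between contextual token embeddings of the reference and candidate texts. A minimal NumPy sketch of that matching step, assuming the embeddings are already computed (the real package also handles tokenization, model inference, and optional IDF weighting):

```python
import numpy as np

def bertscore(ref_emb: np.ndarray, cand_emb: np.ndarray):
    """Greedy-matching BERTScore over precomputed token embeddings.

    ref_emb:  (n_ref, d) contextual embeddings of the reference tokens
    cand_emb: (n_cand, d) contextual embeddings of the candidate tokens
    """
    # Normalize rows so dot products become cosine similarities.
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sim = ref @ cand.T                   # (n_ref, n_cand) cosine matrix
    recall = sim.max(axis=1).mean()      # each reference token -> best candidate match
    precision = sim.max(axis=0).mean()   # each candidate token -> best reference match
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With identical embeddings on both sides, precision, recall, and F1 all come out as 1.0.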
Text-to-Audio/AudioLCM
PyTorch implementation of AudioLCM (ACM-MM'24): efficient, high-quality text-to-audio generation with a latent consistency model.
QwenLM/Qwen2-Audio
The official repo of the Qwen2-Audio chat and pretrained large audio-language models proposed by Alibaba Cloud.
tim-learn/awesome-test-time-adaptation
Collection of awesome test-time (domain/batch/instance) adaptation methods
microsoft/MS-SNSD
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
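The key operation behind a dataset like MS-SNSD is mixing clean speech with noise at a prescribed SNR, which reduces to scaling the noise by a gain derived from the two RMS levels. A hedged sketch (function name and array conventions are my own, not MS-SNSD's code):

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested speech-to-noise ratio.

    Assumes `clean` and `noise` are equal-length mono float arrays.
    """
    rms_clean = np.sqrt(np.mean(clean ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))
    # Gain g chosen so that 20*log10(rms_clean / (g * rms_noise)) == snr_db.
    gain = rms_clean / (rms_noise * 10 ** (snr_db / 20))
    return clean + gain * noise
```

Sweeping `snr_db` over a grid of levels is how such corpora scale to arbitrary sizes from a fixed pool of speakers and noise types.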
jxmorris12/language_tool_python
A free Python grammar checker 📝✅
neelsjain/NEFTune
Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning
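The NEFTune trick itself is small: during finetuning, add uniform noise to the input token embeddings, scaled by alpha / sqrt(L * d) for sequence length L and embedding dimension d. A NumPy sketch of that perturbation, following the paper's scaling (the function name and defaults here are illustrative, not the repo's API):

```python
import numpy as np

def neftune_noise(embeddings: np.ndarray, alpha: float = 5.0, rng=None) -> np.ndarray:
    """Add NEFTune-style uniform noise to a sequence of token embeddings.

    embeddings: (L, d) array; noise is drawn from Uniform(-1, 1) and scaled
    by alpha / sqrt(L * d), as described in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    L, d = embeddings.shape
    scale = alpha / np.sqrt(L * d)
    return embeddings + rng.uniform(-1.0, 1.0, size=(L, d)) * scale
```

The perturbation is applied only at training time; inference uses the clean embeddings.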
OpenT2S/LlamaVoice
LlamaVoice is a Llama-based large voice generation model, providing both inference and training capabilities.
NVIDIA/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
dynamic-superb/dynamic-superb
The official repository of Dynamic-SUPERB.
pln-fing-udelar/fast-krippendorff
Fast computation of Krippendorff's alpha agreement measure in Python.
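For reference, Krippendorff's alpha for nominal data is 1 minus the ratio of observed to expected disagreement, computed from a coincidence matrix over all pairable ratings. A self-contained sketch of that computation (the library itself uses a vectorized NumPy implementation and supports other metrics; this pure-Python version is only nominal-level):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of items, each a list of the ratings that item received
    (missing ratings simply omitted). Items with fewer than two ratings are
    ignored, as in the standard formulation.
    """
    o = Counter()                        # coincidence matrix o[(c, k)]
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for c, k in permutations(ratings, 2):   # all ordered rating pairs
            o[(c, k)] += 1.0 / (m - 1)
    n = sum(o.values())                  # total number of pairable values
    n_c = Counter()                      # marginal totals per category
    for (c, _), v in o.items():
        n_c[c] += v
    d_o = sum(v for (c, k), v in o.items() if c != k)   # observed disagreement
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    return 1.0 - d_o / d_e
```

Perfect agreement yields alpha = 1.0; chance-level agreement yields alpha near 0.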
emo-box/EmoBox
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
facebookresearch/ears_dataset
Expressive Anechoic Recordings of Speech (EARS)
line/LibriTTS-P
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
voidism/Lookback-Lens
Official implementation for the paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
roudimit/whisper-flamingo
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
mtkresearch/generative-fusion-decoding
Generative Fusion Decoding (GFD) is a novel framework for integrating Large Language Models (LLMs) into multi-modal text recognition systems such as ASR and OCR, improving performance and efficiency by enabling seamless fusion without requiring re-training.
DanielLin94144/StyleTalk
Official release of StyleTalk dataset.
JasonSWFu/VQscore
d223302/A-Closer-Look-To-LLM-Evaluation
Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation"
interactiveaudiolab/emphases
Crowdsourced and Automatic Speech Prominence Estimation
kuan2jiu99/audio-hallucination
Interspeech2024 | Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
roger-tseng/CodecFake
A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024
google-research-datasets/LLAMA1-Test-Set
We introduce the LLAMA1 Test Set, a comprehensive open-domain world-knowledge QA dataset for evaluating question-answering systems. We prompted the open-source LLaMA-7B model for questions and short answers on various topics, gathered 300 questions (synthesized with the Google Cloud TTS service, voice en-US-Neural2-C), and generally verified the answers.