DanielLin94144
Ph.D. @ NTU Speech Processing and Machine Learning Laboratory. Deep Learning for Speech Processing.
National Taiwan University, Taiwan
DanielLin94144's Stars
2noise/ChatTTS
A generative speech model for daily dialogue.
bentoml/OpenLLM
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
mistralai/mistral-inference
Official inference library for Mistral models
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Tiiiger/bert_score
BERT score for text generation
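The core of BERTScore is a greedy cosine-similarity matching between contextual token embeddings of the reference and candidate texts. A minimal NumPy sketch of that matching step, assuming the embeddings are already computed (the real package also handles tokenization, model inference, and optional IDF weighting):

```python
import numpy as np

def bertscore(ref_emb: np.ndarray, cand_emb: np.ndarray):
    """Greedy-matching BERTScore over precomputed token embeddings.

    ref_emb:  (n_ref, d) contextual embeddings of the reference tokens
    cand_emb: (n_cand, d) contextual embeddings of the candidate tokens
    """
    # Normalize rows so dot products become cosine similarities.
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sim = ref @ cand.T                   # (n_ref, n_cand) cosine matrix
    recall = sim.max(axis=1).mean()      # each reference token -> best candidate match
    precision = sim.max(axis=0).mean()   # each candidate token -> best reference match
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With identical embeddings on both sides, precision, recall, and F1 all come out as 1.0.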
Text-to-Audio/AudioLCM
PyTorch implementation of AudioLCM (ACM-MM'24): efficient, high-quality text-to-audio generation with a latent consistency model.
QwenLM/Qwen2-Audio
The official repo of the Qwen2-Audio chat and pretrained large audio-language models proposed by Alibaba Cloud.
tim-learn/awesome-test-time-adaptation
Collection of awesome test-time (domain/batch/instance) adaptation methods
microsoft/MS-SNSD
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
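The key operation behind a dataset like MS-SNSD is mixing clean speech with noise at a prescribed SNR, which reduces to scaling the noise by a gain derived from the two RMS levels. A hedged sketch (function name and array conventions are my own, not MS-SNSD's code):

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested speech-to-noise ratio.

    Assumes `clean` and `noise` are equal-length mono float arrays.
    """
    rms_clean = np.sqrt(np.mean(clean ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))
    # Gain g chosen so that 20*log10(rms_clean / (g * rms_noise)) == snr_db.
    gain = rms_clean / (rms_noise * 10 ** (snr_db / 20))
    return clean + gain * noise
```

Sweeping `snr_db` over a grid of levels is how such corpora scale to arbitrary sizes from a fixed pool of speakers and noise types.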
jxmorris12/language_tool_python
A free Python grammar checker 📝✅
neelsjain/NEFTune
Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning
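The NEFTune trick itself is small: during finetuning, add uniform noise to the input token embeddings, scaled by alpha / sqrt(L * d) for sequence length L and embedding dimension d. A NumPy sketch of that perturbation, following the paper's scaling (the function name and defaults here are illustrative, not the repo's API):

```python
import numpy as np

def neftune_noise(embeddings: np.ndarray, alpha: float = 5.0, rng=None) -> np.ndarray:
    """Add NEFTune-style uniform noise to a sequence of token embeddings.

    embeddings: (L, d) array; noise is drawn from Uniform(-1, 1) and scaled
    by alpha / sqrt(L * d), as described in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    L, d = embeddings.shape
    scale = alpha / np.sqrt(L * d)
    return embeddings + rng.uniform(-1.0, 1.0, size=(L, d)) * scale
```

The perturbation is applied only at training time; inference uses the clean embeddings.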
OpenT2S/LlamaVoice
LlamaVoice is a Llama-based large voice generation model, providing both inference and training capabilities.
NVIDIA/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
dynamic-superb/dynamic-superb
The official repository of Dynamic-SUPERB.
pln-fing-udelar/fast-krippendorff
Fast computation of Krippendorff's alpha agreement measure in Python.
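For reference, Krippendorff's alpha for nominal data is 1 minus the ratio of observed to expected disagreement, computed from a coincidence matrix over all pairable ratings. A self-contained sketch of that computation (the library itself uses a vectorized NumPy implementation and supports other metrics; this pure-Python version is only nominal-level):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of items, each a list of the ratings that item received
    (missing ratings simply omitted). Items with fewer than two ratings are
    ignored, as in the standard formulation.
    """
    o = Counter()                        # coincidence matrix o[(c, k)]
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for c, k in permutations(ratings, 2):   # all ordered rating pairs
            o[(c, k)] += 1.0 / (m - 1)
    n = sum(o.values())                  # total number of pairable values
    n_c = Counter()                      # marginal totals per category
    for (c, _), v in o.items():
        n_c[c] += v
    d_o = sum(v for (c, k), v in o.items() if c != k)   # observed disagreement
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    return 1.0 - d_o / d_e
```

Perfect agreement yields alpha = 1.0; chance-level agreement yields alpha near 0.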
emo-box/EmoBox
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
facebookresearch/ears_dataset
Expressive Anechoic Recordings of Speech (EARS)
line/LibriTTS-P
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
voidism/Lookback-Lens
Official implementation for the paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
roudimit/whisper-flamingo
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
mtkresearch/generative-fusion-decoding
Generative Fusion Decoding (GFD) is a novel framework for integrating Large Language Models (LLMs) into multi-modal text recognition systems such as ASR and OCR, improving performance and efficiency by enabling seamless fusion without requiring re-training.
DanielLin94144/StyleTalk
Official release of StyleTalk dataset.
JasonSWFu/VQscore
d223302/A-Closer-Look-To-LLM-Evaluation
Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation"
interactiveaudiolab/emphases
Crowdsourced and Automatic Speech Prominence Estimation
kuan2jiu99/audio-hallucination
Interspeech2024 | Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
roger-tseng/CodecFake
A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024
google-research-datasets/LLAMA1-Test-Set
We introduce the LLAMA1 Test Set, a comprehensive open-domain world-knowledge QA dataset for evaluating question-answering systems. We prompted the open-source LLaMA-7B model for questions and short answers on various topics, gathered 300 questions (synthesized with the Google Cloud TTS service, voice en-US-Neural2-C), and generally verified the answers.