xiaosdawn's Stars
SamuraiT/mecab-python3
🐍 mecab-python. You can find the original version here: http://taku910.github.io/mecab/
taishi-i/awesome-japanese-nlp-resources
A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese
Srijith-rkr/Whispering-LLaMA
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio), the chat and pretrained large audio language models proposed by Alibaba Cloud.
DmitryRyumin/INTERSPEECH-2023-24-Papers
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
LlamaFamily/Llama-Chinese
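Llama Chinese community: the Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning materials are aggregated in real time, and all code has been updated for Llama3. The goal is to build the best Chinese Llama large model, fully open source and available for commercial use.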
qcri/ArabicASRChallenge2016
This repository
sarulab-speech/whisper-asr-finetune
Picovoice/speech-to-text-benchmark
speech to text benchmark framework
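Speech-to-text benchmarks of this kind usually report word error rate (WER). WER is the word-level Levenshtein distance (substitutions + deletions + insertions) between a reference transcript and the system output, divided by the number of reference words. The sketch below is not this framework's code. It assumes whitespace tokenization and transcripts that have already been normalized (case, punctuation).

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token lists (unit costs)."""
    # prev[j] = distance between the first i-1 ref tokens and the first j hyp tokens
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if the tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit operations divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Here there is one substitution (sat → sit) and one deletion (the), so WER = 2/6 ≈ 0.33. Because insertions count too, WER can exceed 1.0.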
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
theajack/cnchar
🇨🇳 A full-featured Chinese character utility library (pinyin, strokes, radicals, idioms, speech, visualization, etc.)
kfcd/chaizi
Chinese character decomposition dictionary
howl-anderson/hanzi_chaizi
Hanzi Decomposition Library: breaks Chinese characters down into radicals and components, which can be used as character-shape features in machine learning.
LAION-AI/audio-dataset
Audio Dataset for training CLAP and other models
cjkvi/cjkvi-ids
IDS data for CJK Unified Ideographs
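An IDS (Ideographic Description Sequence) describes a character's layout in prefix notation. An Ideographic Description Character such as ⿰ (left to right) or ⿱ (above to below) is followed by its operands, and each operand may itself be an IDS. The parser below is a minimal sketch with several assumptions:

- Input lines are tab-separated as `U+XXXX<TAB>character<TAB>IDS`, and only the first IDS is kept.
- A trailing bracketed source tag is stripped.
- Header or comment lines are filtered out by the caller.
- The newer description characters beyond U+2FFB are not handled.
- Entity references for unencoded components are not handled.

```python
import re

# Ideographic Description Characters U+2FF0..U+2FFB and their arity:
# U+2FF2 and U+2FF3 take three operands, the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3
IDC_ARITY["\u2ff3"] = 3


def parse_ids(ids: str):
    """Parse a prefix-notation IDS into nested tuples (idc, operand, ...)."""
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError(f"truncated IDS: {ids!r}")
        ch = ids[pos]
        pos += 1
        if ch in IDC_ARITY:
            # Operands are parsed left to right, recursively.
            return (ch,) + tuple(node() for _ in range(IDC_ARITY[ch]))
        return ch  # a leaf component

    tree = node()
    if pos != len(ids):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def leaves(tree) -> list[str]:
    """Leaf components of a parsed IDS, in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


def parse_line(line: str):
    """Parse one assumed 'U+XXXX<TAB>char<TAB>IDS...' line, keeping the first IDS."""
    fields = line.rstrip("\n").split("\t")
    ids = re.sub(r"\[[^\]]*\]$", "", fields[2])  # drop a trailing [source] tag
    return fields[1], parse_ids(ids)


char, tree = parse_line("U+597D\t好\t⿰女子")
print(char, tree, leaves(tree))
```

Nested layouts recurse naturally. For example, 森 as `⿱木⿰木木` yields three 木 leaves, which is the kind of component list a decomposition library exposes as glyph features.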
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
microsoft/CodeMixed-Text-Generator
This tool automatically generates grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
clab/fast_align
Simple, fast unsupervised word aligner
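fast_align reads one sentence pair per line as `source ||| target` and prints alignments as `i-j` pairs (source index, target index). Its model is a reparameterized IBM Model 2 that favors alignments near the diagonal.

The sketch below swaps in the simpler IBM Model 1 (no positional prior) to illustrate the EM idea that fast_align builds on. It is not fast_align's algorithm or code. The `<null>` token, the function names, and the toy corpus are illustrative.

```python
from collections import defaultdict

NULL = "<null>"  # illustrative empty-word token, so target words may stay unaligned


def train_ibm1(pairs, iterations=10):
    """EM for IBM Model 1: returns t[(f, e)] = p(target word f | source word e)."""
    pairs = [([NULL] + src, tgt) for src, tgt in pairs]
    vocab_size = len({f for _, tgt in pairs for f in tgt})
    t = defaultdict(lambda: 1.0 / vocab_size)  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in pairs:
            for f in tgt:
                # E-step: posterior probability that each source word generated f
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize the expected counts for each source word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(src, tgt, t):
    """Link each target word to its most probable source word; NULL links are dropped."""
    src = [NULL] + src
    links = []
    for j, f in enumerate(tgt):
        i = max(range(len(src)), key=lambda k: t.get((f, src[k]), 0.0))
        if i > 0:
            links.append(f"{i - 1}-{j}")  # i - 1 undoes the NULL offset
    return " ".join(links)


corpus = ["das haus ||| the house", "das buch ||| the book", "ein buch ||| a book"]
pairs = [tuple(side.split() for side in line.split(" ||| ")) for line in corpus]
t = train_ibm1(pairs)
for src, tgt in pairs:
    print(align(src, tgt, t))
```

Because "das" co-occurs with "the" in two sentences, EM gradually shifts probability mass so that das→the, haus→house, buch→book, and ein→a win. fast_align's diagonal prior adds a preference for monotone links on top of this, which matters for longer sentences.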
SpeechColab/Leaderboard
SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.
Baidu-AIP/speech-demo
Speech API examples
HLTCHKUST/ASCEND
ASCEND Chinese-English code-switching dataset
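One common way to quantify how mixed an utterance is comes from the code-switching literature: the Code-Mixing Index, CMI = 100 × (1 − max_i w_i / (n − u)) when n > u, and 0 otherwise. Here w_i counts the tokens in language i, n is the total number of tokens, and u counts language-independent tokens.

The sketch below is not part of ASCEND and makes several assumptions:

- Language ID is script-based (Han → Chinese, Latin → English).
- Each Chinese character counts as one token, since no word segmenter is used.
- Digit runs are language-independent.
- Punctuation is ignored.

```python
import re
from collections import Counter

# Assumed tokenization: one token per CJK Unified Ideograph (U+4E00..U+9FFF),
# one per run of Latin letters, one per run of digits; punctuation is ignored.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|[0-9]+")


def language_of(token: str):
    """Script-based language ID: Han -> 'zh', Latin -> 'en', digits -> None."""
    if "\u4e00" <= token[0] <= "\u9fff":
        return "zh"
    if token[0].isalpha():
        return "en"
    return None  # language-independent token


def code_mixing_index(text: str) -> float:
    """CMI = 100 * (1 - max_i w_i / (n - u)) if n > u, else 0."""
    tokens = TOKEN.findall(text)
    langs = Counter(lang for lang in map(language_of, tokens) if lang is not None)
    n = len(tokens)
    u = n - sum(langs.values())  # language-independent tokens
    if n == u:
        return 0.0
    return 100 * (1 - max(langs.values()) / (n - u))


print(code_mixing_index("我今天有一个meeting"))
```

The example has six Chinese tokens and one English token, so CMI = 100 × (1 − 6/7) ≈ 14.3. A monolingual sentence scores 0.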
gentaiscool/code-switching-papers
A curated list of research papers and resources on code-switching
BYVoid/OpenCC
Conversion between Traditional and Simplified Chinese
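Converters like this are dictionary-driven. Multi-character phrase entries are tried before single-character fallbacks, so characters with several Traditional forms convert correctly in context. The sketch below shows greedy longest-match conversion using a tiny hypothetical table. It is not OpenCC's API, data, or exact segmentation algorithm.

```python
# Hypothetical four-entry Simplified -> Traditional table (not OpenCC's data).
# Phrase entries disambiguate characters with several Traditional forms:
# 发 becomes 髮 in 头发 ("hair") but 發 in 发展 ("develop").
TABLE = {
    "头发": "頭髮",
    "发展": "發展",
    "头": "頭",
    "发": "發",  # default single-character mapping
}


def convert(text: str, table: dict[str, str] = TABLE) -> str:
    """Greedy longest-match conversion; unmatched characters pass through."""
    max_len = max(map(len, table))
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate key first, shrinking to a single character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:  # no entry matched: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert("头发发展"))
```

Greedy forward matching is simple but can mis-segment. For example, 头发展 would match 头发 first. This is why production converters rely on larger phrase dictionaries and more careful segmentation.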
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
savoirfairelinux/num2words
Modules to convert numbers to words. 42 --> forty-two
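The idea behind English cardinals can be sketched in a few lines: split the number into groups of three digits, name each group, and append its scale word. This is a minimal English-only illustration, not num2words' implementation or API. It handles only non-negative integers below 10^12 and omits "and" (for example, "one hundred fifteen"), so its output style may differ from num2words.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (1000, "thousand")]


def below_thousand(n: int) -> str:
    """Words for 1..999."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))  # e.g. forty-two
    elif rest:
        parts.append(ONES[rest])
    return " ".join(parts)


def to_words(n: int) -> str:
    if not 0 <= n < 10**12:
        raise ValueError("supported range is 0 to 999,999,999,999")
    if n == 0:
        return "zero"
    parts = []
    for value, name in SCALES:
        count, n = divmod(n, value)  # peel off one three-digit group per scale
        if count:
            parts.append(f"{below_thousand(count)} {name}")
    if n:
        parts.append(below_thousand(n))
    return " ".join(parts)


print(to_words(42))
```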
xiaoyjy/cconv
An iconv-based Simplified/Traditional Chinese conversion tool
ExistentialAudio/BlackHole
BlackHole is a modern macOS audio loopback driver that allows applications to pass audio to other applications with zero additional latency.
gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
kpu/preprocess
Corpus preprocessing