sinhat98
I'm an ML engineer in Japan. My interests are Deep Learning, Speech Processing, and Spoken Dialogue Systems.
CyberAgent, Inc. Tokyo, Shibuya
Pinned Repositories
adapter-wavlm
Aivis
💠 Aivis: AI Voice Imitation System
DialogueMock
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
espnet
End-to-End Speech Processing Toolkit
fastapi-beginner
icefall
LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
nishika-competition
Reproduction code for the Nishika competition
SERwithWavLM
sinhat98's Repositories
sinhat98/adapter-wavlm
sinhat98/nishika-competition
Reproduction code for the Nishika competition
sinhat98/SERwithWavLM
sinhat98/Aivis
💠 Aivis: AI Voice Imitation System
sinhat98/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
sinhat98/espnet
End-to-End Speech Processing Toolkit
sinhat98/fastapi-beginner
sinhat98/icefall
sinhat98/llm-endpoint
sinhat98/python-dev
sinhat98/sherpa-onnx
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter
sinhat98/skills-secure-code-game
My clone repository
sinhat98/VGGFace2-pytorch
PyTorch Face Recognizer based on 'VGGFace2: A dataset for recognising faces across pose and age'
sinhat98/Style-Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.