edwin-19
Machine Learning Engineer with a focus on deep learning and deploying deep learning models. I specialise in computer vision, NLP, speech, and multimodal AI.
Malaysia
edwin-19's Stars
microsoft/vscode
Visual Studio Code
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
danielmiessler/fabric
fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
unslothai/unsloth
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
fishaudio/fish-speech
Brand new TTS solution
neuml/txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
pydantic/FastUI
Build better UIs faster.
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
google/gemma.cpp
lightweight, standalone C++ inference engine for Google's Gemma models.
pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
google/gemma_pytorch
The official PyTorch implementation of Google's Gemma models
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
collabora/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
argmaxinc/WhisperKit
On-device Speech Recognition for Apple Silicon
sh-lee-prml/HierSpeechpp
The official implementation of HierSpeech++
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
JusperLee/Conv-TasNet
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation (PyTorch implementation)
fh2019ustc/Awesome-Document-Image-Rectification
A comprehensive list of awesome document image rectification papers.
X-LANCE/VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
huggingface/dataspeech
NVIDIA-AI-IOT/nanoowl
A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT.
isaaccorley/torchseg
Segmentation models with pretrained backbones. PyTorch.
interactiveaudiolab/ppgs
High-Fidelity Neural Phonetic Posteriorgrams
NeuralVox/OpenPhonemizer
An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GPL phonemizer.
hitachi-nlp/appjsonify
A handy PDF-to-JSON conversion tool for academic papers implemented in Python.
WangHelin1997/Aty-TTS
Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
ETZET/SpeechEmotionAVLearning