jkoprax

jkoprax's Stars

suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Language: Jupyter Notebook 36.4k 330 4464.3k
mozilla/DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Language: C++ 25.5k 673 2.1k 4k
microsoft/semantic-kernel
Integrate cutting-edge LLM technology quickly and easily into your apps
Language: C# 22.3k 273 3.4k 3.3k
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Language: Shell 14.4k 692 1.7k 5.3k
espnet/espnet
End-to-End Speech Processing Toolkit
Language: Python 8.6k 181 2.4k 2.2k
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Language: Jupyter Notebook 6.5k 73 997797
mravanelli/pytorch-kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Language: Python 2.4k 94 214446
gcui-art/suno-api
Use API to call the music generation AI of suno.ai, and easily integrate it into agents like GPTs.
Language: TypeScript 1.5k 37 150362
bbernhard/signal-cli-rest-api
Dockerized Signal Messenger REST API
Language: Go 1.4k 27 481164
modelscope/3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Language: Python 1.3k 17 113107
juanmc2005/diart
A Python package to build AI-powered real-time audio applications
Language: Python 1.1k 22 15290
transcriptionstream/transcriptionstream
Turnkey self-hosted offline transcription and diarization service with LLM summary
Language: Python 756 8 1642
audeering/opensmile
The Munich Open-Source Large-Scale Multimedia Feature Extractor
Language: C++ 610 19 6877
wq2012/SpectralCluster
Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
Language: Python 515 19 4573
hitachi-speech/EEND
End-to-End Neural Diarization
Language: Python 378 17 4659
raffael/SISinusWaveView
A Siri-like voice input visualizer using EZAudio.
Language: Objective-C 275 11 333
algolia/voice-overlay-android
🗣 An overlay that gets your user’s voice permission and input as text in a customizable UI
Language: Kotlin 256 10 436
flowese/UdioWrapper
UdioWrapper is a Python package that enables the generation of music tracks using Udio's API through textual prompts. This package is based on the reverse engineering of the Udio API (https://www.udio.com/) and is not officially endorsed by Udio.
Language: Python 140 4 1526
klinker24/wearable-reply
Simplify text input for Android Wear 2.0, by voice, keyboard, or canned response.
Language: Java 120 6 318
bedangSen/VoiceSens
A Voice Biometric Application using Watson Speech to Text
Language: JavaScript 82 6 2628
open-webui/extension
WIP: Open WebUI Chrome Extension (Requires Open WebUI v0.2.0+)
Language: Svelte 74 2 223
Raymo111/voiceprint
Voice biometric authentication PAM module for Linux
Language: Python 41 7 210
NaveedShahid/Voice-Authentication-CNN
Voice authentication system implementation using Python
Language: Python 37 3 319
open-webui/assistant
No longer actively being worked on. Please use https://github.com/open-webui/extension instead.
Language: TypeScript 27 2 28
itmo-mbss-lab/sr_labs_book
The project is related to the development of labs for the ITMO Speaker Recognition Course.
Language: Jupyter Notebook 10 1 0 8
IDRnD/idvoice-gpt-ios-demo
IDVoice + ChatGPT iOS demo app
Language: Swift 7 0 0 0
conceptslabs/thought_factory
Language: Python 3 1 0 2
schaltung/FishBoardMix
The FishBoardMix corpus is designed to explore Speaker-Age estimation technology.
Language: Shell 2 2 0 0
IDRnD/idvoice-gpt-android-demo
IDVoice + ChatGPT Android demo app
Language: Kotlin 1 0 0 0
jakubbortlik/accent_rating
A collection of scripts and data I used when working on my dissertation
Language: Python 1 3 0 0