vicident's Stars
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
slatedocs/slate
Beautiful static documentation for your API
RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
mozilla/DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
PaddlePaddle/Paddle
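PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)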
w-okada/voice-changer
Realtime Voice Changer
NVIDIA/nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
ExistentialAudio/BlackHole
BlackHole is a modern macOS audio loopback driver that allows applications to pass audio to other applications with zero additional latency.
khangich/machine-learning-interview
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
apache/zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
sigoden/aichat
All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.
plaidml/plaidml
PlaidML is a framework for making deep learning work everywhere.
mindee/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
andabi/deep-voice-conversion
Deep neural networks for voice conversion (voice style transfer) in Tensorflow
DeviceFarmer/stf
Control and manage Android devices from your browser.
Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor
Web labeling tool for bitmap images and point clouds
Yuliang-Liu/Monkey
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
stypr/clubhouse-py
Clubhouse API written in Python. Standalone client included. For reference and education purposes only.
Calamari-OCR/calamari
Line based ATR Engine based on OCRopy
toy/blueutil
CLI for bluetooth on OSX: power, discoverable state, list, inquire devices, connect, info, …
YuvalNirkin/fsgan
FSGAN - Official PyTorch Implementation
gabrielmittag/NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
hyperledger-archives/indy-sdk
indy-sdk
markovka17/dla
Deep learning for audio processing
r9y9/gantts
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)
andimarafioti/florence2-finetuning
Quick exploration into fine tuning florence 2
hujinsen/StarGAN-Voice-Conversion
full tensorflow implementation of the paper: StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks https://arxiv.org/abs/1806.02169
HamadYA/GhostFaceNets
This repository contains the official implementation of GhostFaceNets, State-Of-The-Art lightweight face recognition models.
webrtcHacks/WebRTC-Camera-Resolution
WebRTC Camera Resolution Finder