Pinned Repositories
A-real-time-time-domain-speech-enhancement-model
AEC-ANS-AGC
AEC/ANS/AGC from webrtc
asteroid
The PyTorch-based audio source separation toolkit for researchers. Current highlight: we got our WHAMR results; check them out!
AudioBSS
Blind source separation of audio recordings
awesome-vad
A curated list of awesome voice activity detection
essentia
C++ library for audio and music analysis, description and synthesis, including Python bindings
Multi-Channel-Acoustic-Echo-Cancellation
MVDR-Speech-Enhancement
Realtime_AudioDenoise_EchoCancellation
Speech-Separation-Paper-Tutorial
A must-read paper list for neural-network-based speech separation
ROAD2018's Repositories
ROAD2018/aiokafka
asyncio client for kafka
ROAD2018/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
ROAD2018/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
ROAD2018/comfyui_LLM_party
Dify in ComfyUI: includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; provides access to Feishu and Discord; and adapts to all LLMs with OpenAI/Gemini-style interfaces, such as o1, Ollama, Qwen, GLM, DeepSeek, Moonshot, and Doubao. Also supports local LLMs, VLMs, and GGUF models such as Llama-3.2, Neo4j knowledge-graph linkage, GraphRAG / RAG, and HTML-to-image.
ROAD2018/confluent-kafka-go
Confluent's Apache Kafka Golang client
ROAD2018/confluent-kafka-python
Confluent's Kafka Python Client
ROAD2018/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
ROAD2018/deep-learning-for-image-processing
Deep learning for image processing, including classification, object detection, etc.
ROAD2018/elasticsearch-py
Official Python client for Elasticsearch
ROAD2018/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
ROAD2018/FunASR
A Fundamental End-to-End Speech Recognition Toolkit
ROAD2018/FunAudioLLM-APP
ROAD2018/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
ROAD2018/GPT-SoVITS
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
ROAD2018/GPT-SoVITS-V2
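GPT-SoVITS-V2 model, merging several official PRs, with features including but not limited to: automatic reference-audio filling, subtitle synchronization, and SillyTavern integration.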
ROAD2018/HanziNLP
An NLP package for Chinese text: preprocessing, tokenization, Chinese fonts, word embeddings, text similarity, and sentiment analysis. A lightweight Chinese natural language processing package.
ROAD2018/m3u8
Python m3u8 Parser for HTTP Live Streaming (HLS) Transmissions
ROAD2018/matchering
🎚️ Open Source Audio Matching and Mastering
ROAD2018/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
ROAD2018/pytextclassifier
pytextclassifier is a toolkit for text classification, implementing LR, XGBoost, TextCNN, FastText, TextRNN, BERT, and other classification models; ready to use out of the box.
ROAD2018/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
ROAD2018/RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
ROAD2018/RapidOCR
Awesome OCR toolkits in multiple programming languages, based on ONNXRuntime, OpenVINO, and PaddlePaddle. (Converts PaddleOCR models and runs inference with ONNXRuntime for high speed.)
ROAD2018/redis-py
Redis Python client
ROAD2018/RediSearch
A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.
ROAD2018/SenseVoice
Multilingual Voice Understanding Model
ROAD2018/similarities
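Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of items; written in Python 3 and ready to use out of the box.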
ROAD2018/text2vec
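text2vec, text to vector. A text vector representation tool that converts text into vector matrices, implementing text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT; ready to use out of the box.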
ROAD2018/VideoChat
Real-time voice-interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cascaded solutions (ASR-LLM-TTS-THG). Customizable appearance and voice, with voice cloning support and first-packet latency as low as 3 seconds.
ROAD2018/webrtc-issue-detector
Diagnostic tool and troubleshooter for WebRTC applications with Mean Opinion Score (MOS) calculator