ROAD2018

Pinned Repositories

A-real-time-time-domain-speech-enhancement-model
Language: C
AEC-ANS-AGC
AEC/ANS/AGC from webrtc
Language: C
asteroid
The PyTorch-based audio source separation toolkit for researchers. Current highlight: we got our WHAMR results, check them out!
Language: Python
AudioBSS
Blind source separation of audio recordings
Language: MATLAB
awesome-vad
A curated list of awesome voice activity detection
essentia
C++ library for audio and music analysis, description and synthesis, including Python bindings
Language: Jupyter Notebook
Multi-Channel-Acoustic-Echo-Cancellation
MVDR-Speech-Enhancement
Language: C++
Realtime_AudioDenoise_EchoCancellation
Language: C++
Speech-Separation-Paper-Tutorial
A must-read paper list for speech separation based on neural networks

ROAD2018's Repositories

ROAD2018/aiokafka
asyncio client for kafka
ROAD2018/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
ROAD2018/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
ROAD2018/comfyui_LLM_party
Dify in ComfyUI: includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; provides access to Feishu and Discord; and adapts to all LLMs with OpenAI/Gemini-like interfaces, such as o1, Ollama, Qwen, GLM, DeepSeek, Moonshot, and Doubao. Also adapted to local LLMs, VLMs, and GGUF models such as Llama-3.2, with Neo4j knowledge-graph linkage, GraphRAG / RAG, and HTML-to-image support.
ROAD2018/confluent-kafka-go
Confluent's Apache Kafka Golang client
ROAD2018/confluent-kafka-python
Confluent's Kafka Python Client
ROAD2018/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
ROAD2018/deep-learning-for-image-processing
Deep learning for image processing, including classification, object detection, etc.
ROAD2018/elasticsearch-py
Official Python client for Elasticsearch
Language: Python
ROAD2018/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
ROAD2018/FunASR
A Fundamental End-to-End Speech Recognition Toolkit
Language: Python
ROAD2018/FunAudioLLM-APP
ROAD2018/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English speech dialogue model
ROAD2018/GPT-SoVITS
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Language: Python
ROAD2018/GPT-SoVITS-V2
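GPT-SoVITS-V2 model that merges several official PRs, with features including but not limited to automatic reference-audio filling, subtitle synchronization, and SillyTavern integration.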
ROAD2018/HanziNLP
An NLP package for Chinese text: preprocessing, tokenization, Chinese fonts, word embeddings, text similarity, and sentiment analysis. A lightweight Chinese natural language processing package.
ROAD2018/m3u8
Python m3u8 Parser for HTTP Live Streaming (HLS) Transmissions
ROAD2018/matchering
🎚️ Open Source Audio Matching and Mastering
ROAD2018/mini-omni
Open-source multimodal large language model that can hear and talk while thinking. Features real-time end-to-end speech input and streaming audio output for conversation.
ROAD2018/pytextclassifier
pytextclassifier is a toolkit for text classification, implementing LR, XGBoost, TextCNN, FastText, TextRNN, BERT, and other classification models, ready to use out of the box.
ROAD2018/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
ROAD2018/RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
ROAD2018/RapidOCR
Awesome OCR toolkits in multiple programming languages, based on ONNXRuntime, OpenVINO, and PaddlePaddle (PaddleOCR models converted for fast inference with ONNXRuntime).
ROAD2018/redis-py
Redis Python client
ROAD2018/RediSearch
A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.
ROAD2018/SenseVoice
Multilingual Voice Understanding Model
ROAD2018/similarities
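Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of items; developed in Python 3 and ready to use out of the box.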
ROAD2018/text2vec
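text2vec, text to vector. A text representation tool that converts text into vector matrices, implementing Word2Vec, RankBM25, Sentence-BERT, CoSENT, and other text representation and text similarity models, ready to use out of the box.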
ROAD2018/VideoChat
Real-time voice-interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cascaded solutions (ASR-LLM-TTS-THG). Customizable appearance and voice, with voice cloning support and first-packet latency as low as 3 seconds.
ROAD2018/webrtc-issue-detector
Diagnostic tool and troubleshooter for WebRTC applications with Mean Opinion Score (MOS) calculator