xrkong's Stars
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
youngyangyang04/leetcode-master
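《代码随想录》 LeetCode practice guide: a recommended order for working through 200 classic problems, detailed illustrated explanations totaling 600,000 characters, video walkthroughs of the difficult points, and more than 50 mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages. No more feeling lost when learning algorithms! 🔥🔥 Take a look, and you will wish you had found it sooner! 🚀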
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
changgyhub/leetcode_101
LeetCode 101: a LeetCode problem-solving guide
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
chvmp/champ
MIT Cheetah I Implementation
openai/openai-realtime-embedded-sdk
Instructions on how to use the Realtime API on Microcontrollers and Embedded Platforms
microsoft/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
nasa-jpl/rosa
ROSA 🤖 is an AI Agent designed to interact with ROS1- and ROS2-based robotics systems using natural language queries. ROSA helps robot developers inspect, diagnose, understand, and operate robots.
nutonomy/second.pytorch
PointPillars for KITTI object detection
lhl/voicechat2
Local SRT/LLM/TTS Voicechat
KoljaB/LocalAIVoiceChat
Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.
zhulf0804/PointPillars
A Simple PointPillars PyTorch Implementation for 3D LiDAR (KITTI) Detection.
NVIDIA-AI-IOT/CUDA-PointPillars
A project demonstrating how to use CUDA-PointPillars to process point cloud data from lidar.
vndee/local-talking-llm
A talking LLM that runs on your own computer without needing the internet.
unitreerobotics/unitree_sdk2
Unitree robot SDK version 2. https://support.unitree.com/home/zh/developer
unitreerobotics/unitree_ros2
cguweb-com/Arduino-Projects
thuhcsi/SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
Geekgineer/ros2_bag_exporter
ROS2 Bag Exporter is a versatile ROS 2 C++ package designed to export ROS 2 bag files (rosbag2) into various formats, including images, point cloud data (PCD) files, IMU data, and GPS data. This tool facilitates the extraction and conversion of data from bag files for analysis, visualization, and processing outside the ROS ecosystem.
HLTCHKUST/CI-AVSR
Code repository for the Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR) dataset.
d-gurgurov/im2latex
A repo for the Formula Recognition Model (im2latex) based on Vision Encoder Decoder Model
SMIL-SPCRAS/DAVIS
Official repo for "Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method" in ICASSP 2024
xrkong/skimba
Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion
mbnmoeini/digital-voice-assistant-in-car
This project focuses on designing a digital voice assistant for vehicle command recognition. The system combines the following techniques: speech-to-text conversion using Vosk, a lightweight LLM, text classification with an SVM, and out-of-distribution (OOD) sentence detection with probabilities calibrated through Platt scaling.