xrkong

@uwa-rev

University of Western Australia, Perth, WA 6009, Australia

xrkong's Stars

donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Language: Python 294k 6.7k 32348.9k
youngyangyang04/leetcode-master
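《代码随想录》 (Code Thoughts) LeetCode problem-solving guide: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video breakdowns of the hard parts, more than 50 mind maps, and versions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer confusing! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀
Language: Shell 55k 383 25511.9k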
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language: Python 38.8k 307 1.2k4.9k
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Language: Python 15.8k 266 2172.7k
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Language: Python 10.6k 100 5601.5k
changgyhub/leetcode_101
LeetCode 101: A LeetCode Problem-Solving Guide
9.3k 145 911.2k
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python 9.1k 75 1.4k926
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python 8.8k 81 255686
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Language: Python 5.6k 78 222519
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Language: Python 2.1k 32 129128
chvmp/champ
MIT Cheetah I Implementation
Language: C++ 1.8k 65 124382
openai/openai-realtime-embedded-sdk
Instructions on how to use the Realtime API on Microcontrollers and Embedded Platforms
Language: C++ 1.5k 120 20173
microsoft/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Language: Python 1.2k 47 151420
nasa-jpl/rosa
ROSA 🤖 is an AI Agent designed to interact with ROS1- and ROS2-based robotics systems using natural language queries. ROSA helps robot developers inspect, diagnose, understand, and operate robots.
Language: Python 1k 23 1391
nutonomy/second.pytorch
PointPillars for KITTI object detection
Language: Python 997 59 0240
lhl/voicechat2
Local SRT/LLM/TTS Voicechat
Language: Python 645 8 1968
KoljaB/LocalAIVoiceChat
Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.
Language: Python 593 13 1567
zhulf0804/PointPillars
A Simple PointPillars PyTorch Implementation for 3D LiDAR(KITTI) Detection.
Language: Python 591 4 91135
NVIDIA-AI-IOT/CUDA-PointPillars
A project demonstrating how to use CUDA-PointPillars to process point cloud data from lidar.
Language: Python 572 8 97160
vndee/local-talking-llm
A talking LLM that runs on your own computer without needing the internet.
Language: Python 420 11 1085
unitreerobotics/unitree_sdk2
Unitree robot SDK, version 2. https://support.unitree.com/home/zh/developer
Language: C++ 333 18 0107
unitreerobotics/unitree_ros2
Language: C++ 209 7 060
cguweb-com/Arduino-Projects
Language: C++ 128 18 166
thuhcsi/SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
Language: Python 110 4 11
Geekgineer/ros2_bag_exporter
ROS2 Bag Exporter is a versatile ROS 2 C++ package that exports ROS 2 bag files (rosbag2) into various formats, including images, point cloud data (PCD) files, IMU data, and GPS data. It supports extracting and converting data from bag files for analysis, visualization, and processing outside the ROS ecosystem.
Language: C++ 71 2 311
HLTCHKUST/CI-AVSR
Code repository for the Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR) dataset.
Language: Python 38 7 50
d-gurgurov/im2latex
A repo for the Formula Recognition Model (im2latex) based on Vision Encoder Decoder Model
Language: Python 11 1 21
SMIL-SPCRAS/DAVIS
Official repo for "Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method" in ICASSP 2024
Language: JavaScript 9 3 00
xrkong/skimba
Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion
7 2 00
mbnmoeini/digital-voice-assistant-in-car
This project focuses on designing a digital voice assistant for vehicle command recognition. This system leverages three key techniques: speech-to-text conversion using Vosk, a lightweight LLM model, text classification with an SVM, and out-of-distribution (OOD) sentence detection with calibrated probabilities through Platt scaling.
Language: Jupyter Notebook 6 1 01
