Pinned Repositories
4DFM
4D Facial Expression Diffusion Model
ACF_GPU
GPU version of ACF pedestrian detection
Acoustic-feedback-detection
Implementation of an algorithm to detect acoustic feedback from an audio file
AEC
AEC-Challenge
AEC Challenge
AEC3
AEC3 Extracted From WebRTC
books
Technical books
DeepLearning-500-questions
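Deep Learning 500 Questions: covers frequently asked topics in probability, linear algebra, machine learning, deep learning, computer vision, and related areas in question-and-answer form, to help the author and any readers who need it. The book has 18 chapters and nearly 300,000 characters. Given the author's limited expertise, readers are kindly asked to point out any shortcomings. To be continued... For collaboration, contact scutjy2015@163.com. All rights reserved; infringement will be prosecuted. Tan 2018.06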
opus
Modern audio compression for the internet.
Test
This is a test project
zhongshijun's Repositories
zhongshijun/convofusion
Co-Speech Gesture Synthesis
zhongshijun/DEEPTalk
Official code release of "DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation"
zhongshijun/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
zhongshijun/Fast-3D-Talking-Face
Drive your metahuman to speak within 1 second.
zhongshijun/firecrawl
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
zhongshijun/FLAME-Universe
Summary of publicly available resources such as code, datasets, and scientific papers for the FLAME 3D head model
zhongshijun/free_avatar
zhongshijun/GaussianTalker
zhongshijun/jltr-alignment
Audio-to-score alignment with human-labeled repeats
zhongshijun/LAMM
zhongshijun/langchain
🦜🔗 Build context-aware reasoning applications
zhongshijun/LivePortrait
Bring portraits to life!
zhongshijun/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
zhongshijun/Mandarin-Chinese-Syllable-Dataset
Mandarin Chinese Syllable Dataset - Extensive coverage, high frequency of use, and includes all Mandarin pronunciations.
zhongshijun/MIDI-BERT
This is the official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.
zhongshijun/midifile
C++ classes for reading/writing Standard MIDI Files
zhongshijun/Non-corresponding-and-Topology-free-3D-Face-Expression-Transfer
zhongshijun/NRAEC_vs_NRextAEC
zhongshijun/ProbTalk3D
zhongshijun/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
zhongshijun/pyhanlp
Chinese word segmentation
zhongshijun/ScanTalk
[ECCV 2024] - ScanTalk: 3D Talking Heads from Unregistered Scans
zhongshijun/SEMamba
This is the official implementation of the SEMamba paper.
zhongshijun/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
zhongshijun/SPARK
Official implementation for the SIGGRAPH Asia 2024 paper SPARK: Self-supervised Personalized Real-time Monocular Face Capture
zhongshijun/Speech-Simulation-Tools
A collection of data simulation tools and methods for speech enhancement (continuously updated)
zhongshijun/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
zhongshijun/SyncTalk
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
zhongshijun/ToneLab
A platform designed for lightweight documentation and quantitative analysis in Sino-Tibetan tonal languages
zhongshijun/universal-speech-enhancement
Apply Score diffusion to improve speech signals recorded under various adverse conditions and distortions, including noise, reverberation, clipping, equalization (EQ) distortion, packet loss, codec loss, bandwidth limitations, and other forms of degradation.