Pinned Repositories
book-text-to-speech
A book about Text-to-Speech (TTS) in Chinese.
cc-compare
A free code synchronization and comparison tool from China that can replace Beyond Compare.
downkyi
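downkyi (哔哩下载姬), a Bilibili video downloader. Supports batch downloads and 8K, HDR, and Dolby Vision, and provides a toolbox (audio/video extraction, watermark removal, etc.).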
Grad-TTS-Chinese
Huawei Grad-TTS for Chinese
LLMBook-zh.github.io
《大语言模型》 (Large Language Models), by Zhao Xin, Li Junyi, Zhou Kun, Tang Tianyi, and Wen Jirong
MiniThunder
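A mini version of Xunlei (Thunder) for Android. Supports downloads from thunder://, ftp://, http://, and ed2k:// links, magnet links, and torrent files, and can play audio and video files while they are still downloading.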
so-vits-svc
SoftVC VITS Singing Voice Conversion
TikTokDownloader
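Completely free and open source, built on the Requests module: a data collection tool for TikTok profiles/videos/photo albums/original sounds and Douyin profiles/videos/photo albums/favorites/livestreams/original sounds/collections/comments/accounts/search/trending lists.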
tuning_playbook
Deep Learning Tuning Playbook (《深度学习调优指南》): a playbook for systematically maximizing the performance of deep learning models.
VI-Speaker
Speaker embedding for VI-SVC and VI-SVS, also for VITS; use this to replace the ID to implement voice cloning.
MaxMax2016's Repositories
MaxMax2016/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
MaxMax2016/infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)
MaxMax2016/PitchVC
PitchVC: Pitch Conditioned Any-to-Many Voice Conversion
MaxMax2016/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
MaxMax2016/auris
AI based singing voice synthesis
MaxMax2016/Automatic_Speech_Annotator
Automatic speech annotator: processes speech with voice activity detection, overlapping speech detection, speaker diarization, and automatic speech recognition
MaxMax2016/bigvsan
Sony's improved BigVGAN; PyTorch implementation of BigVSAN
MaxMax2016/ChatTTS
ChatTTS is a generative speech model for daily dialogue.
MaxMax2016/ConvNeXt-TTS
Unofficial implementation of ConvNeXt-TTS powered by lightning and Rye
MaxMax2016/fregrad
Code repository for FreGrad
MaxMax2016/GeneFacePlusPlus
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
MaxMax2016/General-World-Models-Survey
A survey of world models
MaxMax2016/hilcodec
MaxMax2016/human-motion-capture
A collection of papers about human motion capture
MaxMax2016/HunyuanDiT
Tencent Hunyuan text-to-image. Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
MaxMax2016/languagecodec
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
MaxMax2016/lina-speech
lina-speech: linear attention based text-to-speech
MaxMax2016/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
MaxMax2016/Make-An-Audio-2
ByteDance's DiT applied to speech: a text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
MaxMax2016/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
MaxMax2016/mediapipe
Impressive. Cross-platform, customizable ML solutions for live and streaming media.
MaxMax2016/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
MaxMax2016/stable-speech
Reproduction of Stability AI's Text-to-Speech model.
MaxMax2016/StoryDiffusion
Create Magic Story!
MaxMax2016/stream-vc
An unofficial PyTorch implementation of StreamVC (Real-Time Low-Latency Voice Conversion)
MaxMax2016/Vach
Real-time streaming talking head
MaxMax2016/WaveNeXT_pytorch
Unofficial implementation of the WaveNeXt vocoder
MaxMax2016/WhisperSpeech
Three-stage model. An unofficial PyTorch implementation of SPEAR-TTS.
MaxMax2016/yolov10
YOLOv10: Real-Time End-to-End Object Detection
MaxMax2016/zh_recogn
Recognizes Chinese speech in audio or video and exports it as SRT subtitles, based on the Paraformer model from the ModelScope community