HaiFengZeng's Stars
lucidrains/voicebox-pytorch
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
liutaocode/TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using Github Actions (Update Every 12th hours)
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
StarostinV/convkan
Convolutional layer for Kolmogorov-Arnold Network (KAN)
SUDO-AI-3D/zero123plus
Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
akaashdash/kansformers
massgravel/Microsoft-Activation-Scripts
Open-source Windows and Office activator featuring HWID, Ohook, KMS38, and Online KMS activation methods, along with advanced troubleshooting.
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
litagin02/laughter-collector
大量の音声データから笑い声部分を集めるやつ
hayeong0/DDDM-VC
Official Pytorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
dcharatan/flowmap
Code for "FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent" by Cameron Smith*, David Charatan*, Ayush Tewari, and Vincent Sitzmann
X-LANCE/StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
huggingface/parler-tts
Inference and training library for high-quality TTS models.
karpathy/llm.c
LLM training in simple, raw C/CUDA
TMElyralab/MuseV
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
annosubmission/GRC-Cache
magic-research/piecewise-rectified-flow
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)
RWKV/RWKV-infctx-trainer
RWKV infctx trainer, for training arbitary context sizes, to 10k and beyond!
kyegomez/Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
VINHYU/CoSeR
[CVPR 2024] CoSeR: Bridging Image and Language for Cognitive Super-Resolution
LC044/WeChatMsg
提取微信聊天记录,将其导出成HTML、Word、Excel文档永久保存,对聊天记录进行分析生成年度聊天报告,用聊天数据训练专属于个人的AI聊天助手
magic-research/magic-animate
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
lucidrains/meshgpt-pytorch
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
p0p4k/Matcha-TTS-2
E2E TTS using Conditional Flow Matching (Experimental*)
cgisky1980/AI00_Assistant
550W_AI_Assistant(550W智能助手) Everyone should have their own AI.
harlanhong/ICCV2023-MCNET
The official code of our ICCV2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation
elevenlabs/elevenlabs-python
The official Python API for ElevenLabs Text to Speech.