HaiFengZeng

HaiFengZeng's Stars

lucidrains/voicebox-pytorch
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
Language:Python59650
liutaocode/TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)
Language:Python22819
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
Language:Python51243
StarostinV/convkan
Convolutional layer for Kolmogorov-Arnold Network (KAN)
Language:Python768
SUDO-AI-3D/zero123plus
Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Language: Python, 1.7k stars, 118 forks
akaashdash/kansformers
Language:Jupyter Notebook11412
massgravel/Microsoft-Activation-Scripts
Open-source Windows and Office activator featuring HWID, Ohook, KMS38, and Online KMS activation methods, along with advanced troubleshooting.
Language: Batchfile, 97.2k stars, 9.5k forks
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python, 6.1k stars, 657 forks
litagin02/laughter-collector
A tool that collects the laughter segments from large amounts of audio data
Language:Python71
hayeong0/DDDM-VC
Official Pytorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
Language:Python17519
dcharatan/flowmap
Code for "FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent" by Cameron Smith*, David Charatan*, Ayush Tewari, and Vincent Sitzmann
Language:Python87284
X-LANCE/StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Language:HTML1324
FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Language: Python, 4k stars, 302 forks
huggingface/parler-tts
Inference and training library for high-quality TTS models.
Language: Python, 4.3k stars, 428 forks
karpathy/llm.c
LLM training in simple, raw C/CUDA
Language: Cuda, 23.6k stars, 2.6k forks
TMElyralab/MuseV
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Language: Python, 2.4k stars, 254 forks
annosubmission/GRC-Cache
Language:Python132
magic-research/piecewise-rectified-flow
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)
Language:Jupyter Notebook42026
RWKV/RWKV-infctx-trainer
RWKV infctx trainer, for training arbitrary context sizes, to 10k and beyond!
Language:Jupyter Notebook13228
kyegomez/Vit-RGTS
Open source implementation of "Vision Transformers Need Registers"
Language:Python12913
VINHYU/CoSeR
[CVPR 2024] CoSeR: Bridging Image and Language for Cognitive Super-Resolution
3209
LC044/WeChatMsg
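Extract WeChat chat history and export it to HTML, Word, and Excel documents for permanent storage; analyze the chat history to generate an annual chat report; use the chat data to train a personal AI chat assistant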
Language: Python, 33.5k stars, 3.5k forks
magic-research/magic-animate
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Language: Python, 10.4k stars, 1.1k forks
lucidrains/meshgpt-pytorch
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch
Language:Python71859
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Language: Python, 4.8k stars, 391 forks
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Language: Python, 1.4k stars, 105 forks
p0p4k/Matcha-TTS-2
E2E TTS using Conditional Flow Matching (Experimental*)
Language:Jupyter Notebook655
cgisky1980/AI00_Assistant
550W_AI_Assistant (550W Intelligent Assistant): Everyone should have their own AI.
22
harlanhong/ICCV2023-MCNET
The official code of our ICCV2023 work: Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation
Language:Python24521
elevenlabs/elevenlabs-python
The official Python API for ElevenLabs Text to Speech.
Language: Python, 2.1k stars, 240 forks