Fengdalu's Stars
ggerganov/whisper.cpp
Port of OpenAI's Whisper model in C/C++
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
jantic/DeOldify
A Deep Learning based project for colorizing and restoring old images (and video!)
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
ExistentialAudio/BlackHole
BlackHole is a modern macOS audio loopback driver that allows applications to pass audio to other applications with zero additional latency.
graphdeco-inria/gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
HumanAIGC/EMO
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
dreamgaussian/dreamgaussian
[ICLR 2024 Oral] Generative Gaussian Splatting for Efficient 3D Content Creation
whitead/paper-qa
LLM Chain for answering questions from documents with citations
mseitzer/pytorch-fid
Compute FID scores with PyTorch.
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
DanielSWolf/rhubarb-lip-sync
Rhubarb Lip Sync is a command-line tool that automatically creates 2D mouth animation from voice recordings. You can use it for characters in computer games, in animated cartoons, or in any other project that requires animating mouths based on existing recordings.
thu-ml/unidiffuser
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
baofff/U-ViT
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
suragnair/seqGAN
A simplified PyTorch implementation of "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." (Yu, Lantao, et al.)
dvlab-research/LLaMA-VID
Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
facebookresearch/fairseq2
FAIR Sequence Modeling Toolkit 2
xwjdsh/2048-ai
A simple AI for the 2048 game.
yangshun/2048-python
🐍 2048
lingjzhu/CharsiuG2P
Multilingual G2P in 100 languages
lingjzhu/charsiu
Charsiu: A neural phonetic aligner.
kazgu/zotero-chatgpt
ChatGPT plugin for Zotero
mpc001/auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
Vontigo/Vontigo
🛸 Vontigo is an open-source CMS built with SvelteKit, featuring 🤖 AI-powered (ChatGPT) content generation. With fast page loads and seamless routing, Vontigo offers a user-friendly interface with customizable themes and templates.
jasonppy/PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
ms-dot-k/Visual-Audio-Memory
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)
Janie1996/AV4SER
PyTorch implementation for Audio-Visual Domain Adaptation Feature Fusion for Speech Emotion Recognition
srv-sh/Visual-Audio-Memory
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV2021)