splinter21's Stars
facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
ToonCrafter/ToonCrafter
A research paper for generative cartoon interpolation
TMElyralab/MusePose
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
rsxdalv/tts-generation-webui
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Tele-AI/TeleSpeech-ASR
ZHO-ZHO-ZHO/ComfyUI-APISR
Unofficial implementation of APISR for ComfyUI
MC-E/ReVideo
longyuewangdcu/GuoFeng-Webnovel
Multilingual Corpus of Web Fiction
wazenmai/MIDI-BERT
This is the official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.
mira-space/MiraData
craffel/midi-dataset
Code for creating a dataset of MIDI ground truth
jishengpeng/ControlSpeech
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
upskyy/Transformer-Transducer
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)
jishengpeng/TextrolSpeech
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
bytedance/Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
Wataru-Nakata/miipher
Unofficial implementation of miipher
DDMAL/salami-data-public
zhang-tao-whu/DVIS_Plus
ml-research/rational_activations
Rational Activation Functions - Replacing Padé Activation Units
cloneofsimo/sdxl_inversions
v3ucn/live2d-TTS-LLM-GPT-SoVITS-Vtuber
A simple, low-cost live-streaming solution based on live2d, TTS (text-to-speech), and large language model chat
alibaba/diffusers-api
PantoMatrix/BEAT
A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis [ECCV 2022]
mt-upc/ZeroSwot
Pushing the Limits of Zero-shot End-to-End Speech Translation
litagin02/Style-Bert-VITS2-Editor
YasserdahouML/visper
ViSpeR: Multilingual Audio-Visual Speech Recognition
camenduru/DiffSketcher-colab
andreihar/taibun
Taiwanese Hokkien Transliterator and Tokeniser
mbrotos/SoundSeg
Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation