splinter21's Stars

facebookresearch/dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
Language:Jupyter Notebook8.1k 93 356683
ToonCrafter/ToonCrafter
a research paper for generative cartoon interpolation
Language:Python4k 41 28333
TMElyralab/MusePose
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
Language:Python1.5k 34 3893
rsxdalv/tts-generation-webui
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
Language:TypeScript1.4k 28 179155
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Language:Python493 4 230
Tele-AI/TeleSpeech-ASR
Language:Python34330
ZHO-ZHO-ZHO/ComfyUI-APISR
Unofficial implementation of APISR for ComfyUI
Language:Python310 5 919
MC-E/ReVideo
2195
longyuewangdcu/GuoFeng-Webnovel
Multilingual Corpus of Web Fiction
1965
wazenmai/MIDI-BERT
This is the official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.
Language:Python172 4 1121
mira-space/MiraData
Language:Python163 11 33
craffel/midi-dataset
Code for creating a dataset of MIDI ground truth
Language:Jupyter Notebook157 10 1526
jishengpeng/ControlSpeech
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Language:Python1063
upskyy/Transformer-Transducer
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)
Language:Python97 3 419
jishengpeng/TextrolSpeech
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP)
Language:Python94 5 04
bytedance/Make-An-Audio-2
a text-conditional diffusion probabilistic model capable of generating high fidelity audio.
Language:Python92 4 010
Wataru-Nakata/miipher
Unofficial implementation of miipher
Language:Python89 5 714
DDMAL/salami-data-public
Language:Ruby83 15 1518
zhang-tao-whu/DVIS_Plus
Language:Python74 2 144
ml-research/rational_activations
Rational Activation Functions - Replacing Padé Activation Units
Language:Cuda59 5 1013
cloneofsimo/sdxl_inversions
Language:Jupyter Notebook57 2 13
v3ucn/live2d-TTS-LLM-GPT-SoVITS-Vtuber
A simple, low-cost live-streaming solution based on Live2D, TTS (text-to-speech), and large-language-model chat
Language:HTML558
alibaba/diffusers-api
Language:Python33 2 04
PantoMatrix/BEAT
A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis [ECCV 2022]
Language:Python29 1 12
mt-upc/ZeroSwot
Pushing the Limits of Zero-shot End-to-End Speech Translation
Language:Python18 12 03
litagin02/Style-Bert-VITS2-Editor
Language:TypeScript16 4 81
YasserdahouML/visper
ViSpeR: Multilingual Audio-Visual Speech Recognition
Language:Python162
camenduru/DiffSketcher-colab
Language:Jupyter Notebook13 3 12
andreihar/taibun
Taiwanese Hokkien Transliterator and Tokeniser
Language:Python11 2 100
mbrotos/SoundSeg
Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation
Language:Jupyter Notebook90