Pinned Repositories
bilibiliUpload
Upload videos to bilibili
chinese-hubert-soft
DupImageDetection
Large-scale image deduplication algorithm: a local block hash algorithm
Framer
Official PyTorch implementation of "Framer: Interactive Frame Interpolation".
hifigan-yingram-vc
vc
inferStreamHiFiGAN
StreamHiFiGAN offers a HiFiGAN vocoder model optimized for streaming inference, providing real-time audio synthesis capabilities.
LinearityIQA
Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment, Accepted by ACM MM 2020
natsume
A Japanese text frontend processing toolkit
RAFT-Softsplat-VFI
Video Frame Interpolation (RAFT + Softsplat)
splinter21's Repositories
splinter21/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
splinter21/amt-apc
AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model
splinter21/Apollo
Music repair method to convert lossy MP3 compressed music to lossless music.
splinter21/Bert-vits2-NoBug
A simple rewrite of Bert-vits2, which is an effective TTS framework. Killed all the possible bugs.
splinter21/ChineseTaiwaneseWhisper
This repository focuses on leveraging OpenAI's Whisper model for speech recognition in Chinese (Mandarin) and Taiwanese Hokkien languages. It includes tools and scripts for data preprocessing, model training, and evaluation, tailored to improve speech recognition accuracy for these languages.
splinter21/DIE-engine
DIE engine
splinter21/F5-TTS
Official code for "A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
splinter21/genmoai-models
The best OSS video generation models
splinter21/ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
splinter21/LLaMA-Omni
Low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct.
splinter21/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
splinter21/minimind
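[Large model] Train a small 26M-parameter GPT entirely from scratch in 3 hours; a GPU with as little as 2 GB of memory is enough for inference and training!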
splinter21/moshi
splinter21/MSST-WebUI
A WebUI for Music Source Separation Training inference, with UVR packaged in as well!
splinter21/Neural-Vocoders-as-Speech-Enhancers
splinter21/OmniSenseVoice
SenseVoice Recognition and Forced-Alignment
splinter21/PDMX
PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
splinter21/pedalboard
🎛 🔊 A Python library for audio.
splinter21/PM-EVC
This is the official implementation of "A Controllable Emotion Voice Conversion Framework with Pre-trained Speech Representations"
splinter21/qa-mdt
splinter21/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
splinter21/scoreq
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
splinter21/SourceFilterNeuralFormants
splinter21/storm
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
splinter21/StreamFlow
splinter21/StyleTTS-ZS_Scripts_AT
StyleTTS-ZS: Acoustic_Synth_training
splinter21/vec2wav2.0
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
splinter21/vs_temporalfix
Vapoursynth function to add Temporal Coherence to AI Upscales
splinter21/WaveFM
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
splinter21/Westlake-Omni