Pinned Repositories
A-Survey-on-Generative-Diffusion-Model
AcademiCodec-audio-codec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
acoular
Library for acoustic beamforming
AISHELL-4
Alibaba-MIT-Speech
Alibaba speech technology
alibaba_damo_FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.
AliceMind
amt-apc
Music: automatic piano cover: AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model
asap-dataset-music
A dataset of 222 digital musical scores aligned with 1068 performances (more than 92 hours) of Western classical piano music.
kingfener's Repositories
kingfener/amt-apc
Music: automatic piano cover: AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model
kingfener/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
kingfener/audioseal
Audio watermark detection (deepfake detection): Localized watermarking for AI-generated speech audio, with SOTA robustness and a very fast detector
kingfener/build-nanogpt
Video+code lecture on building nanoGPT from scratch
kingfener/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
kingfener/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
kingfener/GigaSpeech2
Data: An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
kingfener/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model; the TTS quality is quite good
kingfener/godot
Godot Engine – Multi-platform 2D and 3D game engine
kingfener/GOT-OCR2.0
2024, a handy OCR tool: Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
kingfener/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model, available for commercial use, with performance approaching GPT-4o
kingfener/Mamba-YOLO
The official PyTorch implementation of “Mamba-YOLO: SSMs-based for Object Detection”
kingfener/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
kingfener/midi-fluidsynth
MIDI playback: Software synthesizer based on the SoundFont 2 specifications
kingfener/mini-omni
Fully end-to-end spoken dialogue large model: Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation. Technical report: https://arxiv.org/abs/2408.16725
kingfener/ml-depth-pro
Apple depth map estimation: Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
kingfener/moshi_chat_LLM
Streaming, real-time conversational LLM
kingfener/MS-Diffusion
Image generation with controllable subjects and positions: Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
kingfener/penn
Pitch estimation: Pitch Estimating Neural Networks (PENN)
kingfener/polyphone-CVTE-Poly
CVTE-Poly: polyphonic character dataset for Chinese polyphone disambiguation in Text-to-Speech applications
kingfener/pykan
Kolmogorov Arnold Networks : KAN 2.0 with multi
kingfener/qa-mdt
Text-to-music generation: 241010-SOTA Text-to-music (TTM) Generation (OpenMusic)
kingfener/RoMoAligner
MFA-Alignment model for text-to-speech
kingfener/seed-tts-eval
Subjective evaluation tool for TTS synthesis quality
kingfener/seed-vc
Voice conversion: zero-shot voice conversion with in-context learning
kingfener/SpeechLLM
This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.
kingfener/TimeMixer
Time series forecasting: [ICLR 2024] Official implementation of "TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting"
kingfener/timesfm
Time series forecasting model: TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
kingfener/Video-MME
Multimodal video benchmark for LLMs: ✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
kingfener/wesep
Target Speaker Extraction Toolkit