Pinned Repositories
ai-audio-datasets-list
A list of speech, music, and sound-effect datasets that can provide training data for generative AI (AIGC), AI model training, intelligent audio tool development, and audio applications. It is mainly used for speech recognition, speech synthesis, singing voice synthesis, music information retrieval, and music generation.
bigcode-dataset
llm-paper-daily
Daily updated LLM papers. Subscriptions welcome 👏 If you like it, please give it a 🌟
MidiTok
MIDI / symbolic music tokenizers for Deep Learning models 🎶
nmt_data_tools
Machine translation data processing tools
pycorrector
pycorrector is a toolkit for text error correction. It implements Kenlm, Seq2Seq_Attention, BERT, MacBERT, ELECTRA, ERNIE, Transformer, and other models, ready to use out of the box.
speech_process
Basic tutorial on speech processing
TikTokDownloader
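Completely free and open source, built on the Requests module: a data collection tool for TikTok profiles/videos/photo albums/original sounds, and for Douyin profiles/videos/photo albums/favorites/livestreams/original sounds/collections/comments/accounts/search/trending lists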
voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Wav2Lip-GFPGAN
WhiteFu's Repositories
WhiteFu/ai-audio-startups
Community list of startups working with AI in audio and music technology
WhiteFu/audio-pipeline
WhiteFu/AudioEditingCode
WhiteFu/awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation
WhiteFu/Awesome-LLMs-Datasets
A summary of existing representative LLM text datasets.
WhiteFu/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
WhiteFu/Bunny
A family of lightweight multimodal models.
WhiteFu/codec-bpe
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
WhiteFu/ConsistI2V
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
WhiteFu/diarizers
WhiteFu/EVA
EVA Series: Visual Representation Fantasies from BAAI
WhiteFu/FRESCO
[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
WhiteFu/i-Code
WhiteFu/lina-speech
lina-speech: linear-attention-based text-to-speech
WhiteFu/llava-phi
WhiteFu/llm-datasets
High-quality datasets, tools, and concepts for LLM fine-tuning.
WhiteFu/M2UGen
This is the official repository for M2UGen
WhiteFu/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
WhiteFu/metavoice-src
Foundational model for human-like, expressive TTS
WhiteFu/MoneyPrinterTurbo
Generate high-definition short videos with one click using large AI models.
WhiteFu/Open-Sora
Building your own video generation model like OpenAI's Sora
WhiteFu/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
WhiteFu/overseas-website-note
"Overseas tool websites" have become the main work of my life. I am glad it is not too late, and grateful for this great AI era.
WhiteFu/pyannote-whisper
WhiteFu/pytorch-speech-features
WhiteFu/pyvideotrans
Translate a video from one language to another and add dubbing.
WhiteFu/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
WhiteFu/SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
WhiteFu/tts-qa
WhiteFu/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild