Maoshuiyang

Multimodal & generative AI & speech & affective computing

The Chinese University of Hong Kong
Hong Kong

Pinned Repositories

AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Language: Python
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python
AnimateDiff
Official implementation of AnimateDiff.
Language: Python
annotated_deep_learning_paper_implementations
🧑‍🏫 60 implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Language: Jupyter Notebook
audio-caption-eval.github.io
Audio captioner evaluation for the video-to-audio task
Language: HTML
audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Language: Python
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Python
AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Language: Python
AudioLDM2
Text-to-Audio/Music Generation
Language: Python
kaldi_emo
Hidden Markov model (HMM)-based speech emotion recognition (SER) using Kaldi.
Language: Shell

Maoshuiyang's Repositories

Maoshuiyang/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python
Maoshuiyang/audio-caption-eval.github.io
Audio captioner evaluation for the video-to-audio task
Language: HTML
Maoshuiyang/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Language: Python
Maoshuiyang/Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Maoshuiyang/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Maoshuiyang/FCPE
Maoshuiyang/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Maoshuiyang/flux
Official inference repo for FLUX.1 models
Maoshuiyang/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal conversational model with performance approaching GPT-4o.
Maoshuiyang/jukebox
Code for the paper "Jukebox: A Generative Model for Music"
Maoshuiyang/Llama-Chinese
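Llama Chinese Community: Llama 3 online demos and fine-tuned models are now available, and the latest Llama 3 learning resources are aggregated in real time. All code has been updated for Llama 3, with the goal of building the best Chinese Llama large model. Fully open source and licensed for commercial use.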
Maoshuiyang/LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Maoshuiyang/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
Maoshuiyang/LLaVA-UHD
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Maoshuiyang/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
Language: Python
Maoshuiyang/Maoshuiyang.github.io
My Blog / Jekyll Themes / PWA
Language: HTML
Maoshuiyang/md.github.io
Personal website [Material for MkDocs]
Maoshuiyang/mini-omni
An open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation.
Maoshuiyang/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Maoshuiyang/MPP-LLaVA
Personal project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B MLLM in the style of LLaVA training on an RTX 3090/4090 (24 GB).
Maoshuiyang/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Maoshuiyang/sacrebleu
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Maoshuiyang/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Maoshuiyang/seed-tts-eval
Maoshuiyang/stable-audio-metrics
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
Maoshuiyang/unsloth
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
Maoshuiyang/visqol
Perceptual Quality Estimator for speech and audio
Maoshuiyang/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Maoshuiyang/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Maoshuiyang/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)