JinhuaLiang
A Ph.D. student at the Centre for Digital Music (C4DM), Queen Mary University of London.
London
JinhuaLiang's Stars
Labbeti/conette-audio-captioning
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Aria-K-Alethia/BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
kyutai-labs/moshi
anusfoil/LLaQo
LLaQo, a Large Language Query-based Coach in the domain of expressive performance
npurson/fid-metrics
A toolkit for computing Fréchet Inception Distance (FID) & Fréchet Video Distance (FVD) metrics.
haoheliu/fid-metrics
A toolkit for computing Fréchet Inception Distance (FID) & Fréchet Video Distance (FVD) metrics.
hadeel253/Music-Generation-with-WaveGAN
WaveGAN on GTZAN Music genre classification dataset
JinhuaLiang/bigvgan
Official PyTorch implementation of BigVGAN (ICLR 2023)
FoundationVision/OmniTokenizer
OmniTokenizer: one model and one weight for image-video joint tokenization.
researchmm/MM-Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
google-research/magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
phizaz/diffae
Official implementation of Diffusion Autoencoders
ashleve/lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
ccfddl/ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~
declare-lab/tango
A family of diffusion models for text-to-audio generation.
habla-liaa/encodecmae
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
ali-vilab/MimicBrush
Official implementations for paper: Zero-shot Image Editing with Reference Imitation
IDEA-Research/DINO
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
gle-bellier/flow-matching
Annotated Flow Matching paper
AILab-CVC/CV-VAE
[NeurIPS 24] CV-VAE: A Compatible Video VAE for Latent Generative Video Models
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
anusfoil/PianoJudges
From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano
facebookresearch/audioseal
Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
bytedance/1d-tokenizer
This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation
python-poetry/poetry
Python packaging and dependency management made easy
r9y9/tacotron_pytorch
PyTorch implementation of Tacotron speech synthesis model.
lucidrains/rotary-embedding-torch
Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch