LittleFlyingSheep's Stars
Phuriches/GenRepASD
PyTorch implementation of Deep Generic Representations for Domain-Generalized Anomalous Sound Detection: https://arxiv.org/abs/2409.05035
Kota-Dohi/dcase2022_evaluator
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
ZhuShaoQiang/PapersInTime
Records papers by chronological order and citation relationships.
Audio-AGI/dcase2024_task9_baseline
Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"
adapter-hub/adapters
A Unified Library for Parameter-Efficient and Modular Transfer Learning
muzairkhattak/multimodal-prompt-learning
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
haoheliu/SemantiCodec-inference
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
SarthakYadav/audio-mamba-official
Official implementation for our paper "Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations"
haidog-yaqub/DPMTSE
A Diffusion Probabilistic Model for Target Sound Extraction
frankenliu/LOAE
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
jaeyeonkim99/EnCLAP
Official Implementation of EnCLAP (ICASSP 2024)
Labbeti/dcase2024-task6-baseline
DCASE2024 Challenge Task 6 baseline system (Automated Audio Captioning)
Labbeti/conette-audio-captioning
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
boschresearch/acoustic-traffic-simulation-counting
Baseline code for DCASE 2024 Task 10 and the ICASSP 2024 paper
state-spaces/mamba
Mamba SSM architecture
lisiyao21/Bailando
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
OptimusPrimus/dcase2023_task6b
CP-JKU's Task 6b submission to DCASE 2023
karolpiczak/ESC-50
ESC-50: Dataset for Environmental Sound Classification
meta-llama/llama
Inference code for Llama models
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
microsoft/CLAP
Learning audio concepts from natural language supervision
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
haoheliu/AudioLDM2
Text-to-Audio/Music Generation
LittleFlyingSheep/P-LocalAFT
This project corresponds to the paper "Local Information Assisted Attention-free Decoder for Audio Captioning", published in IEEE Signal Processing Letters.
thuhcsi/LightGrad
LAION-AI/audio-dataset
Audio Dataset for training CLAP and other models
XinhaoMei/WavCaps
This repository contains metadata for the WavCaps dataset and code for downstream tasks.