YoshikiMas's Stars
kyutai-labs/moshi
facebookresearch/lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
guanyingc/latex_paper_writing_tips
Tips for Writing a Research Paper using LaTeX
NVIDIA/Cosmos-Tokenizer
A suite of image and video neural tokenizers
geoffreybennett/alsa-scarlett-gui
alsa-scarlett-gui is a Gtk4 GUI for the ALSA controls presented by the Linux kernel Focusrite Scarlett2 Mixer Driver
SpeechifyInc/Meta-voicebox
Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.
marlin-codes/Awesome-Hyperbolic-Representation-and-Deep-Learning
Paper list about hyperbolic embedding, hyperbolic models, and hyperbolic applications
yukara-ikemiya/friendly-stable-audio-tools
Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
line/lighthouse
[EMNLP 2024 Demo], [ICASSP 2025] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.
AudioLLMs/AudioBench
AudioBench: A Universal Benchmark for Audio Large Language Models
aikiriao/SRLA
Svr-fiR Lossless Audio codec
yukara-ikemiya/wavefit-pytorch
PyTorch implementation of WaveFit [2022, Google], one of the SOTA lightweight/fast speech vocoders.
Alexander-H-Liu/dinosr
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
unilight/sheet
Speech Human Evaluation Estimation Toolkit (SHEET)
cai525/Transformer4SED
This repository aims to collect Transformer-based sound event detection (SED) algorithms.
HaoFengyuan/X-TF-GridNet
The implementation of "X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion", accepted by Information Fusion.
AlanBaade/SyllableLM
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
kyegomez/qformer
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
mubtasimahasan/DM-Codec
Source code for DM-Codec.
xefonon/RIRPINN
Room Impulse Response reconstruction with Physics Informed Neural Networks
orchidas/StereoWidener
Plugin to do stereo widening with decorrelation
liangsusan-git/AV-NeRF
[NeurIPS 2023] AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
polimi-ispl/nah-khcnn
Repository of "A Physics-Informed Neural Network Approach for Nearfield Acoustic Holography"
sh01k/imp_tsp
Measuring impulse response with time-stretched pulse (TSP) signal
SebastianJiroSchlecht/OptimizedVelvetDecorrelators
Matlab code for Schlecht, S., Alary, B., Välimäki, V., Habets, E. (2018). Optimized velvet-noise decorrelator. Proc. Int. Conf. Digital Audio Effects (DAFx)
merlresearch/avlen
Code used in our NeurIPS 2022 paper 'AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments'
UDASE-CHiME2023/reverberant-LibriCHiME-5
Scripts to generate the reverberant LibriCHiME-5 dataset.
h-munakata/Lighthouse-Wrapper-for-Audio-Moment-Retrieval
sinhat98/nishika-competition
Reproduction code for the Nishika competition
tky823/Audyn