YoshikiMas

YoshikiMas's Stars

kyutai-labs/moshi
Language: Python 7k 80 91550
facebookresearch/lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Language: Python 4.3k 27 40224
guanyingc/latex_paper_writing_tips
Tips for Writing a Research Paper using LaTeX
Language: TeX 3.3k 28 0376
NVIDIA/Cosmos-Tokenizer
A suite of image and video neural tokenizers
Language: Python 1k 17 026
geoffreybennett/alsa-scarlett-gui
alsa-scarlett-gui is a Gtk4 GUI for the ALSA controls presented by the Linux kernel Focusrite Scarlett2 Mixer Driver
Language: C 679 23 11038
SpeechifyInc/Meta-voicebox
Implementation of Meta-Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance.
571 85 431
marlin-codes/Awesome-Hyperbolic-Representation-and-Deep-Learning
Paper list about hyperbolic embedding, hyperbolic models, and hyperbolic applications
393 9 232
yukara-ikemiya/friendly-stable-audio-tools
Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
Language: Python 153 5 511
line/lighthouse
[EMNLP2024 Demo], [ICASSP 2025] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.
Language: Python 111 7 158
AudioLLMs/AudioBench
AudioBench: A Universal Benchmark for Audio Large Language Models
Language: Python 106 8 11
aikiriao/SRLA
Svr-fiR Lossless Audio codec
Language: C 52 6 13
yukara-ikemiya/wavefit-pytorch
PyTorch implementation of WaveFit [2022, Google], one of the SOTA lightweight/fast speech vocoders.
Language: Python 49 2 13
Alexander-H-Liu/dinosr
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Language: Python 47 4 34
unilight/sheet
Speech Human Evaluation Estimation Toolkit (SHEET)
Language: Python 45 4 16
cai525/Transformer4SED
This repository aims to collect Transformer-based sound event detection (SED) algorithms.
Language: Python 39 3 53
HaoFengyuan/X-TF-GridNet
The implementation of "X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion", accepted by Information Fusion.
Language: Python 395
AlanBaade/SyllableLM
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Language: Python 38 4 01
kyegomez/qformer
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
Language: Python 31 3 1
mubtasimahasan/DM-Codec
Source code for DM-Codec.
Language: Python 31 3 12
xefonon/RIRPINN
Room Impulse Response reconstruction with Physics Informed Neural Networks
Language: Jupyter Notebook 27 1 01
orchidas/StereoWidener
Plugin to do stereo widening with decorrelation
Language: Python 26 1 80
liangsusan-git/AV-NeRF
[NeurIPS 2023] AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Language: Python 23 4 40
polimi-ispl/nah-khcnn
Repository of "A Physics-Informed Neural Network Approach for Nearfield Acoustic Holography"
Language: Python 16 7 23
sh01k/imp_tsp
Measuring impulse response with time-stretched pulse (TSP) signal
Language: Python 14 3 04
SebastianJiroSchlecht/OptimizedVelvetDecorrelators
Matlab Code for Schlecht, S., Alary, B., Välimäki, V., Habets, E. (2018). Optimized velvet-noise decorrelator. Proc. Int. Conf. Digital Audio Effects (DAFx)
Language: MATLAB 13 4 22
merlresearch/avlen
Code used in our NeurIPS 2022 paper 'AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments'
Language: Python 7 1 01
UDASE-CHiME2023/reverberant-LibriCHiME-5
Scripts to generate the reverberant LibriCHiME-5 dataset.
Language: Python 5 1 00
h-munakata/Lighthouse-Wrapper-for-Audio-Moment-Retrieval
Language: Python 3 2 00
sinhat98/nishika-competition
Reproduction code for the Nishika competition
Language: Python 3 1 00
tky823/Audyn
Language: Python 2 1 570