steventan0110
PhD Student @ JHU, Research Scientist Intern @ Meta AI | Prev @ Meta AI, Amazon Alexa AI
Meta, JHU
Baltimore, Maryland
steventan0110's Stars
junyanz/pytorch-CycleGAN-and-pix2pix
Image-to-Image Translation in PyTorch
state-spaces/mamba
Mamba SSM architecture
Rudrabha/Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try Sync Labs.
phillipi/pix2pix
Image-to-image translation with conditional adversarial nets
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in a single PyTorch file.
haoheliu/AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
lucidrains/naturalspeech2-pytorch
Implementation of NaturalSpeech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
microsoft/VQ-Diffusion
Official implementation of VQ-Diffusion
acl-org/acl-style-files
Official style files for papers submitted to venues of the Association for Computational Linguistics
google/visqol
Perceptual Quality Estimator for speech and audio
zhangshaolei1998/Awesome-Simultaneous-Translation
Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.
microsoft/MS-SNSD
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and speech-to-noise ratio (SNR) levels desired.
louaaron/Score-Entropy-Discrete-Diffusion
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
yangdongchao/Text-to-sound-Synthesis
The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
google-research-datasets/cvss
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
Rongjiehuang/TranSpeech
PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation
albertfgu/diffwave-sashimi
Implementation of DiffWave and SaShiMi audio generation models
facebookresearch/SimulEval
SimulEval: A General Evaluation Toolkit for Simultaneous Translation
MingLunHan/CIF-PyTorch
[ICASSP 2020] CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition (A PyTorch implementation of Continuous Integrate-and-Fire mechanism).
OpenNLPLab/HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN from our NeurIPS 2023 paper: Hierarchically Gated Recurrent Neural Network for Sequence Modeling
tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
ictnlp/DASpeech
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
roger-tseng/av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
ictnlp/DiSeg
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"
George0828Zhang/torch_cif
A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.
dqqcasia/mosst
George0828Zhang/simulst
PyTorch toolkit for streaming speech recognition, speech translation and simultaneous translation based on fairseq.