steventan0110

PhD Student @ JHU, Research Scientist Intern @ Meta AI | Prev @ Meta AI, Amazon Alexa AI

Meta, JHU | Baltimore, Maryland

steventan0110's Stars

junyanz/pytorch-CycleGAN-and-pix2pix
Image-to-Image Translation in PyTorch
Language: Python 23.2k 348 1.5k6.3k
state-spaces/mamba
Mamba SSM architecture
Language: Python 13.4k 98 5531.1k
Rudrabha/Wav2Lip
This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs
Language: Python 10.9k 170 6682.3k
phillipi/pix2pix
Image-to-image translation with conditional adversarial nets
Language: Lua 10.2k 323 2091.7k
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Language: Python 6.4k 44 81578
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
6.1k 180 16852
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Language: Python 3.5k 57 71305
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
Language: Python 2.6k 24 28192
haoheliu/AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Language: Python 2.5k 42 107224
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
Language: Python 1.3k 53 31101
microsoft/VQ-Diffusion
Official implementation of VQ-Diffusion
Language: Python 903 10 4163
acl-org/acl-style-files
Official style files for papers submitted to venues of the Association for Computational Linguistics
Language: TeX 757 8 26184
google/visqol
Perceptual Quality Estimator for speech and audio
Language: C++ 707 27 72127
zhangshaolei1998/Awesome-Simultaneous-Translation
Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.
569 26 17
microsoft/MS-SNSD
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
Language: HTML 490 21 15148
louaaron/Score-Entropy-Discrete-Diffusion
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
Language: Python 415 7 1239
yangdongchao/Text-to-sound-Synthesis
The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
Language: Python 349 17 2733
google-research-datasets/cvss
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
183 13 214
Rongjiehuang/TranSpeech
PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation
Language: Python 173 16 723
albertfgu/diffwave-sashimi
Implementation of DiffWave and SaShiMi audio generation models
Language: Python 118 5 1114
facebookresearch/SimulEval
SimulEval: A General Evaluation Toolkit for Simultaneous Translation
Language: Python 102 16 2636
MingLunHan/CIF-PyTorch
[ICASSP 2020] CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition (A PyTorch implementation of Continuous Integrate-and-Fire mechanism).
Language: Python 67 5 36
OpenNLPLab/HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Language: Python 61 2 24
tanyuqian/redco
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
Language: Python 61 2 17
ictnlp/DASpeech
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
Language: Python 60 4 65
roger-tseng/av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
Language: Python 49 3 34
ictnlp/DiSeg
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"
Language: Python 33 3 22
George0828Zhang/torch_cif
A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.
Language: Python 32 3 13
dqqcasia/mosst
Language: Python 28 4 33
George0828Zhang/simulst
PyTorch toolkit for streaming speech recognition, speech translation and simultaneous translation based on fairseq.
Language: Python 22 3 13
