Pinned Repositories
av2av
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
AVSR
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023) and "Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition" (Interspeech 2022)
Image-to-Speech
PyTorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens"
Lip-to-Speech-Synthesis-in-the-Wild
PyTorch implementation of "Lip to Speech Synthesis in the Wild with Multi-task Learning" (ICASSP 2023)
LRW_ID
Speaker labels for the LRW dataset, produced as part of the paper "Speaker-adaptive Lip Reading with User-dependent Padding" (ECCV 2022)
Multi-head-Visual-Audio-Memory
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI 2022)
TMT
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
User-dependent-Padding
PyTorch implementation of "Speaker-adaptive Lip Reading with User-dependent Padding" (ECCV 2022)
Visual-Audio-Memory
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV 2021)
Visual-Context-Attentional-GAN
PyTorch implementation of "Lip to Speech Synthesis with Visual Context Attentional GAN" (NeurIPS 2021)
ms-dot-k's Repositories
ms-dot-k/espnet
End-to-End Speech Processing Toolkit
ms-dot-k/Image-to-Speech-Captioning
Demo page for the paper "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens"
ms-dot-k/lip2speech-unit
Intelligible Lip-to-Speech Synthesis with Speech Units (Interspeech 2023)
ms-dot-k/utut
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation