Pinned Repositories
av2av
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
AVSR
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023) and "Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition" (Interspeech 2022)
Image-to-Speech
PyTorch implementation of "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens"
Lip-to-Speech-Synthesis-in-the-Wild
PyTorch implementation of "Lip to Speech Synthesis in the Wild with Multi-task Learning" (ICASSP 2023)
LRW_ID
Speaker labels for the LRW dataset, produced as part of the paper "Speaker-adaptive Lip Reading with User-dependent Padding" (ECCV 2022)
Multi-head-Visual-Audio-Memory
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI 2022)
TMT
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
User-dependent-Padding
PyTorch implementation of "Speaker-adaptive Lip Reading with User-dependent Padding" (ECCV 2022)
Visual-Audio-Memory
PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV 2021)
Visual-Context-Attentional-GAN
PyTorch implementation of "Lip to Speech Synthesis with Visual Context Attentional GAN" (NeurIPS 2021)
ms-dot-k's Repositories
ms-dot-k/espnet
End-to-End Speech Processing Toolkit
ms-dot-k/Image-to-Speech-Captioning
Demo page for the paper "Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens"
ms-dot-k/lip2speech-unit
Intelligible Lip-to-Speech Synthesis with Speech Units (Interspeech 2023)
ms-dot-k/utut
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation