Pinned Repositories
video-parser
Douyin Kuaishou Tiktok live room protocal
CRESS
Code for ACL 2023 main conference paper "Understanding and Bridging the Modality Gap for Speech Translation".
FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
NPDA-KNN-ST
Official implementation of EMNLP'2022 paper "Non-Parametric Domain Adaptation for End-to-End Speech Translation"
Douyin_TikTok_Download_API
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
FBK-fairseq
Repository containing the open source code of works published at the FBK MT unit.
CRESS
Code for ACL 2023 main conference paper "Understanding and Bridging the Modality Gap for Speech Translation".
LLaSM
第一个支持中英文双语语音-文本多模态对话的开源可商用对话模型。便捷的语音输入将大幅改善以文本为输入的大模型的使用体验,同时避免了基于 ASR 解决方案的繁琐流程以及可能引入的错误。
whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
WACO
Word-Aligned Contrastive Learning for Speech Translation
Crabbit-F's Repositories
Crabbit-F/CRESS
Code for ACL 2023 main conference paper "Understanding and Bridging the Modality Gap for Speech Translation".
Crabbit-F/FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"