Zth9730's Stars
jasonppy/PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
DmitryRyumin/INTERSPEECH-2023-24-Papers
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
facebookresearch/fairseq2
FAIR Sequence Modeling Toolkit 2
google-research/perch
ivy-llc/ivy
Convert Machine Learning Code Between Frameworks
microsoft/CLAP
Learning audio concepts from natural language supervision
bojone/rerope
Rectified Rotary Position Embeddings
YuanGongND/whisper-at
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
MontaEllis/Pytorch-Medical-Segmentation
This repository is an unofficial PyTorch implementation of Medical segmentation in 2D and 3D.
MLNLP-World/MyArxiv
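A personalized, customizable arXiv template for effectively following relevant content, authors, and academic conferences in specific fields.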
deskflow/deskflow
Deskflow lets you share one mouse and keyboard between multiple computers on Windows, macOS and Linux. It's like a software KVM (but without video).
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
microsoft/torchscale
Foundation Architecture for (M)LLMs
Jamie-Stirling/RetNet
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
InternLM/InternLM
Official release of InternLM2.5 base and chat models. 1M context support
OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
alinlab/ifseg
IFSeg: Image-free Semantic Segmentation via Vision-Language Model (CVPR 2023)
Long-Kai/ADV_CE
Source code for paper "Improving Task-Specific Generalization in Few-Shot Learning via Adaptive Vicinal Risk Minimization"
bigscience-workshop/bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
yeyupiaoling/Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
google-research/tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
fabawi/ImageBind-LoRA
Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA
chenkui164/FastASR
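A project that implements ASR inference in C++. It has few dependencies, is simple to install, and runs inference quickly; it also runs smoothly on ARM platforms such as the Raspberry Pi 4B. The supported models are optimized from Google's Transformer model, and the training data is the open-source wenetspeech dataset (10,000+ hours) or Alibaba's private dataset (60,000+ hours), so recognition quality is good and comparable to many commercial ASR products.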
Mddct/WeUSM
milely/SRN.Pytorch
Unofficial implementation of Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
jiaaro/pydub
Manipulate audio with a simple and easy high level interface
deezer/spleeter
Deezer source separation library including pretrained models.