HIN0209's Stars
lllyasviel/Fooocus
Focus on prompting and generating
shap/shap
A game theoretic approach to explain the output of any machine learning model.
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
pytorch/audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
mpc001/Visual_Speech_Recognition_for_Multiple_Languages
Visual Speech Recognition for Multiple Languages
facebookresearch/SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
Srijith-rkr/Whispering-LLaMA
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
mpc001/auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
YUCHEN005/RobustGER
Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition"
cwx-worst-one/EAT
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
roudimit/whisper-flamingo
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
ms-dot-k/Lip-to-Speech-Synthesis-in-the-Wild
PyTorch implementation of "Lip to Speech Synthesis in the Wild with Multi-task Learning" (ICASSP2023)
ahaliassos/raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
mmorise/rohan4600
Mora-balanced Japanese corpus
PierreElias/IntroECG
Resource library for getting started with deep learning work using electrocardiograms
skit-ai/SpeechLLM
This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.
IIP-Sogang/olkavs-avspeech
The Introduction of the OLKAVS Dataset
gongouveia/Whisper-Synthetic-ASR-Dataset-Generator
This UI serves as a Synthetic ASR Dataset Generator powered by/for OpenAI Whisper, enabling users to capture audio, transcribe it on the fly, and manage the generated dataset 🤗. Fine-tune Whisper on enhanced and custom datasets.
StelaBou/voxceleb_preprocessing
Download and preprocess voxceleb datasets.
topel/audioset-convnext-inf
Adapting a ConvNeXt model to audio classification on AudioSet
Sreyan88/LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
MuSAELab/Multimodal-dataset-catalog
This repository lists publicly available datasets for visual-audio, speech and audio, and biomedical signal related tasks.
payalmohapatra/Multimodal-Speech-Disfluency
Multimodal Disfluency Detection (Interspeech 2024)
srinath-dittakavi/lip-reading
LRS3-based lip reading algorithm using Inception3D features & self-attention. Handles 4 segments of 16 frames for word prediction.
zulfiqarAlibalti/audio-visual-Transcription
Real-Time Audio-Visual Speech Recognition
sindhujagopu/LIP-READING-AI
LIP READING-AI is an AI system that interprets lip movements from video to text in real-time, enhancing communication for the hearing-impaired and deaf, and improving security and interaction in noisy or masked situations. Utilizing advanced deep learning models, it offers pre-trained solutions and customizable tools for various applications.