SpeechOceanTech's Stars
mayooear/gpt4-pdf-chatbot-langchain
GPT4 & LangChain Chatbot for large PDF docs
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
mindoc-org/mindoc
An online API documentation management system implemented in Golang on the beego framework
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
mli/autocut
Cut videos with a text editor
llSourcell/Doctor-Dignity
Doctor Dignity is an LLM that can pass the US Medical Licensing Exam. It works offline, it's cross-platform, & your health data stays private.
baaivision/Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
haoheliu/voicefixer
General Speech Restoration
facebookresearch/av_hubert
A self-supervised learning framework for audio-visual speech
yl4579/StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
mpc001/Lipreading_using_Temporal_Convolutional_Networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
mpc001/Visual_Speech_Recognition_for_Multiple_Languages
Visual Speech Recognition for Multiple Languages
sdadas/polish-nlp-resources
Pre-trained models and language resources for Natural Language Processing in Polish
ksopyla/awesome-nlp-polish
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.
VIPL-Audio-Visual-Speech-Understanding/LipNet-PyTorch
The state-of-the-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxiv.org/abs/1611.01599)
mpc001/end-to-end-lipreading
PyTorch code for End-to-End Audiovisual Speech Recognition
tencent-ailab/3m-asr
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
bwang514/PerformanceNet
PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network
tstafylakis/Lipreading-ResNet
Torch code for using Residual Networks with LSTMs for Lipreading
cvqluu/nn-similarity-diarization
Neural-network-based similarity scoring for diarization (PyTorch implementation of "LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization")
Open-Speech-EkStep/ULCA-asr-dataset-corpus
Strange-AI/datasets
A collection of datasets you may need or want to experiment with.
srinivr/kaldi-long-audio-alignment
Long audio alignment using Kaldi
tomaarsen/TTSTextNormalization
Convert English text from written expressions into spoken forms
VIPL-Audio-Visual-Speech-Understanding/deep-face-speechreading
Visual speech recognition with face inputs: code and models for F&G 2020 paper "Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition"
Oguzhanercan/Vision-Transformers
Implementations of various Vision Transformer Models and Training Strategies
around-star/Speech-Recognition
Speech Recognition using Recurrent Neural Network Transducer