SpeechOceanTech

SpeechOceanTech's Stars

mayooear/gpt4-pdf-chatbot-langchain
GPT4 & LangChain Chatbot for large PDF docs
Language: TypeScript 15k 151 2983k
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Language: Jupyter Notebook 13.6k 79 4201.3k
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
Language: Python 10.8k 159 65841
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python 7.7k 71 1.3k806
mindoc-org/mindoc
An online API documentation management system implemented in Golang on the beego framework
Language: Go 7.4k 273 8351.9k
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Language: Python 7k 128 4511k
mli/autocut
Cut videos with a text editor
Language: Python 6.9k 63 84706
llSourcell/Doctor-Dignity
Doctor Dignity is an LLM that can pass the US Medical Licensing Exam. It works offline, it's cross-platform, & your health data stays private.
Language: Python 3.9k 55 27408
baaivision/Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
Language: Python 2.5k 37 71176
coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
1.3k 57 199141
haoheliu/voicefixer
General Speech Restoration
Language: Python 1.1k 18 59132
facebookresearch/av_hubert
A self-supervised learning framework for audio-visual speech
Language: Python 865 15 111138
yl4579/StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Language: Python 489 22 97109
mpc001/Lipreading_using_Temporal_Convolutional_Networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
Language: Python 401 8 65102
mpc001/Visual_Speech_Recognition_for_Multiple_Languages
Visual Speech Recognition for Multiple Languages
Language: Python 367 13 3157
sdadas/polish-nlp-resources
Pre-trained models and language resources for Natural Language Processing in Polish
330 41 629
ksopyla/awesome-nlp-polish
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.
294 28 134
VIPL-Audio-Visual-Speech-Understanding/LipNet-PyTorch
The state-of-the-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxiv.org/abs/1611.01599)
Language: Python 212 5 3652
mpc001/end-to-end-lipreading
PyTorch code for End-to-End Audiovisual Speech Recognition
Language: Python 174 2 3250
tencent-ailab/3m-asr
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
Language: Python 118 6 516
bwang514/PerformanceNet
PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network
Language: Python 109 4 412
tstafylakis/Lipreading-ResNet
Torch code for using Residual Networks with LSTMs for Lipreading
Language: Lua 99 4 213
cvqluu/nn-similarity-diarization
Neural network based similarity scoring for diarization (PyTorch implementation of "LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization")
Language: Python 44 2 612
Open-Speech-EkStep/ULCA-asr-dataset-corpus
40 7 716
Strange-AI/datasets
Collections of many datasets you may need and play with.
Language: Shell 32 1 16
srinivr/kaldi-long-audio-alignment
Long audio alignment using Kaldi
Language: Shell 24 4 110
tomaarsen/TTSTextNormalization
Convert English text from written expressions into spoken forms
Language: Python 21 5 13
VIPL-Audio-Visual-Speech-Understanding/deep-face-speechreading
Visual speech recognition with face inputs: code and models for F&G 2020 paper "Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition"
Language: Python 17 2 15
Oguzhanercan/Vision-Transformers
Implementations of various Vision Transformer Models and Training Strategies
Language: Jupyter Notebook 3 1 00
around-star/Speech-Recognition
Speech Recognition using Recurrent Neural Network Transducer
Language: Jupyter Notebook 2 1 00