jachymuv's Stars
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
dleemiller/WordLlama
Things you can do with the token embeddings of an LLM
microsoft/factored-segmenter
Unsupervised factor-based text tokenizer for natural-language processing applications
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
rsennrich/Bleualign
Machine-Translation-based sentence alignment tool for parallel text
google/diff-match-patch
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
dmlc/decord
An efficient video loader for deep learning with smart shuffling that's super easy to digest
LinkedInLearning/transformers-text-classification-for-nlp-using-bert-2478096
This repo is for the LinkedIn Learning course: Transformers: Text Classification for NLP using BERT
explosion/prodigy-recipes
🍳 Recipes for Prodigy, our fully scriptable annotation tool
dair-ai/ML-YouTube-Courses
📺 Discover the latest machine learning / AI courses on YouTube.
YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
EGO4D/audio-visual
lkulowski/LSTM_encoder_decoder
Build an LSTM encoder-decoder in PyTorch for sequence-to-sequence prediction on time series data
hitachi-speech/EEND
End-to-End Neural Diarization
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
xbresson/CS5242_2021
Neural Networks and Deep Learning, NUS CS5242, 2021
gcunhase/AMICorpusXML
Extracts Transcript and Summary (Abstractive and Extractive) from the AMI Meeting Corpus
musikalkemist/pytorchforaudio
Code for the "PyTorch for Audio + Music Processing" series on The Sound of AI YouTube channel.
google-research/leaf-audio
LEAF is a learnable alternative to fixed audio features such as mel-filterbanks: it can be initialized as an approximation of mel-filterbanks and then trained for the task at hand, while using a very small number of parameters.
ai-forever/ru-gpts
Russian GPT-3 models.
CloudAdvocacy/AzureMLStarter
A tutorial to get you started with the Azure ML Service
google/model_search
leondz/hatespeechdata
Catalog of abusive language data (PLoS 2020)
pwenker/chessli
A free and open source chess improvement app that combines the power of Lichess and Anki.
facebookresearch/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). A PyTorch implementation of the paper, which presents a causal speech enhancement model that operates on the raw waveform and runs in real time on a laptop CPU. The model is based on an encoder-decoder architecture with skip connections and is optimized in both the time and frequency domains using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise as well as room reverb. The authors also propose a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
labmlai/annotated_deep_learning_paper_implementations
🧑‍🏫 60+ implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), GANs (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
facebookresearch/svoice
A PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", which presents a new method for separating a mixed audio sequence in which multiple voices speak simultaneously. The method employs gated neural networks trained to separate the voices over multiple processing steps while keeping the speaker in each output channel fixed. A separate model is trained for each possible number of speakers, and the model with the largest number of speakers is used to select the actual number of speakers in a given sample. The method greatly outperforms the prior state of the art, which, as the authors show, is not competitive for more than two speakers.
google-research/robustness_metrics
sdv-dev/SDV
Synthetic data generation for tabular data