vyraun's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
FMInference/FlexiGen
Running large language models on a single GPU for throughput-oriented scenarios.
voxel51/fiftyone
Refine high-quality datasets and visual AI models
facebookincubator/AITemplate
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
jeffkaufman/icdiff
improved colored diff
OpenNMT/CTranslate2
Fast inference engine for Transformer models
alpa-projects/alpa
Training and serving large-scale neural networks with auto parallelization.
google/BIG-bench
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
libffcv/ffcv
FFCV: Fast Forward Computer Vision (and other ML workloads!)
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
microsoft/mup
maximal update parametrization (µP)
NVIDIA/MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
allenai/dolma
Data and tools for generating and inspecting OLMo pre-training data.
GEM-benchmark/NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
mayeaux/generate-subtitles
Generate transcripts for audio and video content with a user-friendly UI, powered by OpenAI's Whisper, with automatic translations and automatic video downloads via yt-dlp integration
dustinvtran/latex-templates
A collection of LaTeX templates used for research, courses, and miscellanea.
google-research/bleurt
BLEURT is a metric for Natural Language Generation based on transfer learning.
songhwanjun/Awesome-Noisy-Labels
A Survey
microsoft/gpt-MT
facebookresearch/mlqe
We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages, along with scores (on a scale of 1 to 100) generated through human evaluations that represent the quality of the translations. Paper title: Unsupervised Quality Estimation for Neural Machine Translation
microsoft/semantic_parsing_with_constrained_lm
Code to reproduce experiments in the paper "Constrained Language Models Yield Few-Shot Semantic Parsers" (EMNLP 2021).
unicode-cookbook/cookbook
The Unicode Cookbook for Linguists
mjpost/bin
bin files
vyraun/long-tailed
Code for "On Long-Tailed Phenomena in NMT".
srush/aima-arguments
vered1986/LM_NE_bias
Named Entity Biases in Pre-trained Language Models
vyraun/Finding-Memo
Code for "Extractive Memorization in Constrained Sequence Generation Tasks"