mcognetta's Stars
tunib-ai/parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
mlcommons/algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
Liuhong99/Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
cp-algorithms/cp-algorithms
Algorithm and data structure articles for https://cp-algorithms.com (based on http://e-maxx.ru)
libprima/prima
PRIMA is a package for solving general nonlinear optimization problems without using derivatives. It provides the reference implementation of Powell's derivative-free optimization methods: COBYLA, UOBYQA, NEWUOA, BOBYQA, and LINCOA. The name stands for "Reference Implementation for Powell's Methods with Modernization and Amelioration", with the "P" for Powell.
IntelLabs/academic-budget-bert
Repository containing code for "How to Train BERT with an Academic Budget" paper
e9t/nsmc
Naver sentiment movie corpus
Tiiiger/QPyTorch
Low Precision Arithmetic Simulation in PyTorch
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
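A minimal sketch of the idea behind k-bit quantization: absmax int8 quantization, the basic scheme bitsandbytes builds on. This is an illustration in pure Python, not the library's actual API.

```python
# Hedged sketch: absmax 8-bit quantization (not bitsandbytes' API).
# Scale values so the largest magnitude maps to 127, round to integers,
# and keep the scale so the floats can be approximately recovered.

def quantize_absmax(xs):
    scale = max(abs(x) for x in xs) / 127.0 or 1.0  # avoid div-by-zero on all-zero input
    q = [round(x / scale) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

xs = [0.5, -1.0, 0.25]
q, scale = quantize_absmax(xs)  # int8-range codes plus one float scale
```

Rounding error is bounded by half the scale per element, which is why the scheme works well when values share a similar magnitude.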
google-research/jaxpruner
triton-lang/triton
Development repository for the Triton language and compiler
quantumaikr/KoreanLM
Open-source Korean language model
JuliaSIMD/VectorizedRNG.jl
Vectorized uniform and normal random samplers.
AshwinDeshpande96/Hierarchical-Softmax
A scalable hierarchical softmax layer for neural networks with large numbers of output classes.
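A minimal sketch of the idea behind hierarchical softmax (not this repo's implementation): factor P(class) as P(cluster) · P(class | cluster), so a distribution over V classes costs roughly O(√V) per probability instead of O(V).

```python
import math

# Hedged sketch of two-level hierarchical softmax: classes are grouped into
# clusters, and each class probability is the product of a cluster probability
# and a within-cluster probability.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def hierarchical_prob(cluster_scores, within_scores, cluster_id, class_id):
    """P(class) = P(cluster) * P(class | cluster)."""
    return softmax(cluster_scores)[cluster_id] * softmax(within_scores)[class_id]

# Toy example: 4 classes split into 2 clusters of 2.
cluster_scores = [1.0, 0.5]           # one score per cluster
within = [[0.2, 0.8], [0.1, 0.3]]     # scores for the classes inside each cluster

probs = [hierarchical_prob(cluster_scores, within[c], c, k)
         for c in range(2) for k in range(2)]
assert abs(sum(probs) - 1.0) < 1e-9   # still a valid distribution over all 4 classes
```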
harrisonvanderbyl/rwkv-cpp-accelerated
A torch-less C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependencies
omlins/julia-gpu-course
GPU Programming with Julia - course at the Swiss National Supercomputing Centre (CSCS), ETH Zurich
openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
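A minimal sketch of byte-pair encoding, the algorithm tiktoken implements (this is not tiktoken's API, just an illustration of BPE): given a learned, ordered merge list, encoding repeatedly applies the highest-priority adjacent merge until none applies.

```python
# Hedged sketch of BPE encoding with a hypothetical merge list.
# tiktoken works on bytes and is heavily optimized; this starts from
# characters to keep the idea visible.

def bpe_encode(text, merges):
    """merges: list of (a, b) pairs; earlier entries have higher priority."""
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(text)
    while True:
        # find the adjacent pair with the best (lowest) merge rank
        best = None
        for i in range(len(tokens) - 1):
            r = rank.get((tokens[i], tokens[i + 1]))
            if r is not None and (best is None or r < best[0]):
                best = (r, i)
        if best is None:
            return tokens
        _, i = best
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

merges = [("l", "o"), ("lo", "w")]   # hypothetical learned merges
print(bpe_encode("lowlow", merges))  # ['low', 'low']
```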
SymbolicML/DynamicExpressions.jl
Ridiculously fast symbolic expressions
ggerganov/llama.cpp
LLM inference in C/C++
kakaobrain/jejueo
Jejueo Datasets for Machine Translation and Speech Synthesis
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
KanjiVG/kanjivg
Kanji vector graphics
sagemath/sage
Main repository of SageMath
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
aojunzz/NM-sparsity
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
tysam-code/hlb-CIFAR10
Train to 94% on CIFAR-10 in under 6.3 seconds on a single A100, or ~95.79% in ~110 seconds.
scandum/quadsort
Quadsort is a branchless stable adaptive mergesort faster than quicksort.
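A minimal sketch of the "adaptive" part of a mergesort like quadsort, in Python (quadsort itself is branchless C; this only shows the idea): split the input into already-sorted runs, then stably merge runs pairwise, so nearly-sorted input needs few merges.

```python
# Hedged sketch of run-based adaptive mergesort (not quadsort's algorithm).

def find_runs(a):
    """Split a into maximal non-decreasing runs."""
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:
            runs.append(a[start:i])
            start = i
    runs.append(a[start:])
    return runs

def merge(x, y):
    """Stable merge: ties are taken from x first."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if y[j] < x[i]:
            out.append(y[j]); j += 1
        else:
            out.append(x[i]); i += 1
    return out + x[i:] + y[j:]

def run_mergesort(a):
    runs = find_runs(a)
    while len(runs) > 1:  # merge runs pairwise until one remains
        runs = [merge(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(run_mergesort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

An already-sorted input is a single run and needs zero merges, which is where the adaptivity comes from.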
BYVoid/OpenCC
Conversion between Traditional and Simplified Chinese
FluxML/FastAI.jl
Repository of best practices for deep learning in Julia, inspired by fastai