Pinned Repositories
cupy
NumPy & SciPy for GPU
cutlass
CUDA Templates for Linear Algebra Subroutines
FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
Neural-Net
Neural-Net-2
An optimized version of Neural Net
neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
rfcs
PyTorch RFCs (experimental)
Tempo
Memory footprint reduction for transformer models
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.
andoorve's Repositories
andoorve/cupy
NumPy & SciPy for GPU
andoorve/cutlass
CUDA Templates for Linear Algebra Subroutines
andoorve/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
andoorve/Neural-Net
andoorve/Neural-Net-2
An optimized version of Neural Net
andoorve/neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
andoorve/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
andoorve/rfcs
PyTorch RFCs (experimental)
andoorve/Tempo
Memory footprint reduction for transformer models
andoorve/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.
andoorve/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs