Pinned Repositories
abseil-cpp
Abseil Common Libraries (C++)
Anakin
AvxToNeon
In this project, frequently used AVX instructions are encapsulated as independent modules so that code being ported from x86 to ARM NEON does not have to re-implement them each time, reducing repeated development work.
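The entry above describes the encapsulation idea only in general terms, so here is a minimal, self-contained C++ sketch of what such a wrapper typically looks like: a 256-bit AVX-style vector add exposed behind one helper that compiles to _mm256_* intrinsics on x86 and to a pair of NEON operations on AArch64. All type and function names (vec256f, vec256f_add, ...) are illustrative assumptions, not the AvxToNeon repository's actual API.

// Minimal sketch of wrapping an AVX-width operation so the same call works on
// x86 (AVX) and AArch64 (NEON). Names are illustrative, not AvxToNeon's API.
#include <cstdio>

#if defined(__aarch64__)
  #include <arm_neon.h>
  // NEON registers are 128-bit, so a 256-bit "AVX-like" vector is a pair.
  struct vec256f { float32x4_t lo, hi; };
  static inline vec256f vec256f_load(const float* p) {
    return { vld1q_f32(p), vld1q_f32(p + 4) };
  }
  static inline vec256f vec256f_add(vec256f a, vec256f b) {
    return { vaddq_f32(a.lo, b.lo), vaddq_f32(a.hi, b.hi) };
  }
  static inline void vec256f_store(float* p, vec256f v) {
    vst1q_f32(p, v.lo); vst1q_f32(p + 4, v.hi);
  }
#else
  #include <immintrin.h>   // compile with -mavx on x86
  typedef __m256 vec256f;
  static inline vec256f vec256f_load(const float* p)     { return _mm256_loadu_ps(p); }
  static inline vec256f vec256f_add(vec256f a, vec256f b){ return _mm256_add_ps(a, b); }
  static inline void vec256f_store(float* p, vec256f v)  { _mm256_storeu_ps(p, v); }
#endif

int main() {
  float a[8], b[8], c[8];
  for (int i = 0; i < 8; ++i) { a[i] = float(i); b[i] = 10.0f * i; }
  vec256f_store(c, vec256f_add(vec256f_load(a), vec256f_load(b)));  // c[i] = 11*i
  for (int i = 0; i < 8; ++i) printf("%g ", c[i]);
  printf("\n");
  return 0;
}

The point of the encapsulation is that application code calls vec256f_add once and the per-architecture intrinsic choice lives in a single header, which is the "repeated development work" the description refers to.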
efficient_transformer
Scaling Transformer architectures has been critical for pushing the frontiers of Language Modelling (LM), a problem central to Natural Language Processing (NLP) and language understanding. Although there is a direct positive relationship between Transformer capacity and LM performance, practical limitations make training massive models infeasible: their computation and memory costs cannot be addressed solely by training on parallel devices. In this thesis, we investigate two approaches that make Transformers more computationally and memory efficient. First, we introduce the Mixture-of-Experts (MoE) Transformer, which can scale its capacity at a sub-linear computational cost. Second, we present a novel content-based sparse attention mechanism called Hierarchical Self Attention (HSA). We demonstrate that the MoE Transformer achieves lower test perplexity than a vanilla Transformer with higher computational demands. Language Modelling experiments with a Transformer that uses HSA in place of conventional attention show that HSA can speed up attention computation by up to 330% at a negligible cost in model performance.
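The abstract says the MoE Transformer scales capacity at a sub-linear computational cost; the mechanism behind that claim is sparse routing, where a gating network scores every expert for each token but only the top-k experts are actually run. Below is a minimal, self-contained C++ sketch of that top-k routing step under toy assumptions (a fixed gate-logit vector and scalar "experts" standing in for feed-forward networks); it illustrates the general technique and is not code from the efficient_transformer or mixture-of-experts repositories.

// Minimal top-k (k = 2) mixture-of-experts routing sketch.
// Only the k selected experts run per token, so per-token compute grows with k,
// not with the total number of experts -- the "sub-linear" scaling idea.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// A toy "expert": scales its input by a fixed weight (stands in for an FFN).
struct Expert {
  float weight;
  std::vector<float> forward(const std::vector<float>& x) const {
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = weight * x[i];
    return y;
  }
};

int main() {
  const int num_experts = 8, top_k = 2;
  std::vector<Expert> experts;
  for (int e = 0; e < num_experts; ++e) experts.push_back({0.1f * (e + 1)});

  std::vector<float> token = {1.0f, 2.0f, 3.0f};  // one token's hidden state
  // In a real MoE these logits come from a learned linear gate applied to the
  // token; here they are fixed so the sketch stays self-contained.
  std::vector<float> logits = {0.3f, 2.1f, -0.5f, 1.7f, 0.0f, -1.2f, 0.9f, 0.4f};

  // Softmax over gate logits.
  float mx = *std::max_element(logits.begin(), logits.end()), sum = 0.0f;
  std::vector<float> gate(num_experts);
  for (int e = 0; e < num_experts; ++e) { gate[e] = std::exp(logits[e] - mx); sum += gate[e]; }
  for (int e = 0; e < num_experts; ++e) gate[e] /= sum;

  // Pick the top-k experts by gate probability.
  std::vector<int> idx(num_experts);
  for (int e = 0; e < num_experts; ++e) idx[e] = e;
  std::partial_sort(idx.begin(), idx.begin() + top_k, idx.end(),
                    [&](int a, int b) { return gate[a] > gate[b]; });

  // Run only the selected experts and combine their outputs, weighted by the
  // renormalised gate probabilities.
  float norm = 0.0f;
  for (int j = 0; j < top_k; ++j) norm += gate[idx[j]];
  std::vector<float> out(token.size(), 0.0f);
  for (int j = 0; j < top_k; ++j) {
    std::vector<float> y = experts[idx[j]].forward(token);
    for (size_t i = 0; i < token.size(); ++i) out[i] += (gate[idx[j]] / norm) * y[i];
  }

  printf("selected experts: %d %d\n", idx[0], idx[1]);
  printf("output: %.3f %.3f %.3f\n", out[0], out[1], out[2]);
  return 0;
}

Because only top_k of the num_experts experts execute per token, adding more experts grows parameter count (capacity) without a proportional increase in per-token compute.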
jittor
legion
The Legion Parallel Programming System
mixture-of-experts
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
nccl
Optimized primitives for collective multi-GPU communication
nmaker
A heterogeneous cosmological N-Body simulation code for multicore (CPU) and manycore (MIC) platforms.
OpenArray
lowintelligence's Repositories
lowintelligence/abseil-cpp
Abseil Common Libraries (C++)
lowintelligence/Anakin
lowintelligence/AvxToNeon
In this project, frequently used AVX instructions are encapsulated as independent modules so that code being ported from x86 to ARM NEON does not have to re-implement them each time, reducing repeated development work.
lowintelligence/efficient_transformer
Scaling Transformer architectures has been critical for pushing the frontiers of Language Modelling (LM), a problem central to Natural Language Processing (NLP) and language understanding. Although there is a direct positive relationship between Transformer capacity and LM performance, practical limitations make training massive models infeasible: their computation and memory costs cannot be addressed solely by training on parallel devices. In this thesis, we investigate two approaches that make Transformers more computationally and memory efficient. First, we introduce the Mixture-of-Experts (MoE) Transformer, which can scale its capacity at a sub-linear computational cost. Second, we present a novel content-based sparse attention mechanism called Hierarchical Self Attention (HSA). We demonstrate that the MoE Transformer achieves lower test perplexity than a vanilla Transformer with higher computational demands. Language Modelling experiments with a Transformer that uses HSA in place of conventional attention show that HSA can speed up attention computation by up to 330% at a negligible cost in model performance.
lowintelligence/jittor
lowintelligence/legion
The Legion Parallel Programming System
lowintelligence/mixture-of-experts
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
lowintelligence/nccl
Optimized primitives for collective multi-GPU communication
lowintelligence/nmaker
A heterogeneous cosmological N-Body simulation code for multicore (CPU) and manycore (MIC) platforms.
lowintelligence/OpenArray
lowintelligence/Paddle
PArallel Distributed Deep LEarning
lowintelligence/tensorflow
Computation using data flow graphs for scalable machine learning
lowintelligence/TensorRT
TensorRT is a C++ library that facilitates high-performance inference on NVIDIA GPUs and deep learning accelerators.
lowintelligence/TePDist
lowintelligence/transformer
A TensorFlow Implementation of the Transformer: Attention Is All You Need
lowintelligence/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs