
LSH_Attention

Description

Calculate the Softmax layer of Attention in $O(L \log L)$ ($L$ = sequence length) instead of $O(L^2)$ using cross-polytope Locality-Sensitive Hashing. For more detail, see the paper at https://arxiv.org/abs/1802.05751 .
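As a rough sketch of the idea (not the code in this repository): a cross-polytope hash rotates each vector with a random matrix and records the index and sign of its largest-magnitude coordinate, so vectors with large dot products tend to land in the same bucket, and the softmax only has to be computed over positions sharing a bucket. A minimal NumPy illustration, with all names and sizes chosen here purely for the example:

import numpy as np

def cross_polytope_hash(X, n_axes, rng):
    # Project each row of X with a random Gaussian matrix (a stand-in for a
    # random rotation) and hash to the index and sign of the largest-magnitude
    # projected coordinate, giving 2 * n_axes buckets in total.
    d = X.shape[1]
    R = rng.standard_normal((d, n_axes))
    proj = X @ R
    idx = np.abs(proj).argmax(axis=1)
    sign = proj[np.arange(X.shape[0]), idx] > 0
    return 2 * idx + sign

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))          # 1000 vectors of dimension 64
buckets = cross_polytope_hash(X, n_axes=8, rng=rng)
# Attention can then be restricted to pairs of positions that fall in the
# same bucket, which is what brings the cost down from O(L^2) toward O(L log L).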

Usage

The only dependency is numpy >= 1.18.

For example:

import numpy as np

from functions import normal_softmax, lsh_softmax

R = np.random.randn(100, 10000)   # random input matrix
normal_sm = normal_softmax(R)     # exact softmax
lsh_sm = lsh_softmax(R)           # LSH-based approximation
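To get a rough sense of the approximation error, one possible check (not part of the repository, and assuming both functions return arrays of the same shape) is:

print(np.abs(normal_sm - lsh_sm).max())   # largest element-wise deviation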

Test

Small size

Note: for better visibility, the diagonal components are set to 0 in the figure below.

(Figure: test)
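A hypothetical way to reproduce this kind of side-by-side figure with matplotlib (the matrix size and plotting details here are assumptions, not the repository's test script):

import numpy as np
import matplotlib.pyplot as plt
from functions import normal_softmax, lsh_softmax

R = np.random.randn(16, 16)                  # small example (size is an assumption)
fig, axes = plt.subplots(1, 2)
for ax, (name, sm) in zip(axes, [("normal_softmax", normal_softmax(R)),
                                 ("lsh_softmax", lsh_softmax(R))]):
    m = sm.copy()
    np.fill_diagonal(m, 0)                   # diagonal set to 0 for better visibility
    ax.imshow(m)
    ax.set_title(name)
plt.show()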

Time complexity analysis

The execution times are plotted for sequence lengths of $2^i$ ($i = 4, 5, \dots, 15$).

(Figure: time_analysis_log)
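A sketch of how such a benchmark could be run (the input layout mirrors the Usage example above; the actual measurement script is an assumption):

import time
import numpy as np
from functions import normal_softmax, lsh_softmax

for i in range(4, 16):                       # sequence lengths 2^4 ... 2^15
    L = 2 ** i
    R = np.random.randn(100, L)              # same layout as in the Usage example
    for name, fn in [("normal", normal_softmax), ("lsh", lsh_softmax)]:
        start = time.perf_counter()
        fn(R)
        print(f"L={L:6d}  {name:7s}  {time.perf_counter() - start:.3f} s")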