
LSH_Attention

Description

Calculate the Softmax layer of Attention in $O(L \log L)$ ($L$ = sequence length) instead of $O(L^2)$ using cross-polytope Locality-Sensitive Hashing. For more detail, see the paper at https://arxiv.org/abs/1802.05751 .
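As a rough sketch of the idea (not the code in this repository): a cross-polytope hash rotates each vector with a random matrix and records the index and sign of its largest-magnitude coordinate, so vectors with large dot products tend to land in the same bucket, and the softmax only has to be computed over positions sharing a bucket. A minimal NumPy illustration, with all names and sizes chosen here purely for the example:

import numpy as np

def cross_polytope_hash(X, n_axes, rng):
    # Project each row of X with a random Gaussian matrix (a stand-in for a
    # random rotation) and hash to the index and sign of the largest-magnitude
    # projected coordinate, giving 2 * n_axes buckets in total.
    d = X.shape[1]
    R = rng.standard_normal((d, n_axes))
    proj = X @ R
    idx = np.abs(proj).argmax(axis=1)
    sign = proj[np.arange(X.shape[0]), idx] > 0
    return 2 * idx + sign

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))          # 1000 vectors of dimension 64
buckets = cross_polytope_hash(X, n_axes=8, rng=rng)
# Attention can then be restricted to pairs of positions that fall in the
# same bucket, which is what brings the cost down from O(L^2) toward O(L log L).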

Usage

The only dependency is numpy >= 1.18.

For example:

import numpy as np

from functions import normal_softmax, lsh_softmax

R = np.random.randn(100, 10000)   # random input matrix
normal_sm = normal_softmax(R)     # exact softmax
lsh_sm = lsh_softmax(R)           # LSH-based approximation
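To get a rough sense of the approximation error, one possible check (not part of the repository, and assuming both functions return arrays of the same shape) is:

print(np.abs(normal_sm - lsh_sm).max())   # largest element-wise deviation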

Test

Small size

Note: for better visibility, the diagonal components are set to 0 in the figure below.

(Figure: test)
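A hypothetical way to reproduce this kind of side-by-side figure with matplotlib (the matrix size and plotting details here are assumptions, not the repository's test script):

import numpy as np
import matplotlib.pyplot as plt
from functions import normal_softmax, lsh_softmax

R = np.random.randn(16, 16)                  # small example (size is an assumption)
fig, axes = plt.subplots(1, 2)
for ax, (name, sm) in zip(axes, [("normal_softmax", normal_softmax(R)),
                                 ("lsh_softmax", lsh_softmax(R))]):
    m = sm.copy()
    np.fill_diagonal(m, 0)                   # diagonal set to 0 for better visibility
    ax.imshow(m)
    ax.set_title(name)
plt.show()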

Time complexity analysis

The execution times are plotted for sequence lengths of $2^i$ ($i = 4, 5, \dots, 15$).

(Figure: time_analysis_log)
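A sketch of how such a benchmark could be run (the input layout mirrors the Usage example above; the actual measurement script is an assumption):

import time
import numpy as np
from functions import normal_softmax, lsh_softmax

for i in range(4, 16):                       # sequence lengths 2^4 ... 2^15
    L = 2 ** i
    R = np.random.randn(100, L)              # same layout as in the Usage example
    for name, fn in [("normal", normal_softmax), ("lsh", lsh_softmax)]:
        start = time.perf_counter()
        fn(R)
        print(f"L={L:6d}  {name:7s}  {time.perf_counter() - start:.3f} s")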