How do Dense layers work?
miladdona opened this issue · 3 comments
Hi,
For example, we have a dense layer with shape (100, 100) and we decompose it with shape [[2, 2, 5, 5], [2, 5, 2, 5]] and max_tt_rank=4.
Based on this example we get tt_cores with the following shapes:
(1, 2, 2, 4)
(4, 2, 5, 4)
(4, 5, 2, 4)
(4, 5, 5, 1)
- Are the tt_cores always 4-D?
- How does the dense layer work here? For example, with an SVD decomposition of a dense layer we get two thinner dense layers: for a dense layer of shape (100, 100) and rank=20 we get two dense layers of shapes (100, 20) and (20, 100) (see the sketch below for what I mean). I want to know how the T3F library works in this respect.
- Is there a way to extract the number of operations for each layer? For the previous example we have 100 * 100 = 10000 operations in the normal dense layer, but I cannot work out the number of operations in the T3F library!
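Here is a small sketch of what I mean by the SVD version (plain NumPy, just for illustration; rank and the variable names are mine):
import numpy as np

W = np.random.randn(100, 100)            # original dense kernel
rank = 20
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]               # first thinner layer, shape (100, 20)
B = Vt[:rank, :]                         # second thinner layer, shape (20, 100)
x = np.random.randn(100)
y = (x @ A) @ B                          # 100*20 + 20*100 = 4000 multiplications instead of 10000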
Thank you in advance.
Best regards,
Miladdona
- Yes, the TT cores are always 4D for TT-matrices in this Keras layer. (They are 3D for TT-tensors, in contrast to TT-matrices. Also, sometimes you can consider a batch of TT-objects, which adds an extra dimension.)
- It will reshape your input vector into a tensor of shape (2, 2, 5, 5) and then contract the cores one by one with the input tensor. This can be interpreted as d (4 in this case) weird sparse linear layers.
It looks something like this:
res = input_vector.reshape(2, 2, 5, 5)
res = einsum('aijb,qwei->qweja', tt_cores[-1], res) # equivalent to multiplying a 4*5 x 5*1 reshape of the core by a 5 x 2*2*5 reshape of the input, i.e. 4*5 * 5 * 2*2*5 FLOPs, not counting reshapes
res = einsum('ceka,qweja->qwkjc', tt_cores[-2], res) # equivalent to multiplying a 4*2 x 4*5 reshape of the core by a 4*5 x 2*2*5 reshape of the input, i.e. 4*2 * 4*5 * 2*2*5 FLOPs
res = einsum('dwlc,qwkjc->qlkjd', tt_cores[-3], res)
res = einsum('fqnd,qlkjd->nlkj', tt_cores[0], res)
res = res.reshape(2*5*2*5)
res += bias
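To make this concrete (and to answer the operation-count question), here is a self-contained NumPy version of the pseudocode above. This is just my own sketch with random cores, not t3f source code, and the variable names are mine; the comments count the multiplications done by each contraction:
import numpy as np

ranks = [1, 4, 4, 4, 1]                        # r0..r4, capped by max_tt_rank=4
modes_a, modes_b = [2, 2, 5, 5], [2, 5, 2, 5]  # the two factorizations of 100
cores = [np.random.randn(ranks[k], modes_a[k], modes_b[k], ranks[k + 1])
         for k in range(4)]
print([c.shape for c in cores])  # [(1, 2, 2, 4), (4, 2, 5, 4), (4, 5, 2, 4), (4, 5, 5, 1)]
# (a TT-tensor core, in contrast, would be 3D: (ranks[k], mode[k], ranks[k + 1]))

x = np.random.randn(100)
bias = np.random.randn(100)

res = x.reshape(2, 2, 5, 5)
res = np.einsum('aijb,qwei->qweja', cores[-1], res)  # 4*5 * 5 * 2*2*5 = 2000 multiplications
res = np.einsum('ceka,qweja->qwkjc', cores[-2], res) # 4*2 * 4*5 * 2*2*5 = 3200
res = np.einsum('dwlc,qwkjc->qlkjd', cores[-3], res) # 4*5 * 4*2 * 2*2*5 = 3200
res = np.einsum('fqnd,qlkjd->nlkj', cores[0], res)   # 2 * 1*4*2 * 5*2*5 = 800
res = res.reshape(2 * 5 * 2 * 5) + bias              # 9200 multiplications in total vs 100*100 = 10000

# Sanity check against the explicitly assembled 100x100 matrix:
W = np.einsum('aijb,bkle,emnf,fopg->ikmojlnp',
              cores[0], cores[1], cores[2], cores[3]).reshape(100, 100)
print(np.allclose(res, x @ W + bias))  # True
So the number of operations per layer is just the sum of these per-core products; for this tiny 100x100 example the savings over 10000 are small, but they grow for larger layers and lower ranks.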
Hi,
How do you come up with these strings (in fact, operations), e.g. 'aijb,qwei->qweja'?
I mean, in this example the input has a 4D shape and the cores are 4D tensors. If the input has a 3D, 2D, or 5D shape, how do we define these operations?
For example, if we have [[4, 5, 5], [5, 4, 5]] instead of [[2, 2, 5, 5], [2, 5, 2, 5]].
Thank you in advance.
Kind regards,
Miladdona
Hi,
Do you mean how to read this notation, or how I come up with these particular formulas? If the former, check out an einsum tutorial, e.g. https://rockt.github.io/2018/04/30/einsum
If the latter, then check out the Tensorizing Neural Networks [1] paper for the definition of a TT layer, formula (5).
You have an input vector x which you reshape into, e.g., a [2, 2, 5, 5] (or [4, 5, 5]) tensor X, and then you need to do the summation w.r.t. j1, j2, j3, j4 (or j1, j2, j3 in the case of [4, 5, 5]).
Also, the terms Gk[ik, jk] are themselves matrices which are multiplied with each other, so you also need to sum out the intermediate dimensions (which correspond to the ranks).
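For reference, formula (5) there reads (up to notation) as follows:
Y(i1, ..., id) = sum_{j1, ..., jd} G1[i1, j1] G2[i2, j2] ... Gd[id, jd] X(j1, ..., jd),
where each Gk[ik, jk] is an r_{k-1} x r_k matrix, so the product of all d of them is a 1 x 1 number (since r0 = rd = 1).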
E.g. the first step in the pseudocode above, res = einsum('aijb,qwei->qweja', tt_cores[-1], res), does the summation w.r.t. j4 (which I call i in the einsum string, because of the transposed convention mentioned below). It says res[q, w, e, j, a] = sum_{i, b} Gd[i, j](a, b) x[q, w, e, i], where tt_cores[-1] is Gd and res is x.
Note that in these formulas we multiply X @ TTW, while in the paper we do TTW.T @ X, sorry about confusing notation.
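To make the generalization concrete, the same chain for [[4, 5, 5], [5, 4, 5]] would look like this (again just a NumPy sketch with random cores, not t3f code; I keep the rank 4 from before and the index letters are arbitrary):
import numpy as np

ranks = [1, 4, 4, 1]
modes_a, modes_b = [4, 5, 5], [5, 4, 5]        # 4*5*5 = 5*4*5 = 100
cores = [np.random.randn(ranks[k], modes_a[k], modes_b[k], ranks[k + 1])
         for k in range(3)]
x = np.random.randn(100)

res = x.reshape(4, 5, 5)
res = np.einsum('aijb,qwi->qwja', cores[-1], res)   # sums out the last mode (5) and the trivial rank
res = np.einsum('cwka,qwja->qkjc', cores[-2], res)  # sums out the middle mode (5) and a rank of 4
res = np.einsum('fqnc,qkjc->nkj', cores[0], res)    # sums out the first mode (4) and the remaining ranks
res = res.reshape(5 * 4 * 5)                        # output vector of length 100
The pattern is the same for any number of factors: the input is reshaped into one mode per core, and there is one einsum per core, each summing out one mode of the input together with the rank index shared with the previous step.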