FT_Transformer - Attention weights
peterlee18 opened this issue · 6 comments
Thanks for these models. The FT-Transformer is working well. Is there a way to extract the attention weights from the model? I understand these can be used to derive feature importances.
@peterlee18 glad to hear it is working well! hope it is not too late, but you can now return all the attention maps by passing `return_attn = True` on forward, like so
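For anyone landing here later, a sketch of turning the returned attention maps into per-feature importance scores. The exact shapes returned by `return_attn = True` may differ by version; this assumes maps of shape `(layers, heads, tokens, tokens)` with the CLS token at index 0 and one token per input feature, and uses plain NumPy so it runs without the library:

```python
import numpy as np

def attention_feature_importance(attn_maps: np.ndarray) -> np.ndarray:
    """Average the CLS-token attention over layers and heads.

    attn_maps: (layers, heads, tokens, tokens), where row 0 of each
    map is the CLS token's attention distribution over all tokens
    (CLS itself plus one token per feature). Returns one score per
    feature, normalized to sum to 1.
    """
    # how much CLS attends to every token, per layer and head
    cls_attn = attn_maps[:, :, 0, :]        # (layers, heads, tokens)
    avg = cls_attn.mean(axis=(0, 1))        # (tokens,)
    importance = avg[1:]                    # drop CLS's attention to itself
    return importance / importance.sum()

# toy attention maps: 2 layers, 4 heads, 1 CLS token + 5 feature tokens
rng = np.random.default_rng(0)
attn = rng.random((2, 4, 6, 6))
attn = attn / attn.sum(axis=-1, keepdims=True)  # rows sum to 1, like softmax
scores = attention_feature_importance(attn)
print(scores.shape)  # (5,)
```

In practice you would feed this the stacked maps from `pred, attns = model(x_categ, x_cont, return_attn = True)` (argument names assumed from the library's usual forward signature) instead of the random toy tensor.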
@peterlee18 that is really interesting
if i find some more time, i'll improve on FT transformer with some of the latest findings, see if i can tip it over the edge
@peterlee18 how much data are you working with? (if you are allowed to say)
@peterlee18 that's really cool! at my last job i did a lot of work with GBMs (training and deploying xgboost models mainly) and they were really hard to beat
glad to hear you were able to get competitive performance using attention!