titu1994/tf-TabNet

[Question / Not sure if it's an issue] Suggested choice of hyperparameters feat_dim (N_a) == output_dim (N_d) leads to ValueError

vrtr2020 opened this issue · 6 comments

Both the docstring of the TabNet class and the original article suggest N_a == N_d for most datasets
(the dimensionalities of the hidden representations and of the outputs of each decision step).
But in the code (tabnet.py:129) a ValueError is raised if N_a <= N_d.
I'm not sure whether this is an issue or whether my understanding of the code is incorrect.
Could you please clarify this point?

P.S.
I'd like to thank you for your implementation of a very interesting paper.
I'm trying to use the tabnet module for a small POC with an imbalanced dataset of ~20k samples, mostly categorical data.

There is a CUDA bug that occurs if N_d and N_a are the same. The code internally takes N_a - N_d dimensions of information for self-attention, and setting them equal means a (0, X) dim tensor, which fails on GPU.

To get the paper's true N_d == N_a, set N_a = 2 * N_d (i.e. feature_dim = 2 * output_dim), and that works.
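
For illustration, here is a minimal sketch of the split described above (the `transform_f4` name and the second slice mirror tabnet.py:270, quoted later in this thread; the shapes and everything else are illustrative assumptions):

```python
import tensorflow as tf

batch_size = 4
feature_dim = 8   # N_a in this repo's naming
output_dim = 8    # N_d; setting it equal to feature_dim reproduces the failure

# Stand-in for the feature transformer output, which is feature_dim wide.
transform_f4 = tf.random.normal((batch_size, feature_dim))

# Roughly, the first output_dim columns feed the decision output ...
decision_out = transform_f4[:, :output_dim]        # shape (4, 8)
# ... and the remaining columns feed the attentive transformer (cf. tabnet.py:270).
features_for_coef = transform_f4[:, output_dim:]   # shape (4, 0) -- empty

print(decision_out.shape, features_for_coef.shape)
```

With feature_dim = 2 * output_dim the second slice is output_dim columns wide, which matches the paper's N_a == N_d setting.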

And yes, I should update that docstring.

You should update your examples also.

Interestingly, for the train_embedding example I get much better performance when setting N_a = N_d + 1 than with N_a = 2 * N_d. For example: feature_dim=5, output_dim=4 gives a 90% accuracy rate, while feature_dim=8, output_dim=4 gives a 60% accuracy rate.

Is anyone currently working on making this clearer? I found it very confusing that the default values do not run. The actual meaning should probably be noted in the comments/docstrings.

Also, I wonder whether it would make sense to accept N_a and N_d as they are defined in the paper and the current docstrings, and then handle the actual dimensionality under the hood (it seems that internally the repo's feature_dim needs to be set to N_a + N_d for the original meanings to hold).
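
A rough sketch of what I mean (pure bookkeeping, not the library's actual API; the helper name is made up):

```python
def paper_to_repo_dims(n_a, n_d):
    """Translate the paper's (N_a, N_d) into this repo's (feature_dim, output_dim).

    Assumption, based on the discussion above: feature_dim is the total width
    of the feature transformer output, of which the first output_dim columns
    go to the decision output and the remaining columns go to attention.
    """
    feature_dim = n_a + n_d   # total transform width, handled under the hood
    output_dim = n_d          # decision-output width
    if feature_dim <= output_dim:
        raise ValueError("feature_dim must be strictly greater than output_dim")
    return feature_dim, output_dim

# The paper's N_a == N_d == 8 becomes feature_dim=16, output_dim=8,
# which passes the check at tabnet.py:129.
feature_dim, output_dim = paper_to_repo_dims(n_a=8, n_d=8)
```

For n_a == n_d this reduces to the feature_dim = 2 * output_dim workaround mentioned above.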

Moving line 270 in tabnet.py, `features_for_coef = transform_f4[:, self.output_dim:]`, inside the `if` at line 272 would solve the problem.