gugarosa/learnergy

Is a label required for the dataset fed into GaussianRBM?

Closed this issue · 3 comments

Template when making a new issue

Please make sure that the following boxes are checked before submitting a new issue. There is a small chance that you can solve it on your own, or that it has already been addressed by someone.

Thank you!

Pre-checks

  • [x] Check that you are up-to-date with the master branch of Learnergy. You can update with:
    pip install git+git://github.com/gugarosa/learnergy.git --upgrade --no-deps

  • [x] Check that you have read all of our README.

Description

I fed a training Torch Dataset, derived from a standard pandas DataFrame of real numbers, into GaussianRBM. Each dataset.__getitem__() call returns an instance with shape torch.Size([1, 165]).
The model.fit() call blows up with
KeyError: 0
I checked your example gaussian_rbm_training.py. One notable difference is that you used the MNIST sample dataset, which has labels. However, in theory, an RBM does not require labels to train.

Can you tell me what is wrong with my data, and is the label field really what's causing the error?

Steps to Reproduce

  1. Create a pandas DataFrame.
  2. Split it into train and test sets.
  3. Write a custom torch Dataset that returns a Torch tensor, one sample at a time (a sketch of such a dataset follows this list).
  4. Instantiate GaussianRBM.
  5. Call model.fit().
  6. Get the error KeyError: 0.
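
For context, here is a minimal sketch of the kind of custom Dataset described in step 3. The actual code was not posted, so FeatureDataset and its details are hypothetical; it only illustrates a dataset that yields samples without labels, which is the shape of the problem discussed below.

import torch
import pandas as pd
from torch.utils.data import Dataset

class FeatureDataset(Dataset):  # hypothetical name; the actual code was not posted
    """Wraps a DataFrame of real numbers, returning one unlabeled sample at a time."""

    def __init__(self, df: pd.DataFrame):
        self.data = torch.tensor(df.values, dtype=torch.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Returns only the features -- torch.Size([1, 165]) for a 165-column
        # frame -- with no (x, y) pair, which trips up a loader expecting labels.
        return self.data[idx].reshape(1, -1)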

Expected behavior: it trains through all the epochs.
Actual behavior: it throws KeyError: 0.
Reproduces how often: every time.

Additional Information

I suspect a label field is needed for the input dataset, but I don't understand why that would be the case.

Hello @bhomass, I hope everything is well with you.

Regarding your problem, I believe that passing torch.ones((len(dataframe), 1)) as the targets when creating the torch Dataset will be sufficient to cope with this issue.
By default, our torch Dataset processes "x", "y", and "transform", and our RBM batch iterator expects the same format. Since the provided custom dataset does not follow this format, that loop may crash.

Please test passing (x, y, transform) to the torch Dataset and feed it to .fit() to see whether the problem is solved. If not, we can examine your data processing/procedure more deeply.
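
A minimal sketch of that suggestion, assuming the import paths below for learnergy's Dataset and GaussianRBM (they may vary across versions); the stand-in DataFrame, hyperparameters, batch size, and epoch count are illustrative, with n_visible=165 matching the sample shape reported above.

import torch
import pandas as pd

from learnergy.core import Dataset  # assumed import path; may differ across versions
from learnergy.models.gaussian import GaussianRBM  # likewise assumed

# Stand-in for the real DataFrame of real numbers (165 features per row).
df = pd.DataFrame(torch.rand(100, 165).numpy())
x_train = torch.tensor(df.values, dtype=torch.float32)

# Dummy labels: the RBM never uses them, but the Dataset and the batch
# iterator expect (x, y) pairs, hence the torch.ones placeholder.
y_dummy = torch.ones((len(x_train), 1))

train = Dataset(x_train, y_dummy)

model = GaussianRBM(n_visible=165, n_hidden=128)  # illustrative hyperparameters
mse = model.fit(train, batch_size=16, epochs=5)  # may return a (mse, pl) tuple in some versions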

Thanks,
Mateus.

Thanks Mateus. I did get it going by playing with the numpy dimensions. I am curious, though: what is "transform" in your comment? I couldn't find it anywhere.

Hello Bruce.

When I said "transform", I meant the operations applied to the data when the samples are loaded, for instance:
transform=torchvision.transforms.ToTensor()  # converts a numpy array to a tensor; if the sample is already a torch tensor, the transform just clips the input to the [0, 1] interval

We can set several transformations on the Dataset, as follows:
transformations = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.131,), std=(0.308,)),
])

Then, we pass it to our Dataset class:
train = Dataset(x_train, torch.ones(len(x_train)), transform=transformations)
and:
mse = model.fit(train, batch_size=batch_size, epochs=epochs)
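
For clarity, here is a hedged sketch of how such a Dataset typically applies the transform inside __getitem__ (the actual learnergy implementation may differ in its details):

def __getitem__(self, idx):
    x = self.data[idx]
    if self.transform is not None:
        x = self.transform(x)  # e.g. the ToTensor + Normalize pipeline above
    return x, self.targets[idx]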

Best wishes,
Mateus.