RuntimeError: The size of tensor a (18294) must match the size of tensor b (18293) at non-singleton dimension 0
gcassone-cnr commented
Dear Developers,
I'm a new Allegro user. I'm trying to run the simple input shown below:
# general
root: results/water-tutorial
run_name: water
seed: 42
dataset_seed: 42
append: true
default_dtype: float32
# -- network --
- allegro.model.Allegro
# the typical model builders from `nequip` can still be used:
- PerSpeciesRescale
- ForceOutput
- RescaleEnergyEtc
# cutoffs
r_max: 4.5
avg_num_neighbors: auto
# radial basis
BesselBasis_trainable: true
PolynomialCutoff_p: 48
# symmetry
l_max: 2
parity: o3_full
# Allegro layers:
num_layers: 2
env_embed_multiplicity: 8
embed_initial_edge: true
two_body_latent_mlp_latent_dimensions: [32, 64, 128]
two_body_latent_mlp_nonlinearity: silu
two_body_latent_mlp_initialization: uniform
latent_mlp_latent_dimensions: [128]
latent_mlp_nonlinearity: silu
latent_mlp_initialization: uniform
latent_resnet: true
env_embed_mlp_latent_dimensions: []
env_embed_mlp_nonlinearity: null
env_embed_mlp_initialization: uniform
# - end allegro layers -
# Final MLP to go from Allegro latent space to edge energies:
edge_eng_mlp_latent_dimensions: [32]
edge_eng_mlp_nonlinearity: null
edge_eng_mlp_initialization: uniform
- user_label
user_label: label0
# -- data --
dataset: ase
dataset_file_name: /content/cp2k/colab/AIMD_data/conc_wat_pos_frc.extxyz # path to data set file
format: extxyz
# A mapping of chemical species to type indexes is necessary if the dataset is provided with atomic numbers instead of type indexes.
- H
- O
# logging
wandb: false
#wandb_project: allegro-water-tutorial
verbose: info
log_batch_freq: 10
# training
n_train: 1000
n_val: 100
batch_size: 5
max_epochs: 100
learning_rate: 0.002
train_val_split: random
shuffle: true
metrics_key: validation_loss
# use an exponential moving average of the weights
use_ema: true
ema_decay: 0.99
ema_use_num_updates: true
# loss function
forces: 1.
- 1.
- PerAtomMSELoss
# optimizer
optimizer_name: Adam
amsgrad: false
betas: !!python/tuple
- 0.9
- 0.999
eps: 1.0e-08
weight_decay: 0.
- - forces # key
- mae # "rmse" or "mae"
- - forces
- rmse
- - total_energy
- mae
- - total_energy
- mae
- PerAtom: True # if true, energy is normalized by the number of atoms
# lr scheduler, drop lr if no improvement for 50 epochs
lr_scheduler_name: ReduceLROnPlateau
lr_scheduler_patience: 50
lr_scheduler_factor: 0.5
LR: 1.0e-5
validation_loss: 100
However, at the 10th epoch I get the following error:
Traceback (most recent call last):
File "/home/user/anaconda3/bin/nequip-train", line 8, in <module>
File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/scripts/", line 115, in main
File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/", line 784, in train
File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/", line 919, in epoch_step
File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/", line 814, in batch_step
out = self.model(data_for_loss)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/", line 112, in forward
data = self.model(new_data)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/", line 144, in forward
data = self.model(data)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/", line 85, in forward
data = self.func(data)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/", line 366, in forward
input = module(input)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.12/site-packages/allegro/nn/", line 612, in forward
new_latents = cutoff_coeffs[active_edges].unsqueeze(-1) * new_latents
RuntimeError: The size of tensor a (18294) must match the size of tensor b (18293) at non-singleton dimension 0
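The failing line multiplies a per-edge cutoff tensor by the per-edge latents, and the message reports that their leading dimensions differ by one. As a rough illustration of the broadcasting rule behind that message (not a diagnosis of the underlying cause), the sketch below checks two shapes the way NumPy-style broadcasting does. The helper `broadcast_shape` is hypothetical and is not part of PyTorch, NequIP, or Allegro. The trailing size 128 is an assumption based on `latent_mlp_latent_dimensions`; only the sizes 18294 and 18293 come from the traceback.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Hypothetical helper mirroring NumPy/PyTorch broadcasting: after padding
    to the same rank, two sizes in the same position are compatible only if
    they are equal or one of them is 1.
    """
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (da, db) in enumerate(zip(a, b)):
        if da != db and da != 1 and db != 1:
            raise ValueError(
                f"The size of tensor a ({da}) must match the size of "
                f"tensor b ({db}) at non-singleton dimension {dim}"
            )
        # A size of 1 stretches to the other operand's size.
        out.append(db if da == 1 else da)
    return tuple(out)


# Shapes assumed for illustration: one cutoff value per edge (after
# unsqueeze(-1)) and one 128-wide latent vector per edge.
print(broadcast_shape((18293, 1), (18293, 128)))  # matching edge counts

try:
    broadcast_shape((18294, 1), (18293, 128))  # edge counts differ by one
except ValueError as err:
    print(err)
```

Under these assumed shapes the trailing dimension broadcasts without trouble. The mismatch is in dimension 0, so `cutoff_coeffs[active_edges]` and `new_latents` appear to cover edge sets whose sizes differ by one.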
Could you please tell me what is wrong with my installation and how to fix this issue?
Many thanks in advance and best wishes,
Giuseppe Cassone
gcassone-cnr commented
Dear developers, is this forum still active?