RuntimeError: The size of tensor a (18294) must match the size of tensor b (18293) at non-singleton dimension 0

Question

Opened this issue 2 months ago · 2 comments

Dear Developers,

I'm a new Allegro user. I'm trying to run the simple input shown below:

*************
# general
root: results/water-tutorial
run_name: water
seed: 42
dataset_seed: 42
append: true
default_dtype: float32

# -- network --
model_builders:
 - allegro.model.Allegro
 # the typical model builders from `nequip` can still be used:
 - PerSpeciesRescale
 - ForceOutput
 - RescaleEnergyEtc

# cutoffs
r_max: 4.5
avg_num_neighbors: auto

# radial basis
BesselBasis_trainable: true
PolynomialCutoff_p: 48

# symmetry
l_max: 2
parity: o3_full   

# Allegro layers:
num_layers: 2
env_embed_multiplicity: 8
embed_initial_edge: true

two_body_latent_mlp_latent_dimensions: [32, 64, 128]
two_body_latent_mlp_nonlinearity: silu
two_body_latent_mlp_initialization: uniform

latent_mlp_latent_dimensions: [128]
latent_mlp_nonlinearity: silu
latent_mlp_initialization: uniform
latent_resnet: true

env_embed_mlp_latent_dimensions: []
env_embed_mlp_nonlinearity: null
env_embed_mlp_initialization: uniform

# - end allegro layers -

# Final MLP to go from Allegro latent space to edge energies:
edge_eng_mlp_latent_dimensions: [32]
edge_eng_mlp_nonlinearity: null
edge_eng_mlp_initialization: uniform

include_keys:
  - user_label
key_mapping:
  user_label: label0

# -- data --
dataset: ase                                                                   
dataset_file_name: /content/cp2k/colab/AIMD_data/conc_wat_pos_frc.extxyz                     # path to data set file
ase_args:
  format: extxyz

# A mapping of chemical species to type indexes is necessary if the dataset is provided with atomic numbers instead of type indexes.
chemical_symbols:
  - H
  - O

# logging
wandb: false
#wandb_project: allegro-water-tutorial
verbose: info
log_batch_freq: 10

# training
n_train: 1000
n_val: 100
batch_size: 5
max_epochs: 100
learning_rate: 0.002
train_val_split: random
shuffle: true
metrics_key: validation_loss

# use an exponential moving average of the weights
use_ema: true
ema_decay: 0.99
ema_use_num_updates: true

# loss function
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss

# optimizer
optimizer_name: Adam
optimizer_params:
  amsgrad: false
  betas: !!python/tuple
  - 0.9
  - 0.999
  eps: 1.0e-08
  weight_decay: 0.

metrics_components:
  - - forces                               # key 
    - mae                                  # "rmse" or "mae"
  - - forces
    - rmse
  - - total_energy
    - mae    
  - - total_energy
    - mae
    - PerAtom: True                        # if true, energy is normalized by the number of atoms

# lr scheduler, drop lr if no improvement for 50 epochs
lr_scheduler_name: ReduceLROnPlateau
lr_scheduler_patience: 50
lr_scheduler_factor: 0.5

early_stopping_lower_bounds:
  LR: 1.0e-5

early_stopping_patiences:
  validation_loss: 100
********

but at the 10th epoch I get the following error:

Traceback (most recent call last):
  File "/home/user/anaconda3/bin/nequip-train", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/scripts/train.py", line 115, in main
    trainer.train()
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/trainer.py", line 784, in train
    self.epoch_step()
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/trainer.py", line 919, in epoch_step
    self.batch_step(
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/trainer.py", line 814, in batch_step
    out = self.model(data_for_loss)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/_graph_model.py", line 112, in forward
    data = self.model(new_data)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/_rescale.py", line 144, in forward
    data = self.model(data)
           ^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/_grad_output.py", line 85, in forward
    data = self.func(data)
           ^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/_graph_mixin.py", line 366, in forward
    input = module(input)
            ^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/allegro/nn/_allegro.py", line 612, in forward
    new_latents = cutoff_coeffs[active_edges].unsqueeze(-1) * new_latents
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: The size of tensor a (18294) must match the size of tensor b (18293) at non-singleton dimension 0
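
For reference, the last line of the traceback is PyTorch's broadcasting check failing on the elementwise product in `_allegro.py`: the left operand has 18294 rows and the right operand has 18293. Below is a minimal pure-Python sketch of that broadcasting rule, not PyTorch's actual implementation. The helper name `broadcast_shape` and the latent width of 128 are assumptions made for illustration; only the two row counts come from the error message.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of shapes a and b, or raise ValueError.

    Two sizes are compatible when they are equal or when one of them is 1.
    Dimensions are compared starting from the trailing one.
    """
    ndim = max(len(a), len(b))
    result = []
    for i in range(1, ndim + 1):
        da = a[-i] if i <= len(a) else 1  # missing leading dims count as 1
        db = b[-i] if i <= len(b) else 1
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(
                f"size of a ({da}) must match size of b ({db}) "
                f"at non-singleton dimension {ndim - i}"
            )
    return tuple(reversed(result))


LATENT_DIM = 128  # assumed width of new_latents, for illustration only

# Shapes matching the failing line:
#   cutoff_coeffs[active_edges].unsqueeze(-1) * new_latents
coeffs_shape = (18294, 1)            # one coefficient per selected edge
latents_shape = (18293, LATENT_DIM)  # one latent row per edge

try:
    broadcast_shape(coeffs_shape, latents_shape)
except ValueError as err:
    print(err)
```

The trailing dimension broadcasts without trouble (1 against the latent width), so the mismatch is purely in the number of edges: the two tensors disagree by exactly one row.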

Could you please tell me what is wrong with my installation and how to fix this issue?

Many thanks in advance and best wishes,
Giuseppe Cassone

Answer 1 · 2024-11-04T05:51:40.000Z

Dear developers, is this forum still active?

Answer 2 · 2024-11-22T03:14:34.000Z

Hi Giuseppe,