mir-group/allegro

RuntimeError: The size of tensor a (18294) must match the size of tensor b (18293) at non-singleton dimension 0

Opened this issue · 2 comments

Dear Developers,

I'm a new Allegro user. I'm just trying to run the simple input shown below

*************
# general
root: results/water-tutorial
run_name: water
seed: 42
dataset_seed: 42
append: true
default_dtype: float32

# -- network --
model_builders:
 - allegro.model.Allegro
 # the typical model builders from `nequip` can still be used:
 - PerSpeciesRescale
 - ForceOutput
 - RescaleEnergyEtc

# cutoffs
r_max: 4.5
avg_num_neighbors: auto

# radial basis
BesselBasis_trainable: true
PolynomialCutoff_p: 48

# symmetry
l_max: 2
parity: o3_full   

# Allegro layers:
num_layers: 2
env_embed_multiplicity: 8
embed_initial_edge: true

two_body_latent_mlp_latent_dimensions: [32, 64, 128]
two_body_latent_mlp_nonlinearity: silu
two_body_latent_mlp_initialization: uniform

latent_mlp_latent_dimensions: [128]
latent_mlp_nonlinearity: silu
latent_mlp_initialization: uniform
latent_resnet: true

env_embed_mlp_latent_dimensions: []
env_embed_mlp_nonlinearity: null
env_embed_mlp_initialization: uniform

# - end allegro layers -

# Final MLP to go from Allegro latent space to edge energies:
edge_eng_mlp_latent_dimensions: [32]
edge_eng_mlp_nonlinearity: null
edge_eng_mlp_initialization: uniform

include_keys:
  - user_label
key_mapping:
  user_label: label0

# -- data --
dataset: ase                                                                   
dataset_file_name: /content/cp2k/colab/AIMD_data/conc_wat_pos_frc.extxyz                     # path to data set file
ase_args:
  format: extxyz

# A mapping of chemical species to type indexes is necessary if the dataset is provided with atomic numbers instead of type indexes.
chemical_symbols:
  - H
  - O

# logging
wandb: false
#wandb_project: allegro-water-tutorial
verbose: info
log_batch_freq: 10

# training
n_train: 1000
n_val: 100
batch_size: 5
max_epochs: 100
learning_rate: 0.002
train_val_split: random
shuffle: true
metrics_key: validation_loss

# use an exponential moving average of the weights
use_ema: true
ema_decay: 0.99
ema_use_num_updates: true

# loss function
loss_coeffs:
  forces: 1.
  total_energy:
    - 1.
    - PerAtomMSELoss

# optimizer
optimizer_name: Adam
optimizer_params:
  amsgrad: false
  betas: !!python/tuple
  - 0.9
  - 0.999
  eps: 1.0e-08
  weight_decay: 0.

metrics_components:
  - - forces                               # key 
    - mae                                  # "rmse" or "mae"
  - - forces
    - rmse
  - - total_energy
    - mae    
  - - total_energy
    - mae
    - PerAtom: True                        # if true, energy is normalized by the number of atoms

# lr scheduler, drop lr if no improvement for 50 epochs
lr_scheduler_name: ReduceLROnPlateau
lr_scheduler_patience: 50
lr_scheduler_factor: 0.5

early_stopping_lower_bounds:
  LR: 1.0e-5

early_stopping_patiences:
  validation_loss: 100
********

but at the 10th epoch I get the following error:

Traceback (most recent call last):
  File "/home/user/anaconda3/bin/nequip-train", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/scripts/train.py", line 115, in main
    trainer.train()
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/trainer.py", line 784, in train
    self.epoch_step()
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/trainer.py", line 919, in epoch_step
    self.batch_step(
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/train/trainer.py", line 814, in batch_step
    out = self.model(data_for_loss)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/_graph_model.py", line 112, in forward
    data = self.model(new_data)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/_rescale.py", line 144, in forward
    data = self.model(data)
           ^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/_grad_output.py", line 85, in forward
    data = self.func(data)
           ^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/nequip/nn/_graph_mixin.py", line 366, in forward
    input = module(input)
            ^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/lib/python3.12/site-packages/allegro/nn/_allegro.py", line 612, in forward
    new_latents = cutoff_coeffs[active_edges].unsqueeze(-1) * new_latents
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: The size of tensor a (18294) must match the size of tensor b (18293) at non-singleton dimension 0

Can you please suggest me what's wrong in my installation and how to fix this issue?

Many thanks in advance and best wishes,
Giuseppe Cassone

Dear developers, Is this forum still active?

Hi Giuseppe,

Related to my answer here -- #111

If it's not too much trouble, I would advise using the new code, which are in the develop branches of both nequip and allegro.