BenevolentAI/MolBERT

MolBertFeaturizer with a fine-tuned model


Hi,
I tried to fine-tune the 100-epoch pretrained model (molbert_100epochs) on my dataset with the following command:

python molbert/apps/finetune.py \
    --train_file train.csv \
    --valid_file valid.csv \
    --test_file test.csv \
    --mode regression \
    --output_size 1 \
    --pretrained_model_path molbert_100epochs/checkpoints/last.ckpt \
    --label_column mylabel \
    --default_root_dir output/ \
    --num_workers 4 &> out.txt
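With --default_root_dir output/, PyTorch Lightning normally writes each run to output/lightning_logs/version_N/, which is where the checkpoint path used below comes from. If several fine-tuning runs exist, the sketch below picks the newest one and returns its checkpoints/last.ckpt and hparams.yaml paths. It assumes that directory layout, and latest_run_files is a hypothetical helper, not part of MolBERT or PyTorch Lightning.

```python
import re
from pathlib import Path


def latest_run_files(root_dir):
    """Return (checkpoint_path, hparams_path) for the newest version_N run.

    Assumes the layout <root_dir>/lightning_logs/version_N/ containing
    checkpoints/last.ckpt and hparams.yaml (hypothetical helper).
    """
    logs = Path(root_dir) / "lightning_logs"
    versions = []
    for d in logs.iterdir():
        match = re.fullmatch(r"version_(\d+)", d.name)
        if d.is_dir() and match:
            versions.append((int(match.group(1)), d))
    if not versions:
        raise FileNotFoundError(f"no version_N directories under {logs}")
    # Compare the numeric suffix so that version_10 sorts after version_9.
    _, newest = max(versions, key=lambda item: item[0])
    return newest / "checkpoints" / "last.ckpt", newest / "hparams.yaml"
```

For the command above, latest_run_files("output/") would point at output/lightning_logs/version_0/ when only one run exists.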

When I try to load the fine-tuned model with MolBertFeaturizer, I get this error:

from molbert.utils.featurizer.molbert_featurizer import MolBertFeaturizer

mycheckpoint='MolBERT/output/lightning_logs/version_0/checkpoints/last.ckpt'
molbert = MolBertFeaturizer(mycheckpoint)

--------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/cecilepereira/opt/anaconda3/envs/molbert/lib/python3.7/site-packages/pytorch_lightning/utilities/parsing.py in __getattr__(self, key)
    113         try:
--> 114             return self[key]
    115         except KeyError:

KeyError: 'named_descriptor_set'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-64-046163d76401> in <module>()
      1 from molbert.utils.featurizer.molbert_featurizer import MolBertFeaturizer
----> 2 molbert = MolBertFeaturizer(mycheckpoint)

molbert/utils/featurizer/molbert_featurizer.py in __init__(self, checkpoint_path, device, embedding_type, max_seq_len, permute)
     63         # load model
     64         self.config = Namespace(**config_dict)
---> 65         self.model = SmilesMolbertModel(self.config)
     66         self.model.load_from_checkpoint(self.checkpoint_path, hparam_overrides=self.model.__dict__)
     67 

molbert/models/base.py in __init__(self, args)
    125         self._datasets = None
    126 
--> 127         self.config = self.get_config()
    128         self.tasks = self.get_tasks(self.config)
    129         if len(self.tasks) == 0:

molbert/models/smiles.py in get_config(self)
     38                 max_position_embeddings=self.hparams.max_position_embeddings,
     39                 num_physchem_properties=self.hparams.num_physchem_properties,
---> 40                 named_descriptor_set=self.hparams.named_descriptor_set,
     41                 is_same_smiles=self.hparams.is_same_smiles,
     42             )

/Users/username/opt/anaconda3/envs/molbert/lib/python3.7/site-packages/pytorch_lightning/utilities/parsing.py in __getattr__(self, key)
    114             return self[key]
    115         except KeyError:
--> 116             raise AttributeError(f'Missing attribute "{key}"')
    117 
    118     def __setattr__(self, key, val):

AttributeError: Missing attribute "named_descriptor_set"
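The traceback shows the featurizer building self.config = Namespace(**config_dict) and passing it to SmilesMolbertModel, whose get_config then reads each setting as an attribute of self.hparams; any key absent from config_dict fails with Missing attribute. A minimal diagnostic sketch, assuming the run's hparams.yaml has already been loaded into a plain dict (for example with PyYAML's yaml.safe_load): it lists which of the keys visible in the traceback excerpt are missing. The key list is partial, since get_config probably reads more settings above line 38, and the names missing_hparams and REQUIRED_KEYS_SEEN_IN_TRACEBACK are hypothetical.

```python
# Keys read by SmilesMolbertModel.get_config, limited to the lines visible in
# the traceback excerpt (smiles.py lines 38-41). Partial list, an assumption.
REQUIRED_KEYS_SEEN_IN_TRACEBACK = (
    "max_position_embeddings",
    "num_physchem_properties",
    "named_descriptor_set",
    "is_same_smiles",
)


def missing_hparams(config_dict, required=REQUIRED_KEYS_SEEN_IN_TRACEBACK):
    """Return the required keys that are absent from a loaded hparams dict."""
    return [key for key in required if key not in config_dict]
```

A non-empty result such as ['named_descriptor_set'] for the fine-tuned run's hparams.yaml would be consistent with the error above.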

Could you help?

pykao commented

Hi,

I am facing the same issue. I cannot use the fine-tuned model to generate fine-tuned MolBERT features.

I have tried:

  1. manually appending named_descriptor_set: all to lightning_logs/version_*/hparams.yaml
  2. directly changing line 40 of MolBERT/molbert/models/smiles.py to named_descriptor_set='all',

Neither of these approaches works.
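One possible reason the first attempt is not enough: if the fine-tuned hparams.yaml lacks several keys that the pretrained model's config has, adding only named_descriptor_set just moves the failure to the next missing attribute. The hedged sketch below, untested against MolBERT, fills every key missing from the fine-tuned config with the value from the pretrained config (assumed to live in molbert_100epochs/hparams.yaml) while keeping the fine-tuned values for keys both define. It only targets missing-attribute errors; it does not establish that the fine-tuned weights, including the fine-tuning head, load into SmilesMolbertModel. fill_missing_hparams is a hypothetical helper.

```python
def fill_missing_hparams(finetuned, pretrained):
    """Return (merged, added_keys) without modifying either input dict.

    Keys present in the fine-tuned config always win, so fine-tuning
    settings are preserved; keys only the pretrained config has are copied in.
    """
    merged = dict(pretrained)
    merged.update(finetuned)
    added = sorted(set(pretrained) - set(finetuned))
    return merged, added
```

To apply it, one could load both files with yaml.safe_load, back up the fine-tuned hparams.yaml, and write merged back with yaml.safe_dump; printing added shows exactly which settings were borrowed from the pretrained model.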

Could someone help us use the fine-tuned MolBERT?