alexarnimueller/LSTM_peptides

Unexpected behavior with boolean program arguments

rizaozcelik opened this issue · 4 comments

Hello Alex,

First and foremost, thanks a lot for this great repository. I have been using the codebase to fine-tune a pre-trained model on a set of peptides and stumbled upon some unexpected behavior that I want to bring up. As suggested in the README, I ran LSTM_peptides.py with the train argument set to False and finetune set to True, but the code entered the train branch in the main function (line 727) and started pre-training instead of fine-tuning.

I dived deeper into the code and found that the argparse module parses the value of the train argument as a string, i.e. "False", which is in turn cast to a boolean by Python and evaluates to True. Thus, the if condition on line 727 (if train:) evaluates to True as long as any non-empty string is provided, triggering the pre-training code.
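The root cause can be checked directly in the interpreter: in Python, every non-empty string is truthy, so applying bool to the raw command-line string never yields False for a supplied value.

```python
# Any non-empty string is truthy, regardless of its content:
print(bool("False"))  # True
print(bool("None"))   # True
print(bool("0"))      # True

# Only the empty string is falsy:
print(bool(""))       # False
```

This is exactly what argparse does when given type=bool: it calls bool() on the string token from the command line.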

To isolate and reproduce the error, I created a small script, argparse_test.py, with the following content.

import argparse
flags = argparse.ArgumentParser()
flags.add_argument("-t", "--train", default=True, help="whether the network should be trained or just sampled from", type=bool)
args = flags.parse_args()
print(args.train)

When I run this script via an Ubuntu 20 terminal with python3 argparse_test.py --train False (the Python version is 3.8.10), the output is True. In fact, I have experimented with several values (false, None, True) for train and the output is always True, except for python3 argparse_test.py --train '', i.e. when train is set to an empty string.
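For reference, one common workaround (not what the repository currently does) is to give argparse an explicit string-to-boolean converter instead of the built-in bool. The str2bool helper below is illustrative, not part of the codebase:

```python
import argparse

def str2bool(value):
    # Interpret common textual spellings of booleans; reject anything else.
    if isinstance(value, bool):
        return value
    if value.lower() in ("true", "t", "yes", "1"):
        return True
    if value.lower() in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

flags = argparse.ArgumentParser()
flags.add_argument("-t", "--train", default=True, type=str2bool,
                   help="whether the network should be trained or just sampled from")

print(flags.parse_args(["--train", "False"]).train)  # False
print(flags.parse_args([]).train)                    # True
```

With this converter, --train False finally parses to the boolean False, while an unrecognized value produces a clear error instead of silently becoming True.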

I wonder if I am missing something, or whether the suggested fine-tuning command is only supported in certain Python versions. If this is indeed unexpected behavior (which might have been affecting many users), I'd be happy to create a pull request with a fix.

Looking forward to your reply!

Hi Riza

Great to hear you found my repo useful.
Thanks a lot for making me aware of this argparse issue. Indeed, the code as currently written does not allow you to set the training flag to False. What actually needs to be done is to create a no-train flag.

Example:

from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('--notrain', action='store_true', help='Do not train')
args = parser.parse_args()

if args.notrain:
    print("Not training")
else:
    print("Training")
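For clarity, a store_true flag takes no value on the command line; its mere presence sets it to True (and store_true already implies a default of False):

```python
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('--notrain', action='store_true', help='Do not train')

# Flag absent -> False; flag present -> True. No value token is ever parsed.
print(parser.parse_args([]).notrain)             # False
print(parser.parse_args(['--notrain']).notrain)  # True
```

So the user would run the script with --notrain to skip training, and omit the flag to train, sidestepping string-to-boolean conversion entirely.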

If you want to apply these changes and be a contributor, I would be very happy.

Thanks in advance

Ah, or an actually simpler version is to just change the behavior of the if train check in main to something like if train and not finetune.
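The suggested condition change can be sketched as follows; the run function and its branch names are illustrative stand-ins for the real main(), not the actual code:

```python
def run(train, finetune):
    # With type=bool both flags are truthy whenever any value is supplied,
    # so checking finetune first prevents it from falling into pre-training.
    if train and not finetune:
        return "pretrain"
    elif finetune:
        return "finetune"
    else:
        return "sample"

print(run(train=True, finetune=False))   # pretrain
print(run(train=True, finetune=True))    # finetune
print(run(train=False, finetune=False))  # sample
```

Note this only works around the symptom: the flags themselves still cannot be set to a genuine False from the command line.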

Do you want to adapt it or should I?

Hi Alex,

Thanks for the prompt response. I would be happy to contribute to the repository and can apply the changes. I propose to leave the flow in main untouched and change the argument behavior instead, since the arguments are the root cause of the strange behavior.

I propose to replace the boolean train and finetune flags with a training_mode flag, which would be set to train or finetune. This should make the behavior explicit for users.

If you are also okay with that direction, I will create a pull request for you to review.

I have created a pull request (#13) with slightly different behavior than I described above. I implemented a mode flag that can be set to pretrain, finetune, or sample. I tried to keep the code changes to a minimum to reduce possible side effects.
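Such a mode flag is naturally expressed with argparse's choices parameter, which rejects any other value with a clear error. The sketch below shows the general shape; the default value here is an assumption, not necessarily what the pull request uses:

```python
import argparse

flags = argparse.ArgumentParser()
# choices restricts the argument to the three supported modes;
# anything else makes parse_args exit with a usage error.
flags.add_argument("--mode", default="pretrain",
                   choices=["pretrain", "finetune", "sample"],
                   help="whether to pretrain, finetune, or sample from the model")

print(flags.parse_args(["--mode", "finetune"]).mode)  # finetune
print(flags.parse_args([]).mode)                      # pretrain
```

Unlike the boolean flags, a misspelled or unsupported value now fails loudly at parse time instead of silently selecting the wrong branch.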