IDSIA/sacred

Error with @ex.main and if __name__ == '__main__':

HanGuangXin opened this issue · 4 comments

When I use @ex.main together with if __name__ == '__main__':, the MongoObserver collects no data.

Here is minimal code to reproduce the error:

from sacred import Experiment
from sacred.observers import MongoObserver
ex = Experiment('OBB_Swin')
ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    # ex.run_commandline()          # works: Sacred parses the CLI and sets up observers
    # ex.run()                      # works: Sacred sets up observers directly
    my_main()                       # plain function call: no observer is ever notified

Looking forward to your reply!

There are reasons I can't use ex.run_commandline() or ex.run(). ex.run_commandline() doesn't work with an existing argparse setup, and ex.run() doesn't work with multi-GPU training (for example: python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py).
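(A sketch of one way around the argparse clash, assuming nothing beyond Sacred's documented ex.run(config_updates=...) parameter: parse your own flags with parse_known_args so argparse never errors on arguments it doesn't own, then hand the parsed values to Sacred directly instead of going through its command line. The --local_rank flag below is the one torch.distributed.launch passes to each worker.)

import argparse
from sacred import Experiment

ex = Experiment('OBB_Swin')

@ex.config
def cfg():
    local_rank = 0  # default; overridden per worker via config_updates

@ex.main
def my_main(local_rank):
    print('running on local rank', local_rank)

if __name__ == '__main__':
    # Consume only the flags we know about and ignore the rest,
    # instead of letting argparse fail on unknown arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)
    args, _unknown = parser.parse_known_args()
    ex.run(config_updates={'local_rank': args.local_rank})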

Hi @HanGuangXin! Happy new year! Unfortunately, you have to use ex.run (or ex.run_commandline) for everything to work: ex.run contains the code that sets up the configuration and the observers. @ex.main doesn't modify my_main; it just registers it as the default main function for ex.run.
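Concretely, the minimal repro above starts recording as soon as Sacred is the one invoking the function; only the last line needs to change:

if __name__ == '__main__':
    ex.run()  # builds the config, notifies the MongoObserver, then calls my_main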

For the multi-GPU training: what exactly is not working and do you know why?

+1
Multi-GPU training is used more and more these days, but it does not work with Sacred, because the launcher adds extra machinery to the command line, exactly as @HanGuangXin described: python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py
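One workaround sketch for the launcher case, assuming torch.distributed.launch exports a RANK environment variable to each worker process (as recent versions do): keep ex.run(), but attach the observer only on rank 0 so the other workers still train without writing duplicate runs to MongoDB.

import os
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')

# Only the rank-0 worker records the run; the others still execute
# my_main through ex.run but have no observers attached.
if int(os.environ.get('RANK', '0')) == 0:
    ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    ex.run()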

BDHU commented

+1
Making Sacred work alongside torch multiprocessing is an absolute pain.