IDSIA/sacred

Error with @ex.main and if __name__ == '__main__':

HanGuangXin opened this issue · 4 comments

When I use @ex.main together with if __name__ == '__main__':, the MongoObserver collects no data.

Here is minimal code to reproduce the error:

from sacred import Experiment
from sacred.observers import MongoObserver
ex = Experiment('OBB_Swin')
ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    # ex.run_commandline()          # works: Sacred parses the CLI and sets up observers
    # ex.run()                      # works: Sacred sets up observers directly
    my_main()                       # plain function call: no observer is ever notified

Looking forward to your reply!

There are reasons I can't use ex.run_commandline() or ex.run(). ex.run_commandline() doesn't work with an existing argparse setup, and ex.run() doesn't work with multi-GPU training (for example: python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py).
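(A sketch of one way around the argparse clash, assuming nothing beyond Sacred's documented ex.run(config_updates=...) parameter: parse your own flags with parse_known_args so argparse never errors on arguments it doesn't own, then hand the parsed values to Sacred directly instead of going through its command line. The --local_rank flag below is the one torch.distributed.launch passes to each worker.)

import argparse
from sacred import Experiment

ex = Experiment('OBB_Swin')

@ex.config
def cfg():
    local_rank = 0  # default; overridden per worker via config_updates

@ex.main
def my_main(local_rank):
    print('running on local rank', local_rank)

if __name__ == '__main__':
    # Consume only the flags we know about and ignore the rest,
    # instead of letting argparse fail on unknown arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)
    args, _unknown = parser.parse_known_args()
    ex.run(config_updates={'local_rank': args.local_rank})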

Hi @HanGuangXin! Happy new year! Unfortunately, you have to use ex.run (or ex.run_commandline) for everything to work: ex.run contains the code that sets up the configuration and the observers. @ex.main doesn't modify my_main; it just registers it as the default main function for ex.run.
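Concretely, the minimal repro above starts recording as soon as Sacred is the one invoking the function; only the last line needs to change:

if __name__ == '__main__':
    ex.run()  # builds the config, notifies the MongoObserver, then calls my_main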

For the multi-GPU training: what exactly is not working and do you know why?

+1
Multi-GPU training is used more and more these days, but it does not work with Sacred, because the launcher adds extra machinery to the command line, exactly as @HanGuangXin described: python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py
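One workaround sketch for the launcher case, assuming torch.distributed.launch exports a RANK environment variable to each worker process (as recent versions do): keep ex.run(), but attach the observer only on rank 0 so the other workers still train without writing duplicate runs to MongoDB.

import os
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')

# Only the rank-0 worker records the run; the others still execute
# my_main through ex.run but have no observers attached.
if int(os.environ.get('RANK', '0')) == 0:
    ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    ex.run()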

BDHU commented

+1
Making Sacred work alongside torch multiprocessing is an absolute pain.