Error with @ex.main and if __name__ == '__main__':
HanGuangXin opened this issue · 4 comments
When I use `@ex.main` together with `if __name__ == '__main__':`, the MongoObserver collects no data.
Here is minimal code to reproduce the error:
```python
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')
ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    # ex.run_commandline()  # correct
    # ex.run()              # correct
    my_main()
```
Looking forward to your reply!
There are reasons I can't use `ex.run_commandline()` or `ex.run()`. `ex.run_commandline()` can't work with an existing argparse parser, and `ex.run()` can't work with multi-GPU training (for example: `python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py`).
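One possible way to reconcile sacred with an existing argparse parser (a minimal sketch, not an official sacred recipe: `--local_rank` is the flag the old-style `torch.distributed.launch` injects, and `lr` is a made-up config value used only for illustration) is to parse your own arguments first and forward the values to `ex.run` via `config_updates`:

```python
import argparse
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')
ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.config
def cfg():
    lr = 0.01  # hypothetical config value, just for illustration

@ex.main
def my_main(lr):
    print('training with lr =', lr)

if __name__ == '__main__':
    # Parse the launcher's flags ourselves instead of letting sacred
    # read sys.argv, then hand the values over as config updates.
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)
    parser.add_argument('--lr', type=float, default=0.01)
    args = parser.parse_args()
    ex.run(config_updates={'lr': args.lr})
```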
Hi @HanGuangXin! Happy new year! Unfortunately, you have to use `ex.run` (or `ex.run_commandline`) for everything to work. `ex.run` contains the code that sets up the configuration and observers. `@ex.main` doesn't modify `my_main`, it just registers it as the default main function for `ex.run`.
For the multi-GPU training: what exactly is not working and do you know why?
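In other words, the snippet above starts working as soon as the last line calls `ex.run()` instead of calling `my_main()` directly, since `ex.run()` is what creates the run and notifies the MongoObserver (a minimal sketch of the corrected script):

```python
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')
ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    # ex.run() sets up the configuration, starts the run, and
    # notifies the observers, so MongoDB receives the run data.
    ex.run()
```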
+1
Multi-GPU training is used more and more frequently nowadays but does not work with sacred, because the command line used to start Python carries additional arguments, just like what @HanGuangXin mentioned: `python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py`
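One workaround for the multi-process case (a sketch, assuming the old-style `torch.distributed.launch`, which exports a `RANK` environment variable to every worker) is to attach the observer only in the rank-0 process, so the run is recorded exactly once instead of once per GPU:

```python
import os
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')

# torch.distributed.launch exports RANK for each worker process;
# only the main process should report to MongoDB.
if int(os.environ.get('RANK', '0')) == 0:
    ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))
```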
+1
Making sacred work alongside torch multiprocessing is an absolute pain.