DeNA/HandyRL

Unexpected results at inference time

Muhammad4hmed opened this issue · 0 comments

Hi, I came here from Kaggle. First of all, amazing work!
I was trying this for the Kaggle competition Hungry Geese. It's training fine with these params:

{'env_args': {'env': 'HungryGeese', 'source': 'handyrl.envs.kaggle.hungry_geese'}, 'train_args': {'turn_based_training': False, 'observation': False, 'gamma': 0.8, 'forward_steps': 16, 'compress_steps': 4, 'entropy_regularization': 0.1, 'entropy_regularization_decay': 0.1, 'update_episodes': 200, 'batch_size': 128, 'minimum_episodes': 400, 'maximum_episodes': 100000, 'num_batchers': 2, 'eval_rate': 0.1, 'worker': {'num_parallel': 6}, 'lambda': 0.7, 'policy_target': 'TD', 'value_target': 'TD', 'seed': 0, 'restart_epoch': 0}, 'worker_args': {'server_address': '', 'num_parallel': 8}}

Partway through training, I picked the model from one epoch, let's say 17.pth.
Now, when trying it in the Kaggle environment (playing against other agents):

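from kaggle_environments import evaluate

# agents, ind_1 and ind_2 are defined elsewhere in my script
current_score = evaluate(
    "hungry_geese",
    [
        agents[ind_1],  # HandyRL
        agents[ind_2],
        "simple_toward.py",
        "simple_toward.py",
    ],
    num_episodes=100,
)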

the output of current_score is

[[601, None, 2704, 2604], [1503, None, 801, 1501], [501, None, 502, 502], [301, None, 2606, 2703], [901, None, 901, 903], [401, None, 1405, 1403], [601, None, 1403, 1502], [2804, None, 401, 2804], [501, None, 501, 601], [501, None, 402, 401], [702, None, 702, 702], [602, None, 502, 703], [501, None, 502, 601], [401, None, 801, 802], [501, None, 703, 701], [1002, None, 901, 501], [403, None, 401, 402], [402, None, 301, 301], [1602, None, 1602, 1202], [1301, None, 1303, 1301], [602, None, 602, 601], [1701, None, 1702, 1704], [301, None, 301, 402], [501, None, 2805, 2903], [301, None, 302, 401], [301, None, 401, 301], [0, None, 0, 201], [1702, None, 0, 1805], [1602, None, 301, 1704], [3207, None, 3205, 1801], [402, None, 301, 301], [1503, None, 501, 1502], [1202, None, 1202, 1304], [501, None, 601, 501], [902, None, 902, 901], [401, None, 401, 502], [901, None, 2304, 2202], [401, None, 301, 301], [402, None, 701, 802], [901, None, 301, 1003], [301, None, 202, 0], [702, None, 601, 601], [601, None, 1803, 1902], [301, None, 301, 402], [703, None, 301, 702], [2603, None, 602, 2704], [1301, None, 1202, 1201], [1101, None, 401, 1204], [602, None, 602, 301], [601, None, 1103, 1203], [401, None, 502, 301], [801, None, 802, 501], [601, None, 1904, 2002], [301, None, 201, 403], [403, None, 301, 301], [902, None, 901, 902], [501, None, 401, 401], [1902, None, 601, 1905], [502, None, 401, 301], [602, None, 601, 602], [2804, None, 2703, 202], [601, None, 902, 902], [3404, None, 3302, 1301], [301, None, 901, 902], [501, None, 2504, 2603], [2302, None, 2403, 801], [302, None, 502, 401], [401, None, 1803, 1704], [1304, None, 1302, 401], [201, None, 301, 202], [501, None, 501, 602], [201, None, 201, 301], [702, None, 602, 601], [401, None, 1502, 1503], [3004, None, 201, 2902], [803, None, 2102, 2103], [201, None, 302, 201], [201, None, 201, 301], [2003, None, 2203, 2103], [501, None, 501, 603], [503, None, 2002, 1903], [501, None, 502, 501], [1402, None, 401, 1403], [1302, None, 
1403, 601], [2204, None, 501, 2203], [301, None, 401, 301], [1403, None, 1403, 1401], [301, None, 402, 301], [701, None, 1602, 1502], [2103, None, 1001, 2105], [401, None, 602, 501], [602, None, 501, 702], [402, None, 301, 301], [402, None, 402, 401], [601, None, 702, 602], [401, None, 401, 402], [701, None, 802, 802], [201, None, 202, 201], [702, None, 501, 601], [501, None, 501, 602]]
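To make the pattern in this output easier to see, here is a minimal sketch that counts, per agent slot, how many episodes returned None and averages the remaining rewards. The helper name `summarize_rewards` is hypothetical (it is not part of HandyRL or kaggle_environments), and the sample contains only the first three episodes copied from the output above.

```python
from statistics import mean


def summarize_rewards(scores):
    """For each agent slot, count None rewards and average the non-None ones.

    `scores` is a list of episodes, each a list with one reward per agent,
    in the same shape as the output printed above.
    """
    num_agents = len(scores[0])
    summary = []
    for slot in range(num_agents):
        rewards = [episode[slot] for episode in scores]
        valid = [r for r in rewards if r is not None]
        summary.append({
            "slot": slot,
            "none_count": len(rewards) - len(valid),
            # No valid rewards at all means there is nothing to average.
            "mean_reward": mean(valid) if valid else None,
        })
    return summary


# First three episodes copied from the output above.
sample = [
    [601, None, 2704, 2604],
    [1503, None, 801, 1501],
    [501, None, 502, 502],
]

for row in summarize_rewards(sample):
    print(row)
```

Passing the full `current_score` list instead of `sample` would show whether the None values are confined to a single slot.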

The second element of every result is None, which is quite strange.
What do you think? What went wrong?
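In case it helps narrow this down, here is a rough sketch of how the final state of each agent in a single episode could be inspected. It assumes the episode is a list of steps, where each step is a list of per-agent dicts with "status" and "reward" keys; as far as I understand, that is the shape kaggle_environments returns from `env.run(...)`, but please treat that as an assumption. The helper names and the sample episode below are made up for illustration and are not real output.

```python
def final_states(steps):
    """Return (agent index, status, reward) tuples for the last step of an episode."""
    return [
        (index, state["status"], state["reward"])
        for index, state in enumerate(steps[-1])
    ]


def agents_without_reward(steps):
    """Return the indices of agents whose final reward is None."""
    return [index for index, _status, reward in final_states(steps) if reward is None]


# Hand-made episode with two steps and four agents (illustrative values only).
fake_steps = [
    [{"status": "ACTIVE", "reward": 0} for _ in range(4)],
    [
        {"status": "DONE", "reward": 601},
        {"status": "ERROR", "reward": None},
        {"status": "DONE", "reward": 2704},
        {"status": "DONE", "reward": 2604},
    ],
]

for index, status, reward in final_states(fake_steps):
    print(f"agent {index}: status={status}, reward={reward}")

print("agents with no reward:", agents_without_reward(fake_steps))
```

With the real environment, `steps` would presumably come from something like `make("hungry_geese", debug=True).run([...])` with the same agent list as above. If the second agent ends with a non-DONE status there, the debug output may show the underlying exception.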

Thanks