Reproducing the result of hw1 problem 1(b)
Duconnor opened this issue · 2 comments
Hi there! I am trying to reproduce the result of homework 1, problem 1(b). I installed all my dependencies from requirements.txt and then ran the command:
python cs285/scripts/run_hw1_behavior_cloning.py --expert_policy_file cs285/policies/experts/HalfCheetah.pkl --env_name HalfCheetah-v2 --exp_name test_bc_hcheetah --n_iter 1 --expert_data cs285/expert_data/expert_data_HalfCheetah-v2.pkl --batch_size=1000 --eval_batch_size=5000
This is what I got:
Loading expert policy from... cs285/policies/experts/HalfCheetah.pkl
obs (1, 17) (1, 17)
Done restoring expert policy...
********** Iteration 0 ************
Training agent using sampled data from replay buffer...
Beginning logging procedure...
Collecting data for eval...
Eval_AverageReturn : 4.991946220397949
Eval_StdReturn : 17.147544860839844
Eval_MaxReturn : 32.29301452636719
Eval_MinReturn : -9.376068115234375
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4205.7783203125
Train_StdReturn : 83.038818359375
Train_MaxReturn : 4288.81689453125
Train_MinReturn : 4122.7392578125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 4.198240041732788
Initial_DataCollection_AverageReturn : 4205.7783203125
Done logging...
Saving agent's actor...
So the average evaluation return is about 4.99, which does not match the result provided in the folder ./hw1/run_logs/bc_test_bc_hcheetah_HalfCheetah-v2_16-09-2019_00-58-58/. I am not sure what I did wrong, and it would be great if you could help me figure it out. Many thanks!
I think I forgot to push the expert_data folder. Sorry about that!
The result should be:
********** Iteration 0 ************
cs285/expert_data/expert_data_HalfCheetah-v2.pkl
envsteps this batch 0
Training agent using sampled data from replay buffer...
Beginning logging procedure...
Collecting data for eval...
Eval_AverageReturn : 2369.33544921875
Eval_StdReturn : 113.10492706298828
Eval_MaxReturn : 2578.70556640625
Eval_MinReturn : 2282.853759765625
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4205.7783203125
Train_StdReturn : 83.038818359375
Train_MaxReturn : 4288.81689453125
Train_MinReturn : 4122.7392578125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 3.9584085941314697
Initial_DataCollection_AverageReturn : 4205.7783203125
Done logging...
@xuanlinli17 Thank you for your reply! I finally figured out why my previous result was wrong. It seems that after I installed the cs285 package via setup.py, Python kept executing my own code, even when I changed to another directory and tried to run yours. I don't know why it behaves this way, but everything worked correctly once I created a new Conda environment, and now I get the correct result. Thanks again!
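For anyone who hits the same symptom: one possible explanation, assuming the package was installed in development mode (for example with `pip install -e .`), is that the environment keeps pointing at the directory where it was first installed, so `import cs285` resolves there regardless of the current directory. A quick way to check is to ask Python where it would load the package from. The sketch below uses only the standard library; the helper names `module_origin` and `is_under` are made up for illustration and are not part of the homework code.

```python
import importlib.util
import os


def module_origin(name):
    """Return the path a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Regular packages report their __init__.py here.
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no origin; fall back to their first search location.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


def is_under(path, directory):
    """Return True if path lies inside directory (symlinks resolved)."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


if __name__ == "__main__":
    origin = module_origin("cs285")
    print("cs285 resolves to:", origin)
    if origin is not None and not is_under(origin, os.getcwd()):
        print("Warning: cs285 is being imported from outside the current directory")
```

Run it from the directory that contains the copy of the code you intend to execute. If the printed path points somewhere else, uninstalling the package from that environment or starting from a fresh environment (as done above) should make the import resolve to the intended copy.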