Question about test.py

Hi,
Apologies if this question is a little dumb, but I can't figure out what's going on in test.py. Is there a learning phase in it? If not, how can I test the gradient update? And if so, where does the model learn?

MultiTaskSampler, which is responsible for sampling the trajectories, is doing adaptation locally in each worker.

pytorch-maml-rl/maml_rl/samplers/multi_task_sampler.py

Lines 251 to 275 in 0c2c7dd

    
    # Sample the training trajectories with the initial policy and adapt the
    # policy to the task, based on the REINFORCE loss computed on the
    # training trajectories. The gradient update in the fast adaptation uses
    # `first_order=True` no matter if the second order version of MAML is
    # applied since this is only used for sampling trajectories, and not
    # for optimization.
    params = None
    for step in range(num_steps):
        train_episodes = self.create_episodes(params=params,
                                              gamma=gamma,
                                              gae_lambda=gae_lambda,
                                              device=device)
        train_episodes.log('_enqueueAt', datetime.now(timezone.utc))
        # QKFIX: Deep copy the episodes before sending them to their
        # respective queues, to avoid a race condition. This issue would
        # cause the policy pi = policy(observations) to be miscomputed for
        # some timesteps, which in turns makes the loss explode.
        self.train_queue.put((index, step, deepcopy(train_episodes)))

        with self.policy_lock:
            loss = reinforce_loss(self.policy, train_episodes, params=params)
            params = self.policy.update_params(loss,
                                               params=params,
                                               step_size=fast_lr,
                                               first_order=True)

So in test.py, a single call to MultiTaskSampler gives you the trajectories both before and after adaptation. With a few changes to test.py you can also use a different number of gradient steps for adaptation by changing num_steps in your call to sampler.sample().
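To make the shape of that loop concrete, here is a minimal, self-contained sketch in plain Python. It is not the code from the repository: `ToySampler`, `loss_grad`, `mean_abs` and the scalar "policy" are hypothetical stand-ins, and a simple quadratic loss replaces the REINFORCE loss. It only illustrates that one call returns `num_steps` batches collected during adaptation plus one batch collected with the adapted parameters, and that `num_steps` controls how many gradient steps are taken.

```python
import random


class ToySampler:
    """Hypothetical stand-in for a sampler that adapts locally per task."""

    def __init__(self, init_param=0.0, seed=0):
        # Plays the role of the meta-learned initial policy parameters.
        self.init_param = init_param
        self.rng = random.Random(seed)

    def create_episodes(self, param, target, batch_size=20):
        # "Episodes" here are just noisy observations of the error between
        # the current parameter and the task target.
        return [param - target + self.rng.gauss(0.0, 0.01)
                for _ in range(batch_size)]

    @staticmethod
    def loss_grad(episodes):
        # Gradient of mean(0.5 * error**2) with respect to the parameter.
        return sum(episodes) / len(episodes)

    def sample(self, target, num_steps=1, fast_lr=0.5):
        param = self.init_param
        train_batches = []
        for _ in range(num_steps):
            # Collect data with the current parameters, then adapt.
            episodes = self.create_episodes(param, target)
            train_batches.append(episodes)
            # Plain first-order gradient step.
            param = param - fast_lr * self.loss_grad(episodes)
        # Data collected with the adapted parameters ("after adaptation").
        valid_batch = self.create_episodes(param, target)
        return train_batches, valid_batch, param


def mean_abs(batch):
    return sum(abs(x) for x in batch) / len(batch)


if __name__ == "__main__":
    sampler = ToySampler()
    for num_steps in (1, 3):
        train, valid, adapted = sampler.sample(target=2.0, num_steps=num_steps)
        print(num_steps, len(train), mean_abs(train[0]), mean_abs(valid))
```

In this toy setting the error after adaptation shrinks as `num_steps` grows, but that is a property of the quadratic loss and the chosen `fast_lr`, not a guarantee for a real RL environment.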

Thanks, that was really helpful.

Sorry for reopening this issue, but after changing num_steps I didn't get better results.

(The number next to MAML shows num-batches.)

What is the environment? Making sure you get better performance with a larger number of gradient steps at test time is not something I tested.

Sorry for bothering you. It was my mistake. I found that if I lower the learning rate at both train and test time, I get better performance. (My environment is half_cheetah_vel.)
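One possible reason a smaller learning rate helps when taking more gradient steps, shown on a toy problem rather than on half_cheetah_vel: for a quadratic loss `0.5 * curvature * x**2`, each gradient descent step multiplies `x` by `(1 - lr * curvature)`, so extra steps only reduce the error when that factor has magnitude below 1. The function `adapt` and the values below are hypothetical and chosen for illustration; they say nothing definitive about the losses in this repository.

```python
def adapt(x0, curvature, lr, num_steps):
    """Run num_steps of gradient descent on 0.5 * curvature * x**2."""
    x = x0
    for _ in range(num_steps):
        grad = curvature * x  # derivative of the quadratic loss
        x = x - lr * grad
    return x


if __name__ == "__main__":
    for lr in (0.05, 0.25):
        # Distance from the optimum (x = 0) after 1, 2 and 5 steps.
        errors = [abs(adapt(1.0, curvature=10.0, lr=lr, num_steps=n))
                  for n in (1, 2, 5)]
        print(lr, errors)
```

With the smaller step size the error keeps shrinking as steps are added, while with the larger one each step overshoots and the error grows.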
