IraKorshunova/bruno

Drawbacks of conditional BRUNO compared to RNN BRUNO (not a bug)

christabella opened this issue · 1 comment

Hello, thank you for open-sourcing the code! I have a few high-level questions about the models:

1. Why is validation only done for RNN and not conditional?

In the original RNN version, validation is performed during training:

bruno/config_rnn/train.py

Lines 201 to 208 in c631d3d

```python
if hasattr(config, 'validate_every') and (iteration + 1) % config.validate_every == 0:
    print('\n Validating ...')
    losses = []
    rng = np.random.RandomState(42)
    for _, x_valid_batch in zip(range(0, config.n_valid_batches),
                                config.valid_data_iter.generate(rng)):
        feed_dict = {x_in_eval: x_valid_batch}
        l = sess.run([eval_loss], feed_dict)
```

Whereas in the conditional version, eval_loss is never used:

```python
# evaluation in case we want to validate
x_in_eval = tf.placeholder(tf.float32, shape=(config.batch_size,) + config.obs_shape)
y_in_eval = tf.placeholder(tf.float32, shape=(config.batch_size,) + config.label_shape)
log_probs = model(x_in_eval, y_in_eval)[0]
eval_loss = config.eval_loss(log_probs) if hasattr(config, 'eval_loss') else config.loss(log_probs)
```
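For reference, a validation loop for the conditional model could mirror the RNN version's. The sketch below is hypothetical (not from the repo): it assumes the conditional data iterator yields `(x, y)` batch pairs and that `sess`, `eval_loss`, and the two placeholders are the objects defined above.

```python
import numpy as np


def validate_conditional(sess, eval_loss, x_in_eval, y_in_eval, config):
    """Hypothetical validation loop for conditional BRUNO,
    mirroring config_rnn/train.py lines 201-208."""
    losses = []
    # fixed seed so the same validation batches are drawn every time
    rng = np.random.RandomState(42)
    for _, (x_batch, y_batch) in zip(range(config.n_valid_batches),
                                     config.valid_data_iter.generate(rng)):
        # feed both the observations and their labels
        feed_dict = {x_in_eval: x_batch, y_in_eval: y_batch}
        losses.append(sess.run(eval_loss, feed_dict))
    # report the mean validation loss over all batches
    return float(np.mean(losses))
```

This could then be called inside the training loop under the same `config.validate_every` check as in the RNN script.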

2. Is conditional BRUNO not maximizing joint (conditional) log likelihood?

BRUNO is clearly maximizing the joint log likelihood:
*(screenshot of the BRUNO training objective)*

However, conditional BRUNO does not seem to be maximizing the joint conditional log likelihood... or is it?
*(screenshot of the conditional BRUNO training objective)*
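One way to see why the per-step objective can still be the joint conditional likelihood: assuming the screenshot above shows a loss built from per-step terms of the form $\log p(x_i \mid x_{1:i-1}, y)$, the chain rule makes their sum exactly the joint conditional log-likelihood:

$$
\sum_{i=1}^{n} \log p(x_i \mid x_{1:i-1},\, y_{1:n}) \;=\; \log p(x_1, \dots, x_n \mid y_1, \dots, y_n),
$$

so maximizing the summed one-step-ahead terms is equivalent to maximizing the joint conditional log-likelihood (up to the constant factor $1/n$ if the loss averages rather than sums).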

3. "Conditional de Finetti" is not guaranteed

Do you think this is a problem, or not really, since in practice it works nonetheless?

Thank you very much!

Hello! Thank you for the questions!
Q1: the conditional code was hastily written, so it might be missing some pieces. As far as I remember, I also didn't look much at the validation scores.
Q2: I think it does maximize the conditional joint log-likelihood.
Q3: From a theory point of view, it's probably unsatisfactory that there is no proof; I would be happier if there were one. Though if the goal is simply to make a working model, as with most of deep learning, then I guess it's fine.