titu1994/neural-architecture-search

RNN Controller is implemented differently from the original paper

Kipsora opened this issue · 1 comment

In the paper, the RNN is described as: "Every prediction is carried out by a softmax classifier and then fed into the next time step as input". However, in the implementation I found that the controller's input is the set of actions from the previous trial rather than the prediction from the previous time step. I am also confused about the RNN's initial state; the paper does not state what it is.
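For reference, here is a minimal sketch (not this repository's code) of the sampling scheme as the paper describes it: each softmax prediction is embedded and fed back as the input to the next controller time step, starting from a zero hidden state. All names here (num_steps, num_tokens, lstm_cell, classifiers, etc.) are hypothetical.

import tensorflow as tf

num_steps = 4      # number of architecture decisions to make
num_tokens = 8     # choices per decision (size of each softmax)
hidden_dim = 32

lstm_cell = tf.keras.layers.LSTMCell(hidden_dim)
embedding = tf.keras.layers.Embedding(num_tokens, hidden_dim)
classifiers = [tf.keras.layers.Dense(num_tokens, activation='softmax')
               for _ in range(num_steps)]

# zero initial hidden/cell state and a fixed "start" token as the first input
state = [tf.zeros((1, hidden_dim)), tf.zeros((1, hidden_dim))]
inputs = embedding(tf.zeros((1,), dtype=tf.int32))

actions = []
for t in range(num_steps):
    output, state = lstm_cell(inputs, state)
    probs = classifiers[t](output)                       # softmax over decision t
    action = tf.random.categorical(tf.math.log(probs), 1)[:, 0]
    actions.append(int(action.numpy()[0]))
    inputs = embedding(action)                           # prediction fed into the next time step
print("sampled architecture decisions:", actions)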

Related code:

state = state_space.get_random_state_space(NUM_LAYERS)
print("Initial Random State : ", state_space.parse_state_space_list(state))
print()

# clear the previous files
controller.remove_files()

# train for the given number of trials
for trial in range(MAX_TRIALS):
    with policy_sess.as_default():
        K.set_session(policy_sess)
        # get an action for the previous state
        actions = controller.get_action(state)

        # print the action probabilities
        state_space.print_actions(actions)
        print("Predicted actions : ", state_space.parse_state_space_list(actions))

    # build a model, train and get reward and accuracy from the network manager
    reward, previous_acc = manager.get_rewards(model_fn, state_space.parse_state_space_list(actions))
    print("Rewards : ", reward, "Accuracy : ", previous_acc)

    with policy_sess.as_default():
        K.set_session(policy_sess)

        total_reward += reward
        print("Total reward : ", total_reward)

        # actions and states are equivalent, save the state and reward
        state = actions

The graph of the RNN is built such that, in one pass, all of the different classifiers produce their outputs sequentially, each conditioned on the cell state left by the previous classifier.

For the first state, it is common to use a zero state vector.
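A minimal sketch of that behaviour (not the repository code) is below: the RNN is unrolled over all decisions in a single pass, each classifier steps the same cell and sees the state from the previous classifier, the hidden state starts at zero, and the previous trial's actions are fed as the step inputs, analogous to state = actions in the training loop above. Names such as sample_architecture and tokens_per_layer are hypothetical.

import tensorflow as tf

num_layers = 2
tokens_per_layer = [4, 6]          # e.g. kernel-size choices and filter-count choices
hidden_dim = 32

cell = tf.keras.layers.LSTMCell(hidden_dim)
classifiers = [tf.keras.layers.Dense(n, activation='softmax')
               for n in tokens_per_layer * num_layers]
embeddings = [tf.keras.layers.Embedding(n, hidden_dim)
              for n in tokens_per_layer * num_layers]

def sample_architecture(prev_trial_actions):
    # one pass over all classifiers; the previous trial's actions are the inputs
    h = [tf.zeros((1, hidden_dim)), tf.zeros((1, hidden_dim))]   # zero initial state
    actions = []
    for i, (clf, emb) in enumerate(zip(classifiers, embeddings)):
        x = emb(tf.constant([prev_trial_actions[i]]))            # embed previous trial's action
        out, h = cell(x, h)                                      # state carried across classifiers
        probs = clf(out)
        action = int(tf.random.categorical(tf.math.log(probs), 1)[0, 0])
        actions.append(action)
    return actions

# the first trial starts from an arbitrary architecture, later trials reuse the last one
actions = [0] * (2 * num_layers)
for trial in range(3):
    actions = sample_architecture(actions)
    print("trial", trial, "->", actions)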