Interface Consistencies
I know that Reinforce.jl is not trying to emulate OpenAI gym exactly, but I'm curious about the reasoning behind a couple of interface decisions that seem inconsistent with gym's.
First, why doesn't reset!(env) return a state or observation for convenience? From personal experience, when I was using OpenAIGym.jl, reset!(env) always returned false. This happened because Julia returns the value of the last expression in a function by default, which here happened to be env.done=false. I had to look through the source code to figure out what was happening. Returning a state/observation would be consistent with gym and would avoid confusion for new users.
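To illustrate the implicit-return behaviour described above, here is a minimal sketch. `ToyEnv`, `reset_implicit!`, and `reset_with_state!` are hypothetical names made up for this example; they are not the actual types or functions in Reinforce.jl or OpenAIGym.jl.

```julia
# Hypothetical toy environment, only for illustration.
mutable struct ToyEnv
    state::Vector{Float64}
    done::Bool
end

# The last expression is an assignment, so the function returns its value: false.
function reset_implicit!(env::ToyEnv)
    env.state = zeros(4)
    env.done = false
end

# Same reset, but the initial state is returned explicitly, as gym's reset() does.
function reset_with_state!(env::ToyEnv)
    env.state = zeros(4)
    env.done = false
    return env.state
end

env = ToyEnv(ones(4), true)
result_implicit = reset_implicit!(env)      # false
result_state = reset_with_state!(env)       # the 4-element zero vector
```

An explicit `return env.state` (or `return nothing`, if no value is intended) would make the contract obvious without having to read the source.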
Second, why does step!(env, s, a) return r, s' instead of s', r? This is a minor difference in ordering, but once again, I had an expectation for what step! should return from gym.
And why does step! take a state? Shouldn't that be stored in the env? In CartPole one of the first things the method does is overwrite the state which was handed in with the state from the environment...
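
For what it's worth, the gym-style shape I was expecting can be layered on top of the current one with a thin adapter. This is only a sketch under my own assumptions: `CounterEnv`, `toy_step!`, and `gym_style_step!` are hypothetical, and `toy_step!` merely mimics the `(env, s, a) -> (r, s')` shape described above rather than reproducing any real Reinforce.jl environment.

```julia
# Hypothetical environment that keeps its own state.
mutable struct CounterEnv
    state::Int
end

# Mimics the signature discussed above: takes (env, s, a) and returns (r, s').
# The passed-in s is ignored because env.state is treated as authoritative.
function toy_step!(env::CounterEnv, s, a)
    env.state += a
    r = -abs(env.state)
    return r, env.state
end

# Gym-style adapter: no state argument, and the result is ordered (s', r).
function gym_style_step!(env::CounterEnv, a)
    r, s_next = toy_step!(env, env.state, a)
    return s_next, r
end

env = CounterEnv(0)
s_next, r = gym_style_step!(env, 2)
```

With the state stored in the env, the caller never has to pass it back in, and the return order matches what a gym user would expect.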