StepNeverStop/RLs

Error when checking the length of shape (TF 2.0)

kmakeev opened this issue · 10 comments

tf.__version__
'2.0.0'
tfp.__version__
'0.8.0'

params --gym -a sac_no_v -n train_using_gym -g --gym-env CarRacing-v0 --render-episode 10 --gym-agents 4

Error (in converted code, relative to C:\Python34\RLs\Nn):

```
tf2nn.py:144 call  *
    features = self.share(super().call(vector_input, visual_input))
tf2nn.py:86 call  *
    features = self.conv1(visual_input)

AttributeError: 'actor_continuous' object has no attribute 'conv1'
```

In tf2nn.py, in `class ImageNet(tf.keras.Model)`, `__init__()` sees `len(visual_dim)` equal to 4, so conv1 and the other conv layers are never added to the model because of the `if len(visual_dim) == 5:` guard. But in `def call(self, vector_input, visual_input):` the input shape is (None, 1, 96, 96, 3), whose length is 5, so this branch runs and raises the error:

```python
if visual_input is None or len(visual_input.shape) != 5:
    pass
else:
    features = self.conv1(visual_input)
```
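For reference, a minimal sketch of one way to keep `__init__()` and `call()` consistent: decide once, in the constructor, whether the conv stack exists, and guard `call()` with that flag instead of re-checking the runtime input rank. (Hypothetical layer shapes; this is not the exact code in tf2nn.py.)

```python
import tensorflow as tf

class ImageNet(tf.keras.Model):
    def __init__(self, visual_dim):
        super().__init__()
        # Remember whether the conv stack was actually built, instead of
        # re-deriving the condition from the runtime input shape in call().
        self.use_conv = len(visual_dim) == 5
        if self.use_conv:
            self.conv1 = tf.keras.layers.Conv3D(32, (1, 8, 8), strides=(1, 4, 4), activation='relu')
            self.flatten = tf.keras.layers.Flatten()

    def call(self, vector_input, visual_input):
        if self.use_conv and visual_input is not None:
            features = self.flatten(self.conv1(visual_input))
            return tf.concat([vector_input, features], axis=-1)
        return vector_input
```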

Please also check gym_loop.py:

```python
def init_variables(env, action_type):
    """
    inputs:
        env: Environment
        action_type: discrete or continuous
    outputs:
        i: specify which item of state should be modified
        mu: action bias
        sigma: action scale
        state: [vector_obs, visual_obs]
        newstate: [vector_obs, visual_obs]
    """
    i = 1 if len(env.observation_space.shape) == 3 else 0
    mu, sigma = get_action_normalize_factor(env.action_space, action_type)
    return i, mu, sigma, [np.empty(env.n), np.array([[]] * env.n)], [np.empty(env.n), np.array([[]] * env.n)]
```

This returns 'state' with shape <class 'tuple'>: (4, 1, 210, 160, 3).
I think it should be (4, 210, 160, 3).
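A toy illustration of the shape mismatch (hypothetical values): four agents, each visual observation carried with an extra leading axis of size 1.

```python
import numpy as np

state = np.zeros((4, 1, 210, 160, 3))   # what init_variables ends up producing
print(state.shape)                       # (4, 1, 210, 160, 3)
print(np.squeeze(state, 1).shape)        # (4, 210, 160, 3), the shape suggested above
```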

Hi,
The problem you found is really a serious bug; I have basically fixed it in the latest commit. It now works for Adventure-v0, Berzerk-v0, etc., which have visual observations, but it still does not work for CarRacing-v0. I don't know why; maybe that environment doesn't support multi-threading very well.

You can try other visual-based envs for training; I will keep testing CarRacing-v0 to see whether it can work or not.

Also, because I need to stay compatible with the Unity environment (which may have multiple image input sources), I have to use Conv3D instead of Conv2D; that's why I add an extra dimension to the visual observation for Gym envs, as the sketch below shows. I am planning to write another pure-Gym training project, so it will look less redundant.
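For illustration, a minimal sketch of that extra dimension (assumed shapes, not the repo's exact code): a Gym frame of shape (H, W, C) gets a camera axis, so a batch becomes (batch, cameras=1, H, W, C), which is the rank Conv3D expects.

```python
import numpy as np

frame = np.zeros((96, 96, 3), dtype=np.uint8)  # one CarRacing-v0 frame
batch = np.stack([frame] * 4)                  # (4, 96, 96, 3) for 4 agents
batch_3d = batch[:, np.newaxis, ...]           # (4, 1, 96, 96, 3) for Conv3D
print(batch_3d.shape)
```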

thx.

You could try training Atari games.

Thanks!
I will try.

Sorry!
Breakout-v0 still doesn't work; it's even worse. After launch, memory consumption is very high and the application crashes.
I also see that the change in 'atary_loop.py' to `def init_variables(env, action_type): ...` did not affect the shape of the returned values.

Let me draw your attention to the following things (there is a sketch after this list):

  1. For models that work with images, it is better to store the state as int8 (these are color values up to 255); this greatly saves memory. As a sample, you can look at this project: https://github.com/fg91/Deep-Q-Learning

  2. For `class ImageNet(tf.keras.Model):` I have not seen whether there is any normalization of the input data (/ 255).

  3. Many Atari games (including Breakout-v0) do not start without receiving the FIRE action, and training them will not succeed without it.
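To make points 1 and 3 concrete, here is a minimal sketch (hypothetical class names, not this repo's API): frames stored as uint8 and divided by 255 only when sampled, plus a wrapper that presses FIRE on reset.

```python
import numpy as np
import gym

class Uint8ReplayBuffer:
    """Stores frames as uint8 (values up to 255); 4x smaller than float32."""
    def __init__(self, capacity, obs_shape):
        self.obs = np.zeros((capacity,) + obs_shape, dtype=np.uint8)
        self.capacity, self.idx, self.full = capacity, 0, False

    def add(self, frame):
        self.obs[self.idx] = frame
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def sample(self, batch_size):
        high = self.capacity if self.full else self.idx
        i = np.random.randint(0, high, size=batch_size)
        return self.obs[i].astype(np.float32) / 255.0  # normalize on sampling, not storage

class FireResetEnv(gym.Wrapper):
    """Presses FIRE after reset for games (e.g. Breakout-v0) that need it to start."""
    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        meanings = self.env.unwrapped.get_action_meanings()
        if 'FIRE' in meanings:
            obs, _, done, _ = self.env.step(meanings.index('FIRE'))
            if done:
                obs = self.env.reset(**kwargs)
        return obs
```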

@kasimte Hi,

python run.py --gym --gym-env Breakout-v0 -a dqn --render-episode 0 works for me.

There is a lot of memory consumption because of:

  1. float32, not int8. This will be optimized later, and normalization will also be implemented later, for both vector input and image input.

  2. Conv3D, not Conv2D. Conv3D has more variables, so optimizing it will take more time (see the comparison after this list). You know, it's really hard to stay compatible with Unity ML-Agents, so I will write another function to handle Gym visual input later.
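For a rough sense of the difference (assumed shapes, not the repo's actual network), a 3x3x3 Conv3D kernel carries three times the weights of a 3x3 Conv2D kernel:

```python
import tensorflow as tf

conv2d = tf.keras.layers.Conv2D(32, 3)
conv2d.build((None, 84, 84, 4))          # H, W, C
conv3d = tf.keras.layers.Conv3D(32, 3)
conv3d.build((None, 3, 84, 84, 4))       # D, H, W, C
print(conv2d.count_params())             # 3*3*4*32 + 32 = 1184
print(conv3d.count_params())             # 3*3*3*4*32 + 32 = 3488
```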


You said the application crashed; I don't know whether it actually broke and shut down or just got stuck. If the application gets stuck, that's normal, because off-policy algorithms like DQN, Dueling DQN, etc. run many training steps: if the length of an episode is 100, then I call the train function 100 times after the episode ends. Or maybe you should decrease the batch_size for visual-input training.
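A toy illustration of that schedule (hypothetical stand-ins, not this repo's classes): collect a whole episode, then call train() once per environment step, so a 100-step episode triggers 100 train calls.

```python
import random

class ToyAgent:
    def train(self, batch):
        pass  # a gradient step would go here

buffer = [(s, s + 1) for s in range(100)]  # fake transitions from a 100-step episode
agent = ToyAgent()
for _ in range(len(buffer)):               # 100 train calls after the episode ends
    agent.train(random.sample(buffer, k=8))
```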

Because my computer hardware is not very good, the optimization of image input is not very good either, and I didn't pay much attention to that part; I'm very sorry about that. I will keep optimizing those parts.

And your PRs are welcome.

Thx.

OK, I'm not in a hurry for the result. I'm ready to join the project, but it will take time to understand the code.

My problem is clearly an out-of-memory crash:

```
no op step 10000
WARNING:tensorflow:Layer actor_net is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call tf.keras.backend.set_floatx('float64'). To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Process finished with exit code -1073740791 (0xC0000409)
```
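The dtype warning itself is harmless, but it hints at where the float64 comes from (my assumption): arrays created with NumPy's default dtype, e.g. the `np.empty(...)` calls in init_variables above. Casting once up front would silence it:

```python
import numpy as np

state = np.empty(4)                 # NumPy defaults to float64
print(state.dtype)                  # float64 -> triggers the cast warning
state32 = state.astype(np.float32)  # cast once before feeding the model
print(state32.dtype)                # float32
```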

Maybe 10,000 ("1w") stored experiences are too large for your memory... Don't worry, I will optimize those issues later.

OK! Thanks.
One more question: does the current code allow for an exploration-exploitation trade-off? (I have not found one.)
If not, is it planned?
This is the first thing I could do...

Yes, for a lot of the algorithms you can adjust epsilon. For SAC, you can change the target entropy, alpha's initial value, and log_std_bound.

For all of the algorithms implemented on TF 2.0, you can change the layers from Dense to Noisy in tf2nn.py to get a noisy net for more exploration, along the lines of the sketch below.
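For reference, a minimal sketch of such a noisy layer in the spirit of NoisyNet (Fortunato et al.), with independent Gaussian noise; this is a simplified, hypothetical version, not the exact Noisy layer in tf2nn.py.

```python
import tensorflow as tf

class NoisyDense(tf.keras.layers.Layer):
    """Dense layer whose weights get learnable-scale Gaussian noise each call."""
    def __init__(self, units, sigma0=0.4, **kwargs):
        super().__init__(**kwargs)
        self.units, self.sigma0 = units, sigma0

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        mu_init = tf.random_uniform_initializer(-in_dim ** -0.5, in_dim ** -0.5)
        sigma_init = tf.constant_initializer(self.sigma0 * in_dim ** -0.5)
        self.w_mu = self.add_weight('w_mu', (in_dim, self.units), initializer=mu_init)
        self.w_sigma = self.add_weight('w_sigma', (in_dim, self.units), initializer=sigma_init)
        self.b_mu = self.add_weight('b_mu', (self.units,), initializer=mu_init)
        self.b_sigma = self.add_weight('b_sigma', (self.units,), initializer=sigma_init)

    def call(self, x):
        # Fresh noise on every forward pass drives exploration.
        w = self.w_mu + self.w_sigma * tf.random.normal(self.w_mu.shape)
        b = self.b_mu + self.b_sigma * tf.random.normal(self.b_mu.shape)
        return tf.matmul(x, w) + b
```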