"Trials did not complete" error
surbhi1944 opened this issue · 5 comments
The complete error message is:
==========================================================
(softlearning) surabhi@surabhi-Vostro-3559:~/Downloads/github/softlearning$ softlearning run_example_local examples.development --universe=gym --domain=HalfCheetah --task=v3 --exp-name=my-sac-experiment-1 --checkpoint-frequency=1000
/home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
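For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons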
If you depend on functionality not listed there, please file an issue.
WARNING: Logging before flag parsing goes to stderr.
I0615 04:36:45.514044 140073955813120 __init__.py:34] MuJoCo library version is: 200
2019-06-15 04:36:45,621 INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-06-15_04-36-45_621187_9030/logs.
2019-06-15 04:36:45,731 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:51785 to respond...
2019-06-15 04:36:45,856 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:43790 to respond...
2019-06-15 04:36:45,860 INFO services.py:806 -- Starting Redis shard with 0.81 GB max memory.
2019-06-15 04:36:45,898 INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-06-15_04-36-45_621187_9030/logs.
2019-06-15 04:36:45,899 INFO services.py:1442 -- Starting the Plasma object store with 1.21 GB memory using /dev/shm.
2019-06-15 04:36:46,022 INFO tune.py:65 -- Did not find checkpoint file in /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1.
2019-06-15 04:36:46,022 INFO tune.py:232 -- Starting a new experiment.
2019-06-15 04:36:46,027 INFO web_server.py:241 -- Starting Tune Server...
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.6/4.0 GB
2019-06-15 04:36:46,779 WARNING util.py:64 -- The start_trial operation took 0.7297773361206055 seconds to complete, which may be a performance bottleneck.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.7/4.0 GB
Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
- id=14eb5e74-seed=221: RUNNING
(pid=9082) /home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
(pid=9082) RequestsDependencyWarning)
(pid=9082)
(pid=9082) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=9082) For more information, please see:
(pid=9082) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=9082) * https://github.com/tensorflow/addons
(pid=9082) If you depend on functionality not listed there, please file an issue.
(pid=9082)
(pid=9082) Using seed 221
(pid=9082) 2019-06-15 04:36:50.109066: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=9082) 2019-06-15 04:36:50.114858: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
(pid=9082) 2019-06-15 04:36:50.115111: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562f555f1110 executing computations on platform Host. Devices:
(pid=9082) 2019-06-15 04:36:50.115134: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
(pid=9082) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9082) Instructions for updating:
(pid=9082) Colocations handled automatically by placer.
(pid=9082) WARNING: Logging before flag parsing goes to stderr.
(pid=9082) W0615 04:36:50.157908 140696037508864 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9082) Instructions for updating:
(pid=9082) Colocations handled automatically by placer.
2019-06-15 04:36:50,263 ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9082, host=surabhi-Vostro-3559)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train
    self._build()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build
    variant, training_environment)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant
    return get_policy_from_params(policy_params, *args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params
    **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy
    policy = FeedforwardGaussianPolicy(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in __init__
    self._Serializable__initialize(locals())
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'
2019-06-15 04:36:50,264 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2019-06-15 04:36:50,266 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint.
(pid=9083) /home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
(pid=9083) RequestsDependencyWarning)
(pid=9083)
(pid=9083) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=9083) For more information, please see:
(pid=9083) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=9083) * https://github.com/tensorflow/addons
(pid=9083) If you depend on functionality not listed there, please file an issue.
(pid=9083)
(pid=9083) 2019-06-15 04:36:53.790710: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=9083) 2019-06-15 04:36:53.794906: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
(pid=9083) 2019-06-15 04:36:53.795049: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557721f5e310 executing computations on platform Host. Devices:
(pid=9083) 2019-06-15 04:36:53.795068: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
(pid=9083) Using seed 221
(pid=9083) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9083) Instructions for updating:
(pid=9083) Colocations handled automatically by placer.
(pid=9083) WARNING: Logging before flag parsing goes to stderr.
(pid=9083) W0615 04:36:53.833971 139846760371968 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9083) Instructions for updating:
(pid=9083) Colocations handled automatically by placer.
2019-06-15 04:36:53,940 ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9083, host=surabhi-Vostro-3559)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train
    self._build()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build
    variant, training_environment)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant
    return get_policy_from_params(policy_params, *args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params
    **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy
    policy = FeedforwardGaussianPolicy(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in __init__
    self._Serializable__initialize(locals())
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'
2019-06-15 04:36:53,941 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2019-06-15 04:36:53,943 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.9/4.0 GB
Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
- id=14eb5e74-seed=221: RUNNING, 2 failures: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1/id=14eb5e74-seed=221_2019-06-15_04-36-46cj00ypvt/error_2019-06-15_04-36-53.txt
(pid=9081) /home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
(pid=9081) RequestsDependencyWarning)
(pid=9081)
(pid=9081) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=9081) For more information, please see:
(pid=9081) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=9081) * https://github.com/tensorflow/addons
(pid=9081) If you depend on functionality not listed there, please file an issue.
(pid=9081)
(pid=9081) Using seed 221
(pid=9081) 2019-06-15 04:36:57.425650: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=9081) 2019-06-15 04:36:57.429647: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
(pid=9081) 2019-06-15 04:36:57.429862: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b3ac7c78a0 executing computations on platform Host. Devices:
(pid=9081) 2019-06-15 04:36:57.429886: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
(pid=9081) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9081) Instructions for updating:
(pid=9081) Colocations handled automatically by placer.
(pid=9081) WARNING: Logging before flag parsing goes to stderr.
(pid=9081) W0615 04:36:57.472656 140634258609920 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9081) Instructions for updating:
(pid=9081) Colocations handled automatically by placer.
2019-06-15 04:36:57,574 ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9081, host=surabhi-Vostro-3559)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train
    self._build()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build
    variant, training_environment)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant
    return get_policy_from_params(policy_params, *args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params
    **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy
    policy = FeedforwardGaussianPolicy(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in __init__
    self._Serializable__initialize(locals())
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'
2019-06-15 04:36:57,575 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2019-06-15 04:36:57,576 INFO trial_runner.py:524 -- Attempting to recover trial state from last checkpoint.
(pid=9084) /home/surabhi/.local/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
(pid=9084) RequestsDependencyWarning)
(pid=9084)
(pid=9084) WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
(pid=9084) For more information, please see:
(pid=9084) * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
(pid=9084) * https://github.com/tensorflow/addons
(pid=9084) If you depend on functionality not listed there, please file an issue.
(pid=9084)
(pid=9084) Using seed 221
(pid=9084) 2019-06-15 04:37:00.981560: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
(pid=9084) 2019-06-15 04:37:00.987048: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
(pid=9084) 2019-06-15 04:37:00.987274: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x561e70f53660 executing computations on platform Host. Devices:
(pid=9084) 2019-06-15 04:37:00.987293: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
(pid=9084) WARNING:tensorflow:From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9084) Instructions for updating:
(pid=9084) Colocations handled automatically by placer.
(pid=9084) WARNING: Logging before flag parsing goes to stderr.
(pid=9084) W0615 04:37:01.021606 140204585965312 deprecation.py:323] From /home/surabhi/.local/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=9084) Instructions for updating:
(pid=9084) Colocations handled automatically by placer.
2019-06-15 04:37:01,131 ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/worker.py", line 2189, in get
    raise value
ray.exceptions.RayTaskError: ray_ExperimentRunner:train() (pid=9084, host=surabhi-Vostro-3559)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
    result = self._train()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 82, in _train
    self._build()
  File "/home/surabhi/Downloads/github/softlearning/examples/development/main.py", line 59, in _build
    variant, training_environment)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 75, in get_policy_from_variant
    return get_policy_from_params(policy_params, *args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 68, in get_policy_from_params
    **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/utils.py", line 10, in get_gaussian_policy
    policy = FeedforwardGaussianPolicy(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/policies/gaussian_policy.py", line 226, in __init__
    self._Serializable__initialize(locals())
AttributeError: 'FeedforwardGaussianPolicy' object has no attribute '_Serializable__initialize'
2019-06-15 04:37:01,132 INFO ray_trial_executor.py:187 -- Destroying actor for trial id=14eb5e74-seed=221. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 2.9/4.0 GB
Result logdir: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1
Number of trials: 1 ({'ERROR': 1})
ERROR trials:
- id=14eb5e74-seed=221: ERROR, 4 failures: /home/surabhi/ray_results/gym/HalfCheetah/v3/2019-06-15T04-36-45-my-sac-experiment-1/id=14eb5e74-seed=221_2019-06-15_04-36-46cj00ypvt/error_2019-06-15_04-37-01.txt
Traceback (most recent call last):
  File "/home/surabhi/anaconda3/envs/softlearning/bin/softlearning", line 11, in <module>
    load_entry_point('softlearning', 'console_scripts', 'softlearning')()
  File "/home/surabhi/Downloads/github/softlearning/softlearning/scripts/console_scripts.py", line 202, in main
    return cli()
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/surabhi/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/surabhi/Downloads/github/softlearning/softlearning/scripts/console_scripts.py", line 71, in run_example_local_cmd
    return run_example_local(example_module_name, example_argv)
  File "/home/surabhi/Downloads/github/softlearning/examples/instrument.py", line 224, in run_example_local
    reuse_actors=True)
  File "/home/surabhi/.local/lib/python3.6/site-packages/ray/tune/tune.py", line 272, in run
    raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [id=14eb5e74-seed=221])
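Every one of the four attempts fails with the same AttributeError at gaussian_policy.py line 226, and the final TuneError only reports that the trial never completed. The odd attribute name comes from Python's private-name mangling: a method written as `__initialize` inside a class named `Serializable` is stored as `_Serializable__initialize`, so a subclass can only reach it under that mangled name, and only if the base class it actually inherits from defines it. The toy classes below (`Serializable`, `OtherSerializable`, `GoodPolicy`, `BrokenPolicy`, `has_private_method`) are hypothetical stand-ins, not softlearning's code; they are a minimal sketch that reproduces the same kind of error.

```python
# Hypothetical toy classes; these are NOT softlearning's actual classes.
class Serializable:
    """Base class that defines a name-mangled helper."""

    def __initialize(self, locals_):
        # Inside this class body, `__initialize` is stored as `_Serializable__initialize`.
        self._serializable_args = dict(locals_)


class OtherSerializable:
    """A same-purpose base class that lacks the helper (e.g. from another package)."""


class GoodPolicy(Serializable):
    def __init__(self, hidden_size):
        # Subclasses must spell out the mangled name explicitly.
        self._Serializable__initialize(locals())


class BrokenPolicy(OtherSerializable):
    def __init__(self, hidden_size):
        # Same call, but the base class never defined the helper.
        self._Serializable__initialize(locals())


def has_private_method(cls, owner_name, method_name):
    """Return True if `cls` exposes `owner_name.__method_name` under its mangled name."""
    return hasattr(cls, f"_{owner_name}__{method_name}")


GoodPolicy(hidden_size=256)  # succeeds

error_message = None
try:
    BrokenPolicy(hidden_size=256)
except AttributeError as exc:
    # Mirrors the log: "... object has no attribute '_Serializable__initialize'"
    error_message = str(exc)
```

To check a real environment, one could import the base class the policies inherit from and call `has_private_method(Serializable, "Serializable", "initialize")`. This assumes softlearning imports `Serializable` from a top-level `serializable` module, which the log above does not show.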
This seems like a bug on our end. Which git version are you using (i.e., the output of git rev-parse HEAD)? Also, have you made any changes to the code or the variants?
This issue is solved. I think it was caused by a conda-related problem.
Thanks
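The thread does not say what the conda problem was. One observation from the log: the softlearning console script lives in the conda environment (/home/surabhi/anaconda3/envs/softlearning/bin), yet ray, tensorflow, click and requests are all loaded from /home/surabhi/.local/lib/python3.6/site-packages, the per-user directory. Assuming the problem was of this kind, a minimal standard-library sketch for checking whether user site-packages shadow the environment's packages is shown below; `module_origin` is a hypothetical helper and the module names are illustrative.

```python
import importlib.util
import os
import site
import sys


def module_origin(name):
    """Locate top-level module `name` without importing it; flag user site-packages hits."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        # Not installed, or a namespace package with no single origin file.
        return {"name": name, "found": spec is not None, "origin": None, "in_user_site": False}
    user_site = os.path.realpath(site.getusersitepackages())
    origin = os.path.realpath(spec.origin)
    return {
        "name": name,
        "found": True,
        "origin": origin,
        "in_user_site": origin.startswith(user_site + os.sep),
    }


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("user site enabled:", site.ENABLE_USER_SITE)
    for mod in ("serializable", "ray", "tensorflow", "click"):
        print(module_origin(mod))
```

If the packages resolve to the user directory, setting the standard PYTHONNOUSERSITE=1 environment variable (or starting Python with `-s`) makes the interpreter skip that directory, which is a quick way to test whether shadowing is the cause.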
Hi there,
How did you solve the issue? I got exactly the same error and don't know how to fix it.