[ERROR] docker image 1.6.0 : "pkg_resources.DistributionNotFound: The 'pycrypto>=2.6.1' distribution was not found and is required by loudml"
toni-moreno opened this issue · 8 comments
Hello @regel
After I created a model and ran it, this error appeared in the Docker log output, and no data was written to the output database. Any idea what to do?
loudml_1 | 172.20.0.3 - - [2020-07-28 07:13:32] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002119
loudml_1 | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(linux_metrics_cpu_mean_usage_system_host_myhost_time_5m)') (last run: 2020-07-28 07:12:37, next run: 2020-07-28 07:13:37)
loudml_1 |
loudml_1 | Traceback (most recent call last):
loudml_1 | File "/opt/vendor/lib/python3.5/site-packages/loudml/server.py", line 105, in wrapper
loudml_1 | return job_func(*args, **kwargs)
loudml_1 | File "/opt/vendor/lib/python3.5/site-packages/loudml/server.py", line 178, in daemon_exec_scheduled_job
loudml_1 | 'loudmld {}'.format(pkg_resources.require("loudml")[0].version)
loudml_1 | File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 963, in require
loudml_1 | needed = self.resolve(parse_requirements(requirements))
loudml_1 | File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 849, in resolve
loudml_1 | raise DistributionNotFound(req, requirers)
loudml_1 | pkg_resources.DistributionNotFound: The 'pycrypto>=2.6.1' distribution was not found and is required by loudml
loudml_1 | 172.20.0.3 - - [2020-07-28 07:13:49] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002227
loudml_1 | INFO:schedule:Running job Every 1 minute do daemon_clear_jobs() (last run: 2020-07-28 07:13:04, next run: 2020-07-28 07:14:04)
loudml_1 | 172.20.0.3 - - [2020-07-28 07:14:05] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002722
loudml_1 | 172.20.0.3 - - [2020-07-28 07:14:19] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.002560
loudml_1 | 172.20.0.3 - - [2020-07-28 07:14:33] "GET /models/linux_metrics_cpu_mean_usage_system_host_myhost_time_5m HTTP/1.1" 200 880 0.001884
loudml_1 | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(linux_metrics_cpu_mean_usage_system_host_myhost_time_5m)') (last run: 2020-07-28 07:13:38, next run: 2020-07-28 07:14:38)
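For anyone hitting the same `DistributionNotFound` message, here is a minimal sketch of a check that lists which declared requirements of an installed distribution cannot be found. It is an illustration, not part of loudml: `missing_requirements` is a hypothetical helper, it uses `importlib.metadata` (Python 3.8+, so it would not run unchanged on the image's Python 3.5), and it ignores version specifiers and conditional requirements.

```python
import re
from importlib import metadata


def missing_requirements(dist_name):
    """Return the names of declared requirements of dist_name that are not installed.

    Hypothetical helper for illustration only. Version specifiers are not
    checked, and requirements carrying a marker (anything after ';') are skipped.
    """
    try:
        requirements = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        # The distribution itself is not installed.
        return [dist_name]

    missing = []
    for req in requirements:
        if ";" in req:
            # Conditional requirement (extras, platform markers): skipped in this sketch.
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req.strip())
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Inside the container one would expect 'pycrypto' to be reported here.
    print(missing_requirements("loudml"))
```

Run inside the container's Python environment, this should point at the same missing package that `pkg_resources.require("loudml")` complains about in the traceback.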
This is the model info.
> version
1.6.0
> list-models
linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
> show-model linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
- settings:
    bucket_interval: 5m
    default_bucket: myhost_linux
    features:
    - default: 0
      field: usage_system
      io: io
      match_all:
      - tag: host
        value: myhost
      measurement: cpu
      metric: mean
      name: mean_usage_system
    grace_period: 0
    interval: 60s
    max_evals: 10
    max_threshold: 0
    min_threshold: 0
    name: linux_metrics_cpu_mean_usage_system_host_myhost_time_5m
    offset: 10s
    run:
      flag_abnormal_data: true
      output_bucket: myhost_loudml
      save_output_data: true
    seasonality:
      daytime: false
      weekday: false
    span: 100
    type: donut
  training:
    job_id: fdb8d872-865d-4cdf-912a-1625a214fc54
    progress:
      eval: 10
      max_evals: 10
    state: done
> list-buckets
myhost_linux
myhost_loudml
> show-bucket myhost_loudml
- addr: X.X.X.X:8086
  annotation_db: loudml_annotations
  create_database: false
  database: loudml_metrics
  dbuser: loudml_user
  measurement: loudml
  name: myhost_loudml
  retention_policy: autogen
  type: influxdb
  use_ssl: true
  verify_ssl: false
Hello @regel, I tested again on a new server with the loudml:1.6.0 image and also with today's loudml:nightly image; the error persists in both.
To help, I found a workaround (no need to change the image): install some basic Python packages as root directly inside the running container.
$ docker exec -it -u 0 7e011d7c0881 bash
root@7e011d7c0881:/# apt-get update && apt-get install -y python3-pip python3-setuptools python3-dev && apt-get install -y --no-install-recommends build-essential gcc git && apt-get purge -y
No restart needed! The errors immediately disappeared from the log and loudml began writing to the output database right away.
Oops. Very good catch. Thanks Toni. Something is odd in the build. I'm patching the Dockerfile.
Solved. Toni, see the above patches and the new Dockerfile in the develop branch if you need to build a local image.
I will tag a new 1.6 release at the end of the month.
Hello @regel, thanks a lot for this fix.
I've built a new image and pushed it here if you want to test it: tonimoreno/loudml:1.6.0
But when I restarted the service with the new image, this error appeared. Can you help me understand what I did wrong?
Attaching to loudml-poc_loudml_1
loudml_1 | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1 | _np_qint8 = np.dtype([("qint8", np.int8, 1)])
loudml_1 | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1 | _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
loudml_1 | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1 | _np_qint16 = np.dtype([("qint16", np.int16, 1)])
loudml_1 | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1 | _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
loudml_1 | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1 | _np_qint32 = np.dtype([("qint32", np.int32, 1)])
loudml_1 | /opt/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
loudml_1 | np_resource = np.dtype([("resource", np.ubyte, 1)])
loudml_1 | INFO:root:restarting job for model 'swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1 | INFO:root:restarting job for model 'swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1 | INFO:root:restarting job for model 'swarm@cpu@95percentile@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1 | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@10m'
loudml_1 | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@1m'
loudml_1 | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@30m'
loudml_1 | INFO:root:restarting job for model 'swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@5m'
loudml_1 | INFO:root:starting Loud ML server on 0.0.0.0:8077
loudml_1 | 192.168.48.3 - - [2020-08-05 05:17:55] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.001249
loudml_1 | 192.168.48.3 - - [2020-08-05 05:18:10] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000694
loudml_1 | 192.168.48.3 - - [2020-08-05 05:18:25] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000983
loudml_1 | 192.168.48.3 - - [2020-08-05 05:18:40] "GET /models/linux_metrics_cpu_mean_usage_system_host_telegraf_time_5m HTTP/1.1" 404 193 0.000804
loudml_1 | INFO:schedule:Running job Every 1 minute do daemon_clear_jobs() (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1 | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)') (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1 | INFO:root:job[0be19343-c409-4db5-af7f-540f4475efee] starting, nice=0
loudml_1 | INFO:root:predict(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m) range=2020-08-05T05:15:00.000Z-2020-08-05T05:20:00.000Z
loudml_1 | XXX lineno: 115, opcode: 0
loudml_1 | ERROR:root:unknown opcode
loudml_1 | Traceback (most recent call last):
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 53, in run
loudml_1 | res = getattr(self, func_name)(*args, **kwargs)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 243, in predict
loudml_1 | **kwargs
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1594, in predict2
loudml_1 | num_gpus=num_gpus,
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1208, in predict
loudml_1 | self.load(num_cpus, num_gpus)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1147, in load
loudml_1 | self._keras_model = _load_keras_model(self._state.get('h5py'))
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 247, in _load_keras_model
loudml_1 | keras_model = load_model(path, compile=False)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 234, in load_model
loudml_1 | model = model_from_config(model_config, custom_objects=custom_objects)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 324, in model_from_config
loudml_1 | return deserialize(config, custom_objects=custom_objects)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 74, in deserialize
loudml_1 | printable_module_name='layer')
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
loudml_1 | list(custom_objects.items())))
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1273, in from_config
loudml_1 | process_node(layer, node_data)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1233, in process_node
loudml_1 | layer(input_tensors, **kwargs)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
loudml_1 | outputs = self.call(inputs, *args, **kwargs)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 743, in call
loudml_1 | return self.function(inputs, **arguments)
loudml_1 | File "/opt/vendor/lib/python3.5/site-packages/loudml/donut.py", line 115, in sampling
loudml_1 | z_mean, z_log_var = args
loudml_1 | SystemError: unknown opcode
loudml_1 | ERROR:root:job[0be19343-c409-4db5-af7f-540f4475efee] failed: unknown opcode
loudml_1 | [2020-08-05 05:18:54,323] ERROR in app: Exception on /models/swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m/_eval [POST]
loudml_1 | pebble.common.RemoteTraceback: Traceback (most recent call last):
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/pebble/common.py", line 174, in process_execute
loudml_1 | return function(*args, **kwargs)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 351, in run
loudml_1 | return g_worker.run(job_id, nice, func_name, *args, **kwargs)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 58, in run
loudml_1 | raise exn
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 53, in run
loudml_1 | res = getattr(self, func_name)(*args, **kwargs)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 243, in predict
loudml_1 | **kwargs
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1594, in predict2
loudml_1 | num_gpus=num_gpus,
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1208, in predict
loudml_1 | self.load(num_cpus, num_gpus)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1147, in load
loudml_1 | self._keras_model = _load_keras_model(self._state.get('h5py'))
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 247, in _load_keras_model
loudml_1 | keras_model = load_model(path, compile=False)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 234, in load_model
loudml_1 | model = model_from_config(model_config, custom_objects=custom_objects)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/saving.py", line 324, in model_from_config
loudml_1 | return deserialize(config, custom_objects=custom_objects)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 74, in deserialize
loudml_1 | printable_module_name='layer')
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
loudml_1 | list(custom_objects.items())))
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1273, in from_config
loudml_1 | process_node(layer, node_data)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1233, in process_node
loudml_1 | layer(input_tensors, **kwargs)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
loudml_1 | outputs = self.call(inputs, *args, **kwargs)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 743, in call
loudml_1 | return self.function(inputs, **arguments)
loudml_1 | File "/opt/vendor/lib/python3.5/site-packages/loudml/donut.py", line 115, in sampling
loudml_1 | z_mean, z_log_var = args
loudml_1 | SystemError: unknown opcode
loudml_1 |
loudml_1 |
loudml_1 | The above exception was the direct cause of the following exception:
loudml_1 |
loudml_1 | Traceback (most recent call last):
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
loudml_1 | response = self.full_dispatch_request()
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
loudml_1 | rv = self.handle_user_exception(e)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/flask_restful/__init__.py", line 269, in error_router
loudml_1 | return original_handler(e)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
loudml_1 | reraise(exc_type, exc_value, tb)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
loudml_1 | raise value
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
loudml_1 | rv = self.dispatch_request()
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
loudml_1 | return self.view_functions[rule.endpoint](**req.view_args)
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 1602, in model_eval
loudml_1 | return jsonify(job.result())
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 393, in result
loudml_1 | return self._future.result()
loudml_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
loudml_1 | return self.__get_result()
loudml_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
loudml_1 | raise self._exception
loudml_1 | File "/opt/venv/lib/python3.7/site-packages/loudml/server.py", line 372, in _done_cb
loudml_1 | self._result = self._future.result()
loudml_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
loudml_1 | return self.__get_result()
loudml_1 | File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
loudml_1 | raise self._exception
loudml_1 | SystemError: unknown opcode
loudml_1 | ERROR:loudml.server:Exception on /models/swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m/_eval [POST]
loudml_1 | 127.0.0.1 - - [2020-08-05 05:18:54] "POST /models/swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m/_eval?output_bucket=test-loudml&flag_abnormal_data=True&save_output_data=True&from=1596604664&to=1596604724 HTTP/1.1" 500 156 0.260177
loudml_1 | ERROR:root:error executing scheduled job '_eval(swarm@cpu@10percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)':INTERNAL SERVER ERROR
loudml_1 | INFO:schedule:Running job Every 60.0 seconds do daemon_exec_scheduled_job('_eval(swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m)') (last run: [never], next run: 2020-08-05 05:18:53)
loudml_1 | INFO:root:job[4cb5ec63-e3f0-475d-8075-bbdc3bb38264] starting, nice=0
loudml_1 | INFO:root:predict(swarm@cpu@90percentile@usage_active@host_worker2_cpu_cpu-total@time@5m) range=2020-08-05T05:15:00.000Z-2020-08-05T05:20:00.000Z
loudml_1 | XXX lineno: 115, opcode: 0
loudml_1 | ERROR:root:unknown opcode
Hi Toni. Interesting finding. I upgraded the Python version to 3.7. The Python serialisation format is probably different in this version causing ‘load_model’ to fail.
What if you delete the model state and re-train the model? Does that solve the issue?
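To illustrate the suspected cause, here is a minimal sketch. It assumes the saved model embeds the `sampling` function as marshalled Python bytecode, which would be consistent with the traceback showing a `python3.5` path for `sampling` while everything else runs under 3.7. The `marshal` format and the bytecode itself are specific to a Python version, so a round trip is only safe within the same interpreter version. The function body below is a stand-in, not the real loudml code.

```python
import marshal
import sys
import types


def sampling(args):
    # Stand-in for the function seen in the traceback; the real body differs.
    z_mean, z_log_var = args
    return z_mean + z_log_var


# Serialise only the compiled code object. The resulting bytes contain raw
# bytecode for the interpreter version that produced them.
payload = marshal.dumps(sampling.__code__)

# Rebuilding the function works here because the same interpreter that wrote
# the bytes also reads them. Bytes written by another Python version may fail
# to load, or load and then fail at call time with errors like "unknown opcode".
restored = types.FunctionType(marshal.loads(payload), globals(), "sampling")
result = restored((1.0, 2.0))

print("python", sys.version_info[:2], "result", result)
```

If this is indeed what happens, a model state trained under Python 3.5 would need to be re-created under 3.7 rather than reused.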
Same here!
Using @toni's fix seems to work 👍
$ docker exec -it -u 0 7e011d7c0881 bash
root@7e011d7c0881:/# apt-get update && apt-get install -y python3-pip python3-setuptools python3-dev && apt-get install -y --no-install-recommends build-essential gcc git && apt-get purge -y
Hi, I was wondering if this ever got resolved and included in the final release? If I use "FROM loudml/loudml:1.6.0" in my Dockerfile I still get this error.
I had to create my own docker image like this:
Dockerfile
FROM loudml/loudml:latest
# SHELL ["/bin/bash", "-o", "pipefail", "-c"]
USER 0
# https://github.com/regel/loudml/issues/370
RUN apt-get update && \
    apt-get install -y \
        python3-pip python3-setuptools \
        python3-dev && \
    apt-get install -y --no-install-recommends \
        build-essential gcc git && \
    apt-get purge -y
ENTRYPOINT ["loudmld"]
Note: USER 0 is needed because, for some reason, the base image runs as uid 1001, which doesn't have permission to install dependencies.
Then in docker-compose:
# image: loudml/loudml:1.6.0
build: .
container_name: loudml