Error in mlflow reporting Enum Error code ...
flikka opened this issue · 2 comments
flikka commented
Seems to be something with the mlflow logging.
ioc-1901 doesn't have any models built. In the last workflow, ioc-1901-1581661405269-zw245, the first model builder (ioc-1901-1581661405269-zw245-4084157029) has this stack trace:
Gordo version 0.50.0
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/gordo/cli/cli.py", line 150, in build
machine_out.report()
File "/usr/local/lib/python3.7/site-packages/gordo/machine/machine.py", line 137, in report
reporter.report(self)
File "/usr/local/lib/python3.7/site-packages/gordo/reporters/mlflow.py", line 441, in report
log_machine(mlflow_client, run_id, machine)
File "/usr/local/lib/python3.7/site-packages/gordo/reporters/mlflow.py", line 413, in log_machine
mlflow_client.log_batch(run_id, **get_batch_kwargs(machine))
File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/client.py", line 242, in log_batch
self._tracking_client.log_batch(run_id, metrics, params, tags)
File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/_tracking_service/client.py", line 231, in log_batch
self.store.log_batch(run_id=run_id, metrics=metrics, params=params, tags=tags)
File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/rest_store.py", line 240, in log_batch
self._call_endpoint(LogBatch, req_body)
File "/usr/local/lib/python3.7/site-packages/azureml/mlflow/_internal/store.py", line 88, in _call_endpoint
return super(AzureMLRestStore, self)._call_endpoint(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/rest_store.py", line 32, in _call_endpoint
return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
File "/usr/local/lib/python3.7/site-packages/mlflow/utils/rest_utils.py", line 137, in call_endpoint
response = verify_rest_response(response, endpoint)
File "/usr/local/lib/python3.7/site-packages/mlflow/utils/rest_utils.py", line 103, in verify_rest_response
raise RestException(json.loads(response.text))
File "/usr/local/lib/python3.7/site-packages/mlflow/exceptions.py", line 62, in __init__
super(RestException, self).__init__(message, error_code=ErrorCode.Value(error_code))
File "/usr/local/lib/python3.7/site-packages/google/protobuf/internal/enum_type_wrapper.py", line 71, in Value
self._enum_type.name, name))
ValueError: Enum ErrorCode has no value defined for name 1
ryanjdillon commented
Looking at one of the builders on this gordo, it appears that the failure is in the postgres reporter, caused by an attempt to insert a machine whose key already exists.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/gordo/cli/cli.py", line 150, in build
machine_out.report()
File "/usr/local/lib/python3.7/site-packages/gordo/machine/machine.py", line 137, in report
reporter.report(self)
File "/usr/local/lib/python3.7/site-packages/gordo/reporters/postgres.py", line 81, in report
raise PostgresReporterException(exc)
gordo.reporters.postgres.PostgresReporterException: duplicate key value violates unique constraint "machine_name"
DETAIL: Key (name)=(c5f17844-2913-4a96-b34a-6e05248da252-9999) already exists.
Perhaps some logic is missing to handle duplicate insertions.
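One way such logic could look is an upsert, where a second insert with the same name updates the existing row instead of raising. This is only a rough sketch and not gordo's actual reporter code: it uses `sqlite3` from the standard library as a stand-in for Postgres (both accept the `ON CONFLICT ... DO UPDATE` form, SQLite from version 3.24), and the `metadata` column and the `upsert_machine` helper are hypothetical names. Only the `machine` table and its unique `name` column come from the error message above.

```python
import sqlite3


def upsert_machine(conn, name, metadata):
    """Insert a machine row, or overwrite its metadata if the name already exists.

    Hypothetical helper: the `metadata` column is an assumption for illustration.
    """
    conn.execute(
        "INSERT INTO machine (name, metadata) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET metadata = excluded.metadata",
        (name, metadata),
    )
    conn.commit()


# In-memory database standing in for the real Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machine (name TEXT PRIMARY KEY, metadata TEXT)")

# Reporting the same machine twice no longer violates the unique constraint.
upsert_machine(conn, "machine-a", '{"build": 1}')
upsert_machine(conn, "machine-a", '{"build": 2}')
```

Whether overwriting is the right behaviour (as opposed to skipping or failing loudly) depends on whether a rebuilt machine should replace the earlier record.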
epa095 commented
It turned out this was caused by the (undocumented) Azure limit of at most 200 metrics per call, which you fixed in #934, right @ryanjdillon?
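For anyone hitting the same limit, a minimal sketch of the general workaround is to split the metrics into chunks of at most 200 and call `log_batch` once per chunk. This is not necessarily how #934 does it. The helper names `chunked` and `log_metrics_in_batches` are hypothetical, and `client` is assumed to be any object with a `log_batch(run_id, metrics=..., params=..., tags=...)` method, such as `MlflowClient`.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Assumed limit, taken from the comment above (200 metrics per call).
MAX_METRICS_PER_CALL = 200


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def log_metrics_in_batches(client, run_id, metrics, params=(), tags=(),
                           limit=MAX_METRICS_PER_CALL):
    """Call client.log_batch once per chunk of metrics.

    Params and tags are sent with the first call only, so they are not
    logged repeatedly. Returns the number of calls made.
    """
    # Keep one (empty) batch when there are no metrics, so params/tags still get logged.
    batches = list(chunked(list(metrics), limit)) or [[]]
    for index, batch in enumerate(batches):
        if index == 0:
            client.log_batch(run_id, metrics=batch, params=list(params), tags=list(tags))
        else:
            client.log_batch(run_id, metrics=batch)
    return len(batches)
```

With 450 metrics this makes three calls carrying 200, 200 and 50 metrics, so no single request exceeds the limit.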