canonical/katib-rocks

`suggestion-enas` rock permission denied error

Closed · 1 comment

Bug Description

While comparing the suggestion-enas rock with the upstream Docker image, I found that the rock fails with a permission error. The upstream Docker image does not show this error.

To Reproduce

  1. docker run -ti "charmedkubeflow/suggestion-enas:v0.17.0-92cd6d9" -v

Environment

Docker

Relevant Log Output

2024-09-03T07:12:31.066Z [suggestion-enas] 2024-09-03 07:12:31.066701: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-09-03T07:12:31.067Z [suggestion-enas] 2024-09-03 07:12:31.066996: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-09-03T07:12:31.069Z [suggestion-enas] 2024-09-03 07:12:31.069109: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-09-03T07:12:31.095Z [suggestion-enas] 2024-09-03 07:12:31.095143: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
2024-09-03T07:12:31.095Z [suggestion-enas] To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-03T07:12:31.542Z [suggestion-enas] 2024-09-03 07:12:31.542531: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-09-03T07:12:31.837Z [suggestion-enas] WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
2024-09-03T07:12:31.837Z [suggestion-enas] I0000 00:00:1725347551.837823      95 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
2024-09-03T07:12:31.838Z [suggestion-enas] ENAS Suggestion Service
2024-09-03T07:12:31.838Z [suggestion-enas] Traceback (most recent call last):
2024-09-03T07:12:31.838Z [suggestion-enas]   File "/opt/katib/cmd/suggestion/nas/enas/v1beta1/main.py", line 45, in <module>
2024-09-03T07:12:31.838Z [suggestion-enas]     serve()
2024-09-03T07:12:31.838Z [suggestion-enas]   File "/opt/katib/cmd/suggestion/nas/enas/v1beta1/main.py", line 31, in serve
2024-09-03T07:12:31.838Z [suggestion-enas]     service = EnasService()
2024-09-03T07:12:31.838Z [suggestion-enas]   File "/opt/katib/pkg/suggestion/v1beta1/nas/enas/service.py", line 161, in __init__
2024-09-03T07:12:31.838Z [suggestion-enas]     os.makedirs("ctrl_cache/")
2024-09-03T07:12:31.838Z [suggestion-enas]   File "/usr/lib/python3.10/os.py", line 225, in makedirs
2024-09-03T07:12:31.838Z [suggestion-enas]     mkdir(name, mode)
2024-09-03T07:12:31.838Z [suggestion-enas] PermissionError: [Errno 13] Permission denied: 'ctrl_cache/'
2024-09-03T07:12:32.036Z [pebble] Service "suggestion-enas" stopped unexpectedly with code 1
2024-09-03T07:12:32.036Z [pebble] Service "suggestion-enas" on-failure action is "restart", waiting ~2s before restart (backoff 3)
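
The traceback ends in `os.makedirs("ctrl_cache/")`, a relative path, so the directory is created under whatever working directory the service runs in. A minimal diagnostic sketch, assuming a Unix-like container with Python 3.9+, that could be run in both the rock and the upstream image to compare the result. The helper names `describe_workdir` and `try_create_cache` are hypothetical and not part of Katib, and `exist_ok=True` is a choice made here so the check can be rerun; the original call does not pass it.

```python
import os
import stat
import tempfile


def describe_workdir(workdir: str) -> dict:
    """Collect the facts that decide whether a directory can be created in workdir."""
    st = os.stat(workdir)
    return {
        "workdir": os.path.abspath(workdir),
        "process_uid": os.getuid(),
        "process_gid": os.getgid(),
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),
        # Creating an entry needs write and search (execute) permission on the parent.
        "can_create_entries": os.access(workdir, os.W_OK | os.X_OK),
    }


def try_create_cache(workdir: str, name: str = "ctrl_cache") -> tuple[bool, str]:
    """Attempt the same kind of makedirs call as the service, but under workdir."""
    target = os.path.join(workdir, name)
    try:
        os.makedirs(target, exist_ok=True)
    except PermissionError as exc:
        # Same failure class as the traceback above (Errno 13).
        return False, f"{exc.strerror}: {target}"
    return True, target


if __name__ == "__main__":
    # Report on the directory the process was started in.
    print(describe_workdir(os.getcwd()))
    # Exercise the creation path in a scratch directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as scratch:
        print(try_create_cache(scratch))
```

If `can_create_entries` is false in the rock but true in the upstream image, the difference lies in the ownership or mode of the working directory rather than in the service code.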

Additional Context

The problem is that we skipped this section of the Dockerfile while rewriting the rock.

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6198.

This message was autogenerated