kubeflow/pytorch-operator

Can I use PyTorchJobClient inside a pod in the cluster?

I get a 403 when I call it from inside a pod. If it can be used this way, how should I set up the config file?

Thanks

ptc.get(namespace='kubeflow')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kubeflow/pytorchjob/api/py_torch_job_client.py", line 134, in get
    pytorchjob = thread.get(constants.APISERVER_TIMEOUT)
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 366, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 241, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Fri, 09 Apr 2021 15:24:19 GMT', 'Content-Length': '350'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pytorchjobs.kubeflow.org is forbidden: User "system:serviceaccount:metis:default" cannot list resource "pytorchjobs" in API group "kubeflow.org" in the namespace "kubeflow"","reason":"Forbidden","details":{"group":"kubeflow.org","kind":"pytorchjobs"},"code":403}

You can use PyTorchJobClient in a Pod.
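On the config file: the 403 above shows the request was already authenticated as system:serviceaccount:metis:default, i.e. with the ServiceAccount token mounted into the pod, so no kubeconfig appears to be missing. The Kubernetes Python client's `config.load_incluster_config()` relies on that token plus the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables. Here is a minimal stdlib-only sketch for checking those from inside a pod. The helper name `in_cluster_identity` is my own and is not part of any SDK.

```python
import os
from pathlib import Path

# Standard mount point for a pod's ServiceAccount credentials.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def in_cluster_identity(env=os.environ, sa_dir=SA_DIR):
    """Return the API endpoint and namespace seen by this pod, or None.

    Hypothetical helper: it only inspects the environment variables and
    files that in-cluster authentication depends on; it makes no API call.
    """
    host = env.get("KUBERNETES_SERVICE_HOST")
    port = env.get("KUBERNETES_SERVICE_PORT")
    token = Path(sa_dir) / "token"
    namespace = Path(sa_dir) / "namespace"
    # Without the endpoint or the token we are not running in a pod
    # (or the ServiceAccount token is not mounted).
    if not (host and port and token.is_file()):
        return None
    ns = namespace.read_text().strip() if namespace.is_file() else None
    return {"api_server": f"https://{host}:{port}", "namespace": ns}


if __name__ == "__main__":
    print(in_cluster_identity())
```

If this prints None inside the pod, the token is not mounted and a config file would be needed. Otherwise the credentials are fine and only the permissions are missing.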

However, the ServiceAccount the pod runs as must first be granted access to PyTorchJob resources through a ClusterRole and a ClusterRoleBinding.

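For example, applying the pytorchjobs_access_rbac.yaml below grants full access to PyTorchJob resources to pods that run under the default ServiceAccount of the default namespace. The pod in the question runs as system:serviceaccount:metis:default, so in that case set the subject's namespace to metis.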

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pytorchjobs-runner-role
rules:
- apiGroups: ["kubeflow.org"]
  resources: ["pytorchjobs"]
  verbs: ["*"]    # grant every verb on PyTorchJob resources

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pytorchjobs-runner-role-bind
subjects:
- kind: ServiceAccount   # the ServiceAccount the pod runs as
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: pytorchjobs-runner-role   # must match the ClusterRole defined above
  apiGroup: rbac.authorization.k8s.io

Apply the manifest:

kubectl apply -f pytorchjobs_access_rbac.yaml
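
The subject of the binding has to match the identity named in the Forbidden message, and the roleRef has to name the ClusterRole that was actually created. As a rough sketch, both can be derived from the error text itself. The function `rbac_for` and the regular expression are my own illustration, not part of the SDK. The output is JSON, which kubectl also accepts.

```python
import json
import re

# Message taken from the 403 response body in the question.
MESSAGE = (
    'pytorchjobs.kubeflow.org is forbidden: User '
    '"system:serviceaccount:metis:default" cannot list resource '
    '"pytorchjobs" in API group "kubeflow.org" in the namespace "kubeflow"'
)

# Pulls the ServiceAccount, resource, and API group out of the message.
FORBIDDEN = re.compile(
    r'User "system:serviceaccount:(?P<sa_namespace>[^:"]+):(?P<sa_name>[^"]+)" '
    r'cannot (?P<verb>\S+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)

RBAC_API = "rbac.authorization.k8s.io"


def rbac_for(message, role_name="pytorchjobs-runner-role"):
    """Build a ClusterRole and ClusterRoleBinding for the denied subject."""
    m = FORBIDDEN.search(message)
    if m is None:
        raise ValueError("not a ServiceAccount 'forbidden' message")
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        # Same rule as the YAML above: every verb on the denied resource.
        "rules": [{
            "apiGroups": [m["group"]],
            "resources": [m["resource"]],
            "verbs": ["*"],
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{role_name}-bind"},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": m["sa_name"],
            "namespace": m["sa_namespace"],
        }],
        # roleRef reuses role_name so it cannot drift from the ClusterRole.
        "roleRef": {"kind": "ClusterRole", "name": role_name, "apiGroup": RBAC_API},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    print(json.dumps(rbac_for(MESSAGE), indent=2))
```

Redirect the output to a file and pass it to `kubectl apply -f`. For the message above, the generated subject is the default ServiceAccount in the metis namespace.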

Best Regards.