- Kubernetes == 1.21, 不然会失败
- 因为kubeflow整个项目中声明了6个pvc,如下图所示:
- 所以在正式安装前,先准备好所需要的pv,建议多准备几个,pv.yaml见
cd $GOROOT/bin
sudo curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | sudo bash
- 在安装过程的每一步骤中,有可能出错,报错的话安装两遍,错误原因可能是因为上一步的还没启动。
Install individual components(逐一安装) In this section, we will install each Kubeflow official component (under apps) and each common service (under common) separately, using just kubectl and kustomize.
- --validate=false 很关键
kustomize build common/cert-manager/cert-manager/base | kubectl apply --validate=false -f -
kustomize build common/cert-manager/kubeflow-issuer/base | kubectl apply -f -
Istio is used by many Kubeflow components to secure their traffic, enforce network authorization and implement routing policies.
Install Istio: 报错的话安装两遍,错误原因可能是因为上一步的还没启动
kustomize build common/istio-1-11/istio-crds/base | kubectl apply -f -
kustomize build common/istio-1-11/istio-namespace/base | kubectl apply -f -
kustomize build common/istio-1-11/istio-install/base | kubectl apply -f -
Dex is an OpenID Connect Identity (OIDC) with multiple authentication backends. In this default installation, it includes a static user with email user@example.com
. By default, the user's password is 12341234
. For any production Kubeflow deployment, you should change the default password by following the relevant section.
Install Dex:
kustomize build common/dex/overlays/istio | kubectl apply -f -
The OIDC AuthService extends your Istio Ingress-Gateway capabilities, to be able to function as an OIDC client:
安装这个时,修改了原文档common/oidc-authservice/basestatefulset.yaml
,新增了下面的信息,不然会无权限访问的错误
spec:
initContainers:
- name: fix-permission
image: busybox
command: [ 'sh', '-c' ]
args: [ 'chmod -R 777 /var/lib/authservice;' ]
volumeMounts:
- mountPath: /var/lib/authservice
name: data
containers:
- name: authservice
kustomize build common/oidc-authservice/base | kubectl apply -f -
Knative is used by the KFServing official Kubeflow component.
Install Knative Serving:
kustomize build common/knative/knative-serving/overlays/gateways | kubectl apply -f -
kustomize build common/istio-1-11/cluster-local-gateway/base | kubectl apply -f -
Optionally, you can install Knative Eventing which can be used for inference request logging:
kustomize build common/knative/knative-eventing/base | kubectl apply -f -
Create the namespace where the Kubeflow components will live in. This namespace
is named kubeflow
.
Install kubeflow namespace:
kustomize build common/kubeflow-namespace/base | kubectl apply -f -
Create the Kubeflow ClusterRoles, kubeflow-view
, kubeflow-edit
and
kubeflow-admin
. Kubeflow components aggregate permissions to these
ClusterRoles.
Install kubeflow roles:
kustomize build common/kubeflow-roles/base | kubectl apply -f -
Create the Istio resources needed by Kubeflow. This kustomization currently
creates an Istio Gateway named kubeflow-gateway
, in namespace kubeflow
.
If you want to install with your own Istio, then you need this kustomization as
well.
Install istio resources:
kustomize build common/istio-1-11/kubeflow-istio-resources/base | kubectl apply -f -
Install the Multi-User Kubeflow Pipelines official Kubeflow component:
- minio-5b65df66c9-q457x 可能报错CreateContainerConfigError
kustomize build apps/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user | kubectl apply -f -
KFServing was rebranded to KServe.
Install the KServe component:
kustomize build contrib/kserve/kserve | kubectl apply -f -
Install the Models web app:
kustomize build contrib/kserve/models-web-app/overlays/kubeflow | kubectl apply -f -
Install the Katib official Kubeflow component:
kustomize build apps/katib/upstream/installs/katib-with-kubeflow | kubectl apply -f -
Install the Central Dashboard official Kubeflow component:
kustomize build apps/centraldashboard/upstream/overlays/kserve | kubectl apply -f -
Install the Admission Webhook for PodDefaults:
kustomize build apps/admission-webhook/upstream/overlays/cert-manager | kubectl apply -f -
Install the Notebook Controller official Kubeflow component:
kustomize build apps/jupyter/notebook-controller/upstream/overlays/kubeflow | kubectl apply -f -
Install the Jupyter Web App official Kubeflow component:
kustomize build apps/jupyter/jupyter-web-app/upstream/overlays/istio | kubectl apply -f -
Install the Profile Controller and the Kubeflow Access-Management (KFAM) official Kubeflow components:
kustomize build apps/profiles/upstream/overlays/kubeflow | kubectl apply -f -
Install the Volumes Web App official Kubeflow component:
kustomize build apps/volumes-web-app/upstream/overlays/istio | kubectl apply -f -
Install the Tensorboards Web App official Kubeflow component:
kustomize build apps/tensorboard/tensorboards-web-app/upstream/overlays/istio | kubectl apply -f -
Install the Tensorboard Controller official Kubeflow component:
kustomize build apps/tensorboard/tensorboard-controller/upstream/overlays/kubeflow | kubectl apply -f -
Install the Training Operator official Kubeflow component:
kustomize build apps/training-operator/upstream/overlays/kubeflow | kubectl apply -f -
Finally, create a new namespace for the the default user (named kubeflow-user-example-com
).
kustomize build common/user-namespace/base | kubectl apply -f -
After installation, it will take some time for all Pods to become ready. Make sure all Pods are ready before trying to connect, otherwise you might get unexpected errors. To check that all Kubeflow-related Pods are ready, use the following commands:
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
成功后,通过 kubectl get svc -n istio-system
查看可以看到暴露的端口
浏览器输入服务器的ip:32261
即可看到登录页面,默认登录账号密码 user@example.com
/ 12341234
- 安装结束后,在20步骤的时候可能会看到许多pod的状态不是 running,需要定位修复
-
通过命令
kubectl describe pod xxxpodname -n namespace
或kubectl logs xxxpodname -n namespace
查看具体的错误信息。 -
通过新建pv,可以解决这类问题,新建pv时要保证存储和pvc的存储大小一致
然后执行命令 kubectl apply -f pv.yaml
创建持久卷,创建好后pvc会自动绑定,然后在 kubectl get pod --all-namespaces
查看pod的状态,
Pending都消除了,没消除的执行 kubectl delete pod xxxpodname -n namespace
删除pod,会自动重建。
主要是由于jupyter-web-app的安全验证策略导致的,详细见kubeflow/kubeflow#5803 解决方案环境变量加上APP_SECURE_COOKIES=false,修改见下:
即在params.env和 deployment.yaml中加上变量APP_SECURE_COOKIES=false, 然后重新执行命令
kustomize build apps/jupyter/jupyter-web-app/upstream/overlays/istio | kubectl apply -f -
kubectl delete pod jupyter-web-app-deployment-75b9fcb878-5kgzx -n kubeflow
# 删除后会自动重建,等重建完就好了
Multi-User mode Note, multi-user mode technical details were put in the How in-cluster authentication works section below.
Choose your use-case from one of the options below:
Access Kubeflow Pipelines from Jupyter notebook
In order to access Kubeflow Pipelines from Jupyter notebook, an additional per namespace (profile) manifest is required:
- "<YOUR_USER_PROFILE_NAMESPACE>" 比如我的是
kubeflow-user-example-com
- 参见
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
name: access-ml-pipeline
namespace: "<YOUR_USER_PROFILE_NAMESPACE>"
spec:
desc: Allow access to Kubeflow Pipelines
selector:
matchLabels:
access-ml-pipeline: "true"
volumes:
- name: volume-kf-pipeline-token
projected:
sources:
- serviceAccountToken:
path: token
expirationSeconds: 7200
audience: pipelines.kubeflow.org
volumeMounts:
- mountPath: /var/run/secrets/kubeflow/pipelines
name: volume-kf-pipeline-token
readOnly: true
env:
- name: KF_PIPELINES_SA_TOKEN_PATH
value: /var/run/secrets/kubeflow/pipelines/token
After the manifest is applied, newly created Jupyter notebook contains an additional option in the configurations section.
Note, Kubeflow kfp.Client
expects token either in KF_PIPELINES_SA_TOKEN_PATH
environment variable or
mounted to /var/run/secrets/kubeflow/pipelines/token
. Do not change these values in the manifest.
Similarly, audience
should not be modified as well. No additional setup is required to refresh tokens.
Remember the setup has to be repeated per each namespace (profile) that should have access to Kubeflow Pipelines API from within Jupyter notebook.
-
这样可以apt-get安装一些软件
-
重构镜像
# 让容器用户具有免密root权限
docker run --name xwtest -it -e GRANT_SUDO=yes --user root -p 8888:8888 -d public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-scipy:v1.5.0
# 打包该镜像
docker commit 8e48346f8567 public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-scipy-root:v1.5.0
# 查看
docker ps|grep jupyter-scipy-root