[bug] Errors when installing Helm chart in KubeFate
Closed this issue · 8 comments
I am using config/samples to create the fatecluster, but the operator enters a livelock and never creates it successfully.
It creates the cluster, then removes it in the next reconciliation.
spec:
  clusterSpec:
    chartName: fate
    chartVersion: v1.4.0-a
    istio: {}
    modules:
    - rollsite
    - clustermanager
    - nodemanager
    - mysql
    - python
    - client
    mysql:
      accessMode: ReadWriteOnce
      database: eggroll_meta
      ip: mysql
      nodeSelector: {}
      password: fate_dev
      port: 3306
      size: 1Gi
      storageClass: mysql
      user: fate
    name: fatecluster-sample
    namespace: fate-9999
    nodemanager:
      count: 3
      list:
      - accessMode: ReadWriteOnce
        name: nodemanager
        nodeSelector: {}
        sessionProcessorsPerNode: 2
        size: 1Gi
        storageClass: nodemanager
        subPath: nodemanager
      sessionProcessorsPerNode: 4
    partyId: 9999
    python:
      fateflowNodePort: 30109
      fateflowType: NodePort
      nodeSelector: {}
    rollsite:
      exchange:
        ip: 192.168.1.1
        port: 30000
      nodePort: 30009
      nodeSelector: {}
      partyList:
      - partyId: 10000
        partyIp: 192.168.10.1
        partyPort: 30010
      type: NodePort
    servingIp: 192.168.9.1
    servingPort: 30209
  kubefate:
    name: kubefate-sample
    namespace: kube-fate
status:
  jobId: 6d6703c3-a521-4cf6-a930-f191babe61a9
  clusterId: d57d0daa-5479-4638-ae45-7b6309d2efaf
  status: Creating
After a while, the status becomes:
status:
  status: Creating
/kind bug
Found the reason
2020-07-02T12:03:18.515+0800 DEBUG controllers.FateCluster request success {"body": "{\"data\":{\"uuid\":\"3b3cf865-0c97-41c3-a6be-bc1981ef24b4\",\"start_time\":\"2020-07-02T04:03:17.297Z\",\"end_time\":\"2020-07-02T04:03:17.316Z\",\"method\":\"ClusterInstall\",\"result\":\"failed to download \\\"kubefate/fate\\\" (hint: running `helm repo update` may help)\",\"cluster_id\":\"afa78a89-c27c-4182-acc0-3ddcb65beb6c\",\"creator\":\"admin\",\"sub-jobs\":null,\"status\":\"Failed\",\"time_limit\":3600000000000}}\n"}
It would be better to surface this error as an event on the fatecluster CR.
/kind feature
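The request above could be approached by decoding the job response that the operator already logs and turning a failed job into an event-shaped record. What follows is a minimal sketch, assuming only the response fields visible in the log line (`uuid`, `method`, `result`, `status`). The `jobResponse` and `crEvent` types and the `failureEvent` helper are hypothetical names for illustration, not part of KubeFATE. In a real controller the reason and message would presumably be handed to an event recorder such as client-go's `record.EventRecorder`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobResponse mirrors the subset of the KubeFATE job response seen in the
// operator's debug log. Hypothetical type, for illustration only.
type jobResponse struct {
	Data struct {
		UUID   string `json:"uuid"`
		Method string `json:"method"`
		Result string `json:"result"`
		Status string `json:"status"`
	} `json:"data"`
}

// crEvent is a stand-in for a Kubernetes event attached to the FateCluster CR.
type crEvent struct {
	Type    string // "Warning" or "Normal"
	Reason  string
	Message string
}

// failureEvent decodes a job response body and returns a Warning event
// when the job status is "Failed". The bool reports whether an event was built.
func failureEvent(body []byte) (crEvent, bool, error) {
	var resp jobResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return crEvent{}, false, fmt.Errorf("decode job response: %w", err)
	}
	if resp.Data.Status != "Failed" {
		return crEvent{}, false, nil
	}
	return crEvent{
		Type:    "Warning",
		Reason:  resp.Data.Method + "Failed",
		Message: fmt.Sprintf("job %s: %s", resp.Data.UUID, resp.Data.Result),
	}, true, nil
}

func main() {
	// Abbreviated from the response body in the log above.
	body := []byte(`{"data":{"uuid":"3b3cf865-0c97-41c3-a6be-bc1981ef24b4","method":"ClusterInstall","result":"failed to download \"kubefate/fate\"","status":"Failed"}}`)

	ev, ok, err := failureEvent(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if ok {
		fmt.Printf("%s %s: %s\n", ev.Type, ev.Reason, ev.Message)
	}
}
```

With an event like this on the CR, `kubectl describe` on the fatecluster would show the Helm download failure directly instead of leaving the status stuck at Creating.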
2020-07-02T03:56:39Z ERR pkg/service/chart.go:356 > repoAdd error="looks like \"https://federatedai.github.io/KubeFATE/\" is not a valid chart repository or cannot be reached: Get \"https://federatedai.github.io/KubeFATE/index.yaml\": dial tcp: lookup federatedai.github.io on 10.0.0.10:53: server misbehaving"
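The log line above shows the repository's `index.yaml` could not be fetched because DNS resolution failed inside the cluster. A pre-flight probe along the following lines could catch that before an install job is started. This is only a sketch: `indexURL` and `checkRepo` are hypothetical helpers, not KubeFATE functions, and the 5-second timeout is an arbitrary assumption.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// indexURL builds the index.yaml URL for a chart repository URL.
// It rejects inputs that lack a scheme or host.
func indexURL(repo string) (string, error) {
	u, err := url.Parse(repo)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("invalid chart repository URL %q", repo)
	}
	return strings.TrimRight(repo, "/") + "/index.yaml", nil
}

// checkRepo issues a GET for index.yaml and reports DNS, connection,
// or HTTP status problems as an error.
func checkRepo(ctx context.Context, client *http.Client, repo string) error {
	idx, err := indexURL(repo)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, idx, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("chart repository unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s for %s", resp.Status, idx)
	}
	return nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Repository URL taken from the error message above.
	if err := checkRepo(ctx, http.DefaultClient, "https://federatedai.github.io/KubeFATE/"); err != nil {
		fmt.Println("pre-flight check failed:", err)
		return
	}
	fmt.Println("chart repository reachable")
}
```

Running a probe like this from a pod in the same namespace as KubeFATE would help distinguish a cluster DNS problem from a problem with the repository itself.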
Can we support offline installation for the cluster? Downloading from GitHub is impractical in many industries.
Actually, there is an offline installation solution based on Harbor: Helm charts (what cluster? FATE? FATE-Serving? what version?) and images are stored in Harbor (https://github.com/FederatedAI/KubeFATE/blob/8ef9c24813c01b05a99abb84a03f7e85cd97beca/registry/README.md). We will refine it and add it to this repo.
Fixes merged.
/close
@LaynePeng: Closing this issue.