Cannot add host to Project: Error 500
Summary
After following the documentation in https://vmware.github.io/vic-product/assets/files/html/1.5, I cannot add a host to a project.
Admiral cannot communicate with the VCH instance
VCH instance logs show errors while trying to stat datastore
Environment information
vSphere 6.7
Single ESXi host 6.7
vCenter Server appliance with embedded Platform controller 6.7
VIC 1.5
VCH deployed with UI Wizard
one single datastore
a bridge network created with virtual switch
default VM Network as public network
vSphere and vCenter Server version
vSphere and vCenter 6.7 update 1
VIC Appliance version
vic-v1.5.2-7206-92ebfaf5
Configuration
- Embedded or external PSC: Embedded
- How was the OVA deployed? (Flex client, HTML5 client, ovftool): HTML5
- Does the VIC appliance receive configuration by DHCP? YES
- What stage of the Appliance Lifecycle is the VIC appliance in? Running (I think)
- IP address of VIC appliance:
- Hostname of VIC appliance:
- IP address of vCenter Server:
- Hostname of vCenter Server:
Details
Was following the documentation step by step to deploy the first VCH host.
VCH host is deployed successfully.
vic-machine-linux ls shows the host
All green checks in VCH admin portal
Used the default-project in admiral, tried to add the VCH host to default-project
No TLS being used. Tried to add the host:
Error connecting to http://192.168.0.110:2376: Unexpected error: Connection refused: /192.168.0.110:2376
I used http because the docs say to use http when TLS is not in use. I tried several combinations; none of them works.
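A "Connection refused" at this stage usually means that nothing is listening on port 2376 of the VCH yet, whichever scheme is entered in Admiral. A minimal sketch for checking that from any machine on the same network, using only the Python standard library; the function name `port_is_open` is illustrative and not part of VIC or Admiral:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError
        # or a timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the VCH in this report the call would be `port_is_open("192.168.0.110", 2376)`. A `False` result points at the Docker endpoint inside the VCH rather than at Admiral.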
Changed type from VCH to DOCKER, received error 500.
Inspected the logs in the VCH admin portal: several ERROR messages (but the UI has all green checks).
The Docker Personality log shows this several times:
Apr 1 2019 23:10:04.393Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default &{Code:500 Message:cannot stat '[datastore1] virtual-container-host/VIC/423bf6bd-c91b-3c79-a0ac-ae0b26077784/images': No such file}
Port Layer showing the same:
Apr 1 2019 23:10:04.392Z ERROR op=264.404: Error getting image store 423bf6bd-c91b-3c79-a0ac-ae0b26077784: cannot stat '[datastore1] virtual-container-host/VIC/423bf6bd-c91b-3c79-a0ac-ae0b26077784/images': No such file
No problems in Init log
The VIC Admin log shows the following error several times:
Apr 1 2019 23:10:11.204Z ERROR Process docker-engine-server not running: open /.tether/run/docker-engine-server.pid: no such file or directory
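The Docker Personality and Port Layer errors above both name the same missing path. A small sketch, assuming the log lines keep the `cannot stat '[<datastore>] <path>'` shape shown here, for pulling the datastore and folder out of such a line so they can be checked in the datastore browser; `missing_datastore_path` is an illustrative helper, not a VIC tool:

```python
import re

# Matches the "cannot stat '[<datastore>] <path>'" fragment seen in the logs.
STAT_ERROR = re.compile(r"cannot stat '\[(?P<datastore>[^\]]+)\] (?P<path>[^']+)'")


def missing_datastore_path(line):
    """Return (datastore, path) from a 'cannot stat' log line, or None."""
    match = STAT_ERROR.search(line)
    if match is None:
        return None
    return match.group("datastore"), match.group("path")


line = (
    "Apr 1 2019 23:10:04.392Z ERROR op=264.404: Error getting image store "
    "423bf6bd-c91b-3c79-a0ac-ae0b26077784: cannot stat '[datastore1] "
    "virtual-container-host/VIC/423bf6bd-c91b-3c79-a0ac-ae0b26077784/images': No such file"
)
print(missing_datastore_path(line))
```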
Steps to reproduce
Follow docs to deploy VIC, create VCH
Assign VCH to default-project in Admiral
Actual behavior
Cannot establish connection error
Expected behavior
VCH should be added to default-project
Support information
Logs
Not comfortable with posting publicly, private channel is ok
See also
Troubleshooting attempted
- [x] Searched GitHub for existing issues. (Mention any similar issues under "See also", above.)
- [x] Searched the documentation for relevant troubleshooting guidance.
- Searched for a relevant VMware KB article.
This looks similar to the error described in https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/ts_imagestore_error.html. But in this case the error message is different, and I have not assigned any container to run yet.
I manually browsed to the datastore1 folder and created the images folder there. At that point, the Docker Personality log reported success:
Apr 2 2019 18:52:56.080Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default &{Code:500 Message:500 Internal Server Error}
time="2019-04-02T18:52:59Z" level=info msg="Launching docker personality pprof server on 127.0.0.1:6062"
Apr 2 2019 18:52:59.356Z ERROR Unable to load CAs for registry access in config
Apr 2 2019 18:52:59.356Z INFO Waiting for portlayer to come up
Apr 2 2019 18:53:01.358Z INFO Portlayer is up and responding to pings
Apr 2 2019 18:53:01.358Z INFO Refreshing repository cache
Apr 2 2019 18:53:01.360Z INFO Image cache initialized successfully
Apr 2 2019 18:53:01.360Z INFO Repository cache updated successfully
Apr 2 2019 18:53:01.360Z INFO Layer cache initialized successfully
Apr 2 2019 18:53:01.361Z INFO Container cache updated successfully
Apr 2 2019 18:53:01.361Z INFO Creating image store
Apr 2 2019 18:53:01.362Z INFO TLS enabled
Apr 2 2019 18:53:01.363Z INFO Listener created for HTTP on 192.168.0.110//tcp
Apr 2 2019 18:53:01.379Z INFO API listen on 192.168.0.110:2376
But then I restarted the host, and the images folder I had created was removed again. Apparently that folder is managed by the VCH, so I will run into the same problem after every reboot.
After this workaround, I was able to add the host to the project! :)
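The manual folder creation described above can also be scripted. A sketch, assuming the `govc` CLI from the govmomi project is installed and pointed at vCenter through its usual environment variables (neither is mentioned in this thread), and assuming that a folder created with `govc datastore.mkdir` behaves like one created in the datastore browser. The helper only builds the command line, and `images_mkdir_command` is an illustrative name. Since the folder disappeared again after a restart, treat this as a stopgap and not a fix.

```python
import shlex


def images_mkdir_command(datastore, vch_name, vch_id):
    """Build a govc command that creates <vch_name>/VIC/<vch_id>/images."""
    # Path layout taken from the "cannot stat" messages in the logs.
    path = f"{vch_name}/VIC/{vch_id}/images"
    # -ds selects the datastore, -p creates intermediate directories.
    return ["govc", "datastore.mkdir", "-ds", datastore, "-p", path]


argv = images_mkdir_command(
    "datastore1",
    "virtual-container-host",
    "423bf6bd-c91b-3c79-a0ac-ae0b26077784",
)
# Print a copy-pasteable shell command; pass argv to subprocess.run to execute it.
print(shlex.join(argv))
```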
But I am still having storage-related problems. Trying to deploy a container fails with the following error:
Retries are prevented. Failure: Error: Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}; Reason: {"errorDetail":{"message":"Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}"},"error":"Failed to write to image store: [POST /storage/{store_name}][500] WriteImage default \u0026{Code:500 Message:parent (scratch) doesn't exist in http:///storage/images/423bf6bd-c91b-3c79-a0ac-ae0b26077784: file does not exist}"}
I tried destroying the VCH and creating a new one, and ran into the same issue: I cannot add the host to a project because of the same error, and after applying the workaround I can. But then I cannot run a container on the host because of the same error as in #2413 (comment)
@dnoliver We have seen similar issues before when the VC user or the ops-user used to create the VCH does not have the privilege to create the datastore folder. Is that your case?
I followed this guide to create the vic-ops user https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/create_ops_user.html
In the https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/set_up_ops_user.html docs, I saw:
Grant Any Necessary Permissions
The operations user account must exist before you create a VCH. If you are deploying the VCH to a cluster, vSphere Integrated Containers Engine can configure the operations user account with all of the necessary permissions for you.
IMPORTANT: The option to grant any necessary permissions automatically only applies when deploying VCHs to clusters. If you are deploying the VCH to a standalone host that is managed by vCenter Server, you must configure the operations user account manually. For information about manually configuring the operations user account, see Manually Create a User Account for the Operations User.
I think I am doing a standalone host deployment, so maybe I have to either switch to a cluster deployment, or follow https://vmware.github.io/vic-product/assets/files/html/1.5/vic_vsphere_admin/ops_user_manual.html to assign permissions to that user manually if I want to keep the standalone host deployment.
I will try that and report back. Thank you @wjun!
I definitely have a datastore permissions problem for my vic-ops user :) Thank you for the hint @wjun
- I am not using a cluster; I am deploying directly to a host. So the VCH deployment does not help me with permissions.
- I went through the manual permissions docs. I cannot guarantee that I did it correctly, given that there are several clicks to be done in several places. Custom permissions were applied to the root vCenter, Datacenter, and ESXi host. The datastore has the custom VCH - endpoint - datastore inherited permission.
- Created the VCH again with the Wizard and left "apply permissions" checked. This caused my vic-user permissions to be replaced everywhere with the ones created by the tool, instead of the ones that I had spent time assigning manually. My mistake, but somebody could add a warning there!
- Same error deploying the VCH; workaround applied, but I cannot create a container (again).
- Re-applied the permissions that the wizard had overridden. Cannot guarantee that I did it correctly... failed again.
- Gave the vic-ops user Administrator permission on the datastore: I can deploy containers!
- Assigned the previous permissions on the datastore to the vic-ops user: I run into problems again. This confirms my permission problem.
My VCH - endpoint - datastore permission looks like this:
- dvPort group
  - Modify
  - Policy operation
  - Scope operation
- Datastore
  - Allocate space
  - Browse datastore
  - Configure datastore
  - Low level file operations
  - Remove file
- Host
  - Configuration
    - System Management
- Resource
  - Assign virtual machine to resource pool
  - Migrate powered off virtual machine
- Virtual machine
  - Change Configuration
    - Add existing disk
    - Add new disk
    - Add or remove device
    - Advanced configuration
    - Modify device settings
    - Remove disk
    - Rename
  - Edit Inventory
    - Create new
    - Register
    - Remove
    - Unregister
  - Guest operations
    - Guest operation modifications
    - Guest operation program execution
    - Guest operation queries
  - Interaction
    - Connect devices
    - Power off
    - Power on
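Because the behaviour changes when the role on the datastore changes, it can help to compare the privileges a role actually carries against a reference list. A minimal sketch using plain sets: `reference` is the Datastore group from the role above, while `observed` is a hypothetical role with one entry removed purely to show the output. It is not a finding from this environment.

```python
def missing_privileges(reference, granted):
    """Return the privileges in reference that granted lacks, sorted."""
    return sorted(set(reference) - set(granted))


# Datastore privileges listed for the role in this post.
reference = [
    "Allocate space",
    "Browse datastore",
    "Configure datastore",
    "Low level file operations",
    "Remove file",
]

# Hypothetical role, for illustration only: same list minus one entry.
observed = [p for p in reference if p != "Low level file operations"]

print(missing_privileges(reference, observed))
print(missing_privileges(reference, reference))
```

An empty result for the real role would support the suspicion that the documented privilege list is not the whole story.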
Then, the more accurate question is: why does the vic-ops user run into datastore permission problems while creating a VCH and/or running containers, if it has all the permissions specified by the documentation?
I did the same deployment, but now using a cluster, and the problem is still there. I have to manually create the images folder to make the VCH work for the first time, and I need to add the administrative role to vic-user on the datastore to deploy a container successfully. So this permission problem happens regardless of whether the deployment targets a cluster or a standalone host.
I tried VCH create from CLI onto a VC cluster, and it works. Please note --user is an admin user. --ops-user must be combined with --ops-grant-perms so VCH can assign related permissions to this ops-user automatically.
Great @wjun, I have only tested this with the UI Wizard, where I think --user administrator@vsphere.local is implicit (that is the user I use to log in to vCenter Server). I will give the CLI command a shot to validate. Thanks!
@wjun I have validated the API approach, and I run into the same issue again.
The command used to deploy this VCH was
./vic-machine-linux create --name virtual-container-host-1 \
--compute-resource Cluster \
--image-store 'datastore1 (1)' \
--base-image-size 8GB \
--volume-store 'datastore1 (1):default' \
--bridge-network vic-bridge \
--bridge-network-range 172.16.0.0/12 \
--public-network 'VM Network' \
--tls-cname virtual-container-host-1 \
--certificate-key-size 2048 \
--no-tlsverify --user Administrator@VSPHERE.LOCAL \
--thumbprint <thumb> \
--target 192.168.0.238/Datacenter \
--ops-user vic-ops@vsphere.local \
--ops-grant-perms
The command execution log is below:
INFO[0000] ### Installing VCH ####
INFO[0000] vSphere password for vic-ops@vsphere.local:
INFO[0003] Loaded server certificate virtual-container-host-1/server-cert.pem
WARN[0003] Configuring without TLS verify - certificate-based authentication disabled
INFO[0003] Validating supplied configuration
INFO[0004] Network configuration OK on "vic-bridge"
INFO[0004] vCenter settings check OK
INFO[0004] Firewall status: ENABLED on "/Datacenter/host/Cluster/192.168.0.217"
INFO[0004] Firewall configuration OK on hosts:
INFO[0004] "/Datacenter/host/Cluster/192.168.0.217"
INFO[0004] vCenter settings check OK
INFO[0004] License check OK on hosts:
INFO[0004] "/Datacenter/host/Cluster/192.168.0.217"
INFO[0004] DRS check OK on:
INFO[0004] "/Datacenter/host/Cluster"
WARN[0004] Only one host can access all of the image/volume datastores. This may be a point of contention/performance degradation and HA/DRS may not work as intended.
INFO[0004]
INFO[0005] Creating Resource Pool "virtual-container-host-1"
INFO[0005] Creating appliance on target
INFO[0005] Network role "client" is sharing NIC with "public"
INFO[0005] Network role "management" is sharing NIC with "public"
INFO[0005] Creating the VCH folder
INFO[0005] Creating the VCH VM
INFO[0006] Creating directory [datastore1 (1)] VIC
INFO[0006] Datastore path is [datastore1 (1)] VIC
INFO[0007] Uploading ISO images
INFO[0008] Uploading appliance.iso as V1.5.2-20879-30B67A14-appliance.iso
INFO[0027] Uploading bootstrap.iso as V1.5.2-20879-30B67A14-bootstrap.iso
INFO[0045] Waiting for IP information
INFO[0052] Waiting for major appliance components to launch
INFO[0052] Obtained IP address for client interface: "192.168.0.199"
INFO[0052] Checking VCH connectivity with vSphere target
INFO[0052] vSphere API Test: https://192.168.0.238 vSphere API target responds as expected
ERRO[0225] vic/lib/install/management.(*Dispatcher).CheckDockerAPI: Create error: context deadline exceeded
vic/cmd/vic-machine/create.(*Create).Run:755 Create
vic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux
INFO[0225] Docker API endpoint check failed: context deadline exceeded
INFO[0225] Collecting 598fc05d-88d3-4d9b-8c5a-f55a274e2db1 vpxd.log
INFO[0225] API may be slow to start - try to connect to API after a few minutes:
INFO[0225] Run command: docker -H 192.168.0.199:2376 --tls info
INFO[0225] If command succeeds, VCH is started. If command fails, VCH failed to install - see documentation for troubleshooting.
ERRO[0225] vic/cmd/vic-machine/create.(*Create).Run.func3: Create error: context deadline exceeded
vic/cmd/vic-machine/create.(*Create).Run:755 Create
vic/cmd/vic-machine/common.NewOperation:27 vic-machine-linux
ERRO[0225] --------------------
ERRO[0225] vic-machine-linux create failed: Creating VCH exceeded time limit of 3m0s. Please increase the timeout using --timeout to accommodate for a busy vSphere target
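To see where the create run spent its time, the elapsed-seconds counter in each line can be used. A rough sketch, assuming the `LEVEL[seconds] message` format shown in the log above; `stall_before_first_error` is an illustrative helper and not part of vic-machine:

```python
import re

# vic-machine lines look like "INFO[0052] message": level, elapsed seconds, text.
LOG_LINE = re.compile(r"^(?P<level>[A-Z]{4})\[(?P<seconds>\d+)\]\s*(?P<message>.*)$")


def stall_before_first_error(output):
    """Return (last_info_message, first_error_message, gap_seconds) or None."""
    last_info = None
    for raw in output.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # continuation lines such as stack frames
        seconds = int(match.group("seconds"))
        if match.group("level") == "ERRO":
            if last_info is None:
                return None
            return last_info[1], match.group("message"), seconds - last_info[0]
        if match.group("level") == "INFO":
            last_info = (seconds, match.group("message"))
    return None


sample = """\
INFO[0052] Checking VCH connectivity with vSphere target
INFO[0052] vSphere API Test: https://192.168.0.238 vSphere API target responds as expected
ERRO[0225] vic/lib/install/management.(*Dispatcher).CheckDockerAPI: Create error: context deadline exceeded
vic/cmd/vic-machine/create.(*Create).Run:755 Create
"""
print(stall_before_first_error(sample))
```

On the excerpt above this reports a 173-second gap between the vSphere API test and the first error, so nearly all of the 3m0s limit was spent waiting for the Docker API. That is consistent with the Docker personality failing to initialize its backend rather than with a slow vSphere target.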
At least this time I get an error, and not the silent failure of the UI Wizard. In the Docker personality log, I can see the same issue as before:
Apr 8 2019 20:26:07.096Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default &{Code:500 Message:cannot stat '[datastore1 (1)] virtual-container-host-1/VIC/423b9844-dafc-a118-d910-f6ce4a309745/images': No such file}
And I am sure the workaround still applies: if I create the /images directory manually and assign admin permissions on the datastore to the vic-ops user, this will start working.
I also tried to deploy a VCH keeping the Administrator access for vic-ops on the datastore and removing the --ops-grant-perms parameter, and it runs into the same issue as before:
Apr 8 2019 20:26:07.096Z FATAL failed to initialize backend: [POST /storage][500] CreateImageStore default &{Code:500 Message:cannot stat '[datastore1 (1)] virtual-container-host-1/VIC/423b9844-dafc-a118-d910-f6ce4a309745/images': No such file}
So this /images folder error may not be a permissions problem (at least not on the datastore), but something else. The Administrator permission seems to solve the second issue, when trying to run a container, but not the initial creation of the images folder.
@dnoliver I tried various combinations of ops-user and datastores, and cannot reproduce this in my local environment. Could you also post your portlayer.log, which should contain error messages related to the images directory creation failure? Another option is to remove --ops-user and --ops-grant-perms from the VCH create command first and see whether you can still reproduce the issue.
Ok, will try to share the portlayer.log file.
The only special thing about my installation is that it is using VM Encryption. I have a KMS, an encryption storage policy, and a couple of encrypted VMs running on the same host. Is that relevant to this issue?
I hate to comment on an old thread, but I have vSAN encryption with vCenter KMS, and experienced the same problem with the 'grant all permissions needed' option, and needing to create the images folder manually for this to work. So this still seems to be an issue.