eclipse-che/che

Support running VS Code in the containers that don't include the pre-requisites

azatsarynnyy opened this issue · 26 comments

Depends on:

Is your task related to a problem? Please describe

Currently, Che-Code works perfectly in a UDI8-based container.
But, as a user, I should be able to run the Che-Code editor in my own container which, ideally, can be based on any image.
E.g.: maven:3.8.6-openjdk-18, registry.access.redhat.com/ubi9/ubi, etc.

For example, if I run Che-Code in a DevWorkspace created from the following Devfile:

schemaVersion: 2.1.0
metadata:
  name: che-code-on-ubi
components:
  - name: dev
    container:
      image: registry.access.redhat.com/ubi8/ubi
      memoryLimit: 7Gi
      cpuLimit: 3500m
  - name: projects
    volume:
      size: 3Gi

Che-Code starts well, as it brings the Node.js runtime copied into the dedicated volume. But there's a problem when opening a terminal: The terminal process "/sbin/nologin" failed to launch (exit code: 1).


Some logs from the container:

[12:27:52] ptyHost was unable to resolve shell environment Error: Unable to resolve your shell environment: Unexpected exit code from spawned shell (code 1, signal null)
    at /checode/checode-linux-libc/out/vs/server/node/server.main.js:92:2608
    at async /checode/checode-linux-libc/out/vs/server/node/server.main.js:68:23642
...
[12:28:08] ExtensionHostConnection#buildUserEnvironment resolving shell environment failed Error: Unable to resolve your shell environment: Unexpected exit code from spawned shell (code 1, signal null)
    at /checode/checode-linux-libc/out/vs/server/node/server.main.js:92:2608
    at async /checode/checode-linux-libc/out/vs/server/node/server.main.js:68:23642

Describe the solution you'd like

It looks like the only problem we need to solve is opening the VS Code terminal, but we still need to check whether other functionality works as well.

Describe alternatives you've considered

No response

Additional context

Running Che-Code in a UBI9-based container is problematic as well:

schemaVersion: 2.1.0
metadata:
  name: che-code-on-ubi
components:
  - name: dev
    container:
      image: registry.access.redhat.com/ubi9/ubi
      memoryLimit: 7Gi
      cpuLimit: 3500m
  - name: projects
    volume:
      size: 3Gi

See #21629 for more details.
Currently, there's a workaround described in https://github.com/che-incubator/che-code/tree/main/build/dockerfiles.
It also requires a more stable solution.

The workaround for the terminal issue is to set bash as the default VS Code terminal profile.
It can be set in the settings directly or with the Terminal: Select Default Profile command in the Command Palette.

It turned out that Che-Code's entrypoint failed to patch /etc/passwd.
The user's entry has no usable shell and no home dir: 1003120000:x:1003120000:0:1003120000 user:/:/sbin/nologin

l0rd commented

The arbitrary user that runs the container doesn't have write privileges on /etc/passwd and /etc/group. That explains why our entrypoint-volume.sh fails.

In general it's good practice to deny write access to /etc/passwd, so we have to live with that. We should probably deny write access to /etc/passwd on UDI images too. From the editor's point of view, we should check whether there is an env variable ($SHELL?) or a parameter that changes the behavior of the VS Code terminal, or otherwise hardcode the VS Code settings to use /bin/bash (or /bin/sh, which is more universal).
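
A minimal sketch of that fallback order, written as a POSIX shell function. It is not taken from Che-Code: the function name pick_terminal_shell and the exact order (SHELL first, then the passwd entry, then /bin/bash, then /bin/sh) are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of Che-Code): choose a shell for the terminal.
#   $1 - shell field of the user's /etc/passwd entry, e.g. /sbin/nologin
#   $2 - value of the SHELL environment variable, may be empty
pick_terminal_shell() {
  passwd_shell=$1
  env_shell=$2

  # An explicit SHELL wins if it points to an executable.
  if [ -n "$env_shell" ] && [ -x "$env_shell" ]; then
    echo "$env_shell"
    return 0
  fi

  # Use the passwd shell unless it is a "no login" placeholder.
  case "$passwd_shell" in
    ""|*/nologin|*/false) ;;
    *)
      if [ -x "$passwd_shell" ]; then
        echo "$passwd_shell"
        return 0
      fi
      ;;
  esac

  # Fall back to bash, then to the more universal sh.
  for candidate in /bin/bash /bin/sh; do
    if [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Possible usage (reads the current user's entry from /etc/passwd):
#   pick_terminal_shell \
#     "$(awk -F: -v uid="$(id -u)" '$3 == uid { print $7 }' /etc/passwd)" \
#     "${SHELL:-}"
```

With the passwd entry shown earlier in this issue (shell set to /sbin/nologin) and no SHELL variable, such a function would fall through to /bin/bash or /bin/sh, whichever exists in the image.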

Note to self:
Running images as an arbitrary user with CRI-O (and therefore on OpenShift) is easier than it was a couple of years ago, because CRI-O automatically patches /etc/passwd when the container is created:

  • the username is retrieved from the image config User or, when not set, from the arbitrary UID
  • the home directory is retrieved from the image env variable HOME, from the image config WorkingDir and otherwise set to /tmp
  • the shell is hardcoded to /sbin/nologin

Today I ran into an error when starting a workspace:

Detected unrecoverable event FailedPostStartHook: 
Exec lifecycle hook ([/bin/sh -c nohup /checode/entrypoint-volume.sh > /checode/entrypoint-logs.txt 2>&1 &]) 
for Container "tools" in Pod "workspace835)" failed - 
error: rpc error: code = NotFound desc = container is not created or running: 
checking if PID of tools is running failed: container process not found, message: "".

Workspace stopped due to error

For the tools container I used a UBI9-based image that contains Node.js, and I also provided the VSCODE_NODEJS_RUNTIME_DIR env variable. But the workspace failed to start.

Definition for my tools container:

components:
  - name: tools
    container:
      image: quay.io/rnikitenko/ubi9/go-toolset:latest
      env:
        - name: VSCODE_NODEJS_RUNTIME_DIR
          value: /usr/bin
      endpoints:
        - exposure: public
          name: nodejs
          protocol: http
          targetPort: 3000
      memoryLimit: 2Gi
      memoryRequest: 256Mi
      mountSources: true
    attributes:
      controller.devfile.io/merge-contribution: true

My editor starts on the post-start event:

commands:
  - id: init-container-command
    apply:
      component: che-code-injector
  - id: init-che-code-command
    exec:
      component: che-code-runtime-description
      commandLine: 'nohup /checode/entrypoint-volume.sh > /checode/entrypoint-logs.txt 2>&1 &'
events:
  preStart:
    - init-container-command
  postStart:
    - init-che-code-command

I was able to fix the problem by:

[image]

@l0rd, about CMD ["tail", "-f", "/dev/null"] (please see my previous comment):

I would like to know your opinion here.
Should it be a requirement for the image which is used for the tools container?

l0rd commented

@RomanNikitenko the requirement is that the container has to be non-terminating. From the documentation:

Either the command defined by the image or by the container component within the devfile must be non-terminating, such as the case of setting command to sleep with the infinity argument.

l0rd commented

In fact the devfile in the devfile registry has:

  - container:
      (...)
      image: registry.access.redhat.com/ubi9/go-toolset:1.18.9-14
      args: ['tail', '-f', '/dev/null']

l0rd commented

Images in registry.devfile.io without nodejs (script and full output):

$ ./tests/check_nodejs.sh
PARAMS: image --> quay.io/wildfly/wildfly-centos7:26.1
PARAMS: image --> registry.access.redhat.com/ubi8/openjdk-17
PARAMS: image --> quay.io/devfile/composer:2.4
PARAMS: image --> icr.io/appcafe/websphere-liberty-devfile-stack:22.0.0.1
PARAMS: image --> quay.io/eclipse/che-java11-maven:next
PARAMS: image --> registry.access.redhat.com/ubi8/openjdk-11:latest
PARAMS: image --> registry.access.redhat.com/ubi8/openjdk-11:latest

Currently VS Code has a requirement: Node.js version should be >=16.17.x and <17. But I guess some images have version 18.

l0rd commented

Currently VS Code has a requirement: Node.js version should be >=16.17.x and <17. But I guess some images have version 18.

The conclusion is that solving this issue is critical for #20251 and there is no workaround.

About the terminal problems.

I've added some logic that displays a warning with two options when a workspace has started, if both of the following hold:

  • os.userInfo().shell returns something like /sbin/nologin
  • no default terminal profile is configured

The first option is: Open Settings. The corresponding settings section is opened, and the user can select any terminal profile from the list for any scope (User, Remote, Workspace).

The second option is: Use sh as default profile. It automatically configures sh as the default profile at the Workspace level, so it adds the following to the projectName/.vscode/settings.json file:

"terminal.integrated.defaultProfile.linux": "sh"

As a result, the user ends up with a modified file in their project. Maybe it makes sense to write this to the User settings instead (it's possible, no problem with it AFAIK) to avoid git changes directly in the project.

This repo/branch is configured to use editor with my changes for testing: https://github.com/RomanNikitenko/web-nodejs-sample/tree/test-ubi8-nodejs-18.

Use_sh_as_default.mp4
open_settings.mp4

@l0rd
I've described the current state in the comment above.
As we discussed in the meeting, I'm switching back to the main issue related to starting VS Code.

About the terminal problem:

  • if the current state is good in general, I can improve it within the current issue based on feedback
  • if a more complex solution is required, then let's do it within a separate issue

| Image | Workspace starts | Terminal starts | Known problems |
|---|---|---|---|
| quay.io/devfile/base-developer-image:ubi9-latest | + | + | yq: command not found |
| registry.access.redhat.com/ubi8/openjdk-11:latest | + | + | profile should be set; yq not found; Git not found |
| registry.access.redhat.com/ubi8/openjdk-17 | + | + | profile should be set; yq not found; Git not found |
| quay.io/eclipse/che-java11-maven:next | + | + | yq not found; entrypoint-volume.sh: 144: [[: not found |
| quay.io/devfile/composer:2.4 | + | + | profile should be set; yq not found; entrypoint-volume.sh: sh: : unknown operand) |
| quay.io/wildfly/wildfly-centos7:26.1 | + | - | yq: command not found; it looks like some extensions are not activated because of problems in node_modules, like: /lib64/libstdc++.so.6: version GLIBCXX_3.4.20 not found |

Note:

  • errors like [[: not found or sh: : unknown operand are caused by Bash-specific constructs in the entrypoint, while the image provides a different shell. For example, for composer:2.4 it's /bin/busybox (to check the current shell: readlink -f $(which sh))
  • we have a musl-based dockerfile for Che-Code, but we don't have one for Devspaces-Code. This means it's possible to run Che-Code in an Alpine-based tools container, but not Devspaces-Code.
  • for some images the terminal fails to start. For such images the terminal starts fine once a default terminal profile is selected, see #21778 (comment). Another option is to set the SHELL env variable in the container or provide it in the devfile.
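
To illustrate the first point, here is a small sketch of how a typical Bash-only test can be written so that it also runs under dash or busybox sh. The function names starts_with and is_equal are made up for this example and are not taken from entrypoint-volume.sh.

```shell
#!/bin/sh
# Bash-only forms that break when /bin/sh is dash or busybox:
#   if [[ "$name" == che-* ]]; then ...   # dash: "[[: not found"
#   if [ "$a" == "$b" ]; then ...         # "==" is not guaranteed by POSIX

# POSIX replacement for a [[ ... == prefix* ]] pattern match.
starts_with() {
  case "$1" in
    "$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# POSIX string comparison: single "=", operands always quoted so that
# an empty variable cannot change the number of arguments passed to [.
is_equal() {
  [ "$1" = "$2" ]
}

# To see which shell actually backs "sh" in an image:
#   readlink -f "$(command -v sh)"
```

Quoting every operand matters here: an unquoted empty variable makes the operand disappear from the test, which is one common way to get an operand error from the [ builtin.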

I'm working on building Node.js from sources to get a self-contained binary that doesn't rely on the system libraries of the container.
Usually, the ldd node command shows a list of libraries linked from the system folders:

[image: ldd node output for the regular Node.js binary]

After building Node.js from sources I have:

[image: ldd node output for the Node.js binary built from sources]

This avoids errors like:

./node: error while loading shared libraries: libcrypto.so.1.1: cannot open shared object file: No such file or directory 

so, a workspace starts successfully.

Has anyone tested whether using an Execution Environment container image for Ansible Platform works well? The idea is to use a built EE for the ansible developers at a customer so they start from the right place with the right ansible-core, etc.

l0rd commented

@ansiblejunky this is a good question. An ansible example, based on the EE image, has been added in the latest version of OpenShift Dev Spaces. You can give it a try at https://workspaces.openshift.com (please open a separate issue if there is something that doesn't work properly using that example).

I only see the Ansible sample, and it spins up a DevSpace using the following image:

quay.io/devspaces/ansible-creator-ee@sha256:bae361e92ee61c95c33b98218998f10e7c69949ccf0501d16d9751d8debf66f8

I'm accessing Dev Spaces using the Developer Sandbox, so maybe that one is not updated to the latest DevSpaces yet?

Ok, I forked the repo https://github.com/kyetter/ansible-devspace-demo and then I modified the image property in the devfile.yaml to reference my EE minimal that I built and store in Quay:
https://github.com/ansiblejunky/ansible-devspaces-demo/blob/f9899e175e71664f81d084d810aad496bbfd6d53/devfile.yaml#L8C56-L8C56

And when I use this repo, I get these errors when trying to spin up the workspace:

Pod workspace843e055e9adb47db-856fbf68df-ggw4n
Generated from kubelet on ip-10-0-237-6.us-east-2.compute.internal
12:44:24
PostStartHook failed

Pod workspace843e055e9adb47db-856fbf68df-ggw4n
Generated from kubelet on ip-10-0-237-6.us-east-2.compute.internal
12:44:24
FailedPostStartHook

Pod workspace843e055e9adb47db-856fbf68df-ggw4n
Generated from kubelet on ip-10-0-237-6.us-east-2.compute.internal
12:44:24
MountVolume.SetUp failed for volume "che-gateway" : configmap "workspace843e055e9adb47db-route" not found

Pod workspace843e055e9adb47db-856fbf68df-ggw4n
Generated from kubelet on ip-10-0-237-6.us-east-2.compute.internal
12:44:23
Pulling image "quay.io/jwadleig/ansible-ee-minimal:latest"

l0rd commented

@ansiblejunky I have created a separate issue #22369.

There are two types of errors that should be fixed.
The first one:

./node: error while loading shared libraries: libcrypto.so.1.1: cannot open shared object file: No such file or directory 

The cause of the problem: Node.js relies on shared libraries, and at least one of them is absent from the container where VS Code is going to be started.
I tried building Node.js from sources to get a self-contained binary that doesn't rely on the system libraries.
I tested a couple of containers, and these problems were fixed when I used that self-contained binary.

The second type of errors:

[image: error output]

It's related to shared libraries as well, but this time it comes from the node_modules.
To fix it, I tried to build VS Code using the Node.js that I had built from sources, but got a Segmentation fault error.

To summarize:

  • we run Node.js in the user's container
  • we run VS Code in the user's container
  • both of them rely/can rely on some shared libraries
  • we have an error if a shared lib is absent in the user's container
  • workspace can not start at all if the absent lib is required for Node.js
  • workspace starts successfully, but there are errors that come from the node_modules if the absent lib is required for a VS Code dependency
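
A quick way to check a given container for this class of problem is to filter the ldd output for unresolved entries. The helper below is only a sketch: the name missing_libs is made up, and the binary path in the usage comment is an assumed location, not something confirmed in this issue.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so the parsing can be tried without a real binary.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Possible usage inside the user's container (path is an assumption):
#   ldd /checode/checode-linux-libc/node | missing_libs
```

An empty result means the dynamic loader can resolve every direct dependency of that binary. It says nothing about native modules under node_modules, which would need the same check run against their .node files.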

l0rd commented

We just had a discussion with @RomanNikitenko and decided that a pragmatic solution is to build a new version of che-code based on ubi9 nodejs. We already build a libc version using registry.access.redhat.com/ubi8/nodejs-18 and a musl version using docker.io/node:18.16.1-alpine3.18, so that's about:

  • adding a libc-v2.3 version using registry.access.redhat.com/ubi9/nodejs-18
  • adding the logic to figure out which libc version of vscode to copy at workspace startup
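
One possible shape for that selection logic, as a rough sketch only. Everything here is an assumption rather than the actual implementation: the function name select_checode_variant, the variant labels it prints, and the idea of using the libcrypto soname as the signal (UBI8 ships OpenSSL 1.1 and UBI9 ships OpenSSL 3). The function takes its inputs as text so that it stays easy to test.

```shell
#!/bin/sh
# Hypothetical helper: decide which che-code build to copy into the container.
#   $1 - text printed by `ldd --version 2>&1`
#   $2 - listing of the library directories (e.g. output of ls)
# Prints one of the made-up labels: musl, libc-ubi8, libc-ubi9.
select_checode_variant() {
  # musl's ldd identifies itself as "musl libc".
  case "$1" in
    *musl*) echo "musl"; return 0 ;;
  esac

  # For glibc images, use the available OpenSSL soname as the signal.
  case "$2" in
    *libcrypto.so.3*) echo "libc-ubi9" ;;
    *libcrypto.so.1.1*) echo "libc-ubi8" ;;
    *) return 1 ;;
  esac
}

# Possible usage:
#   select_checode_variant "$(ldd --version 2>&1)" \
#     "$(ls /usr/lib64 /usr/lib /lib 2>/dev/null)"
```

A non-zero return status signals that none of the known variants matches, so the caller can fail early with a clear message instead of starting a binary that cannot load its libraries.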

I have a draft for the solution that is described in the comment above.

I tested it for a few ubi9-based images:

  • registry.access.redhat.com/ubi9/python-39:1-161
  • registry.access.redhat.com/ubi9/go-toolset:1.18.10-4
  • registry.access.redhat.com/ubi9/nodejs-18:1-84

It works well for those images.
But I found that it doesn't work for the registry.access.redhat.com/ubi9 image.
The problem is that Node.js 18 depends on the brotli lib, but the vanilla ubi9 image doesn't contain this lib.
So, it doesn't work for any image that doesn't contain that library.

I'm looking for a way to resolve this problem.

I changed the way Node JS is provided and updated my draft PR.
I also tested this solution for ubi8 and ubi9-based containers: workspaces started for all tested images. Please see the list of tested images in the PR description.

Wonderful work, thank you @RomanNikitenko

@ibuziuk
I've merged the PR with ubi9-based images support: che-incubator/che-code#324

Can we close the current issue, or do you prefer to keep it open?

I think it makes sense to close it and open separate issues for particular images that are not supported atm