gaia-app/gaia

🐛 : Status PLAN_FAILED

ramazulay opened this issue · 10 comments

Describe the bug
Gaia is installed on Kubernetes (EKS) with Helm (https://github.com/gaia-app/chart):
gaia-6bccfbc9f5-tz4xh 1/1 Running 0 4d20h
gaia-mongo-57d4548858-lr7ch 1/1 Running 0 4d20h
gaia-runner-869cb5f4b5-hf7fz 1/1 Running 0 4d20h

Versions
helm chart:
version: 0.1.0
appVersion: "2.3.0"
pods:
gaiaapp/gaia: "latest"
gaiaapp/runner: "v2.3.0"
mongo: latest

I got this output from the job:
[gaia] using image hashicorp/terraform:latest
[gaia] installing curl
[gaia] cloning http://sample/TEST/test
Cloning into 'module'...
[gaia] generating backend configuration
[gaia] generating tfvars variable file

then Status PLAN_FAILED

In addition I got an error log from the gaia-runner pod:

2022-05-01 08:36:55.597 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Polling for pending steps
2022-05-01 08:36:55.700 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Step found bca9253a-1f4e-4836-beba-2753e47986db. Running.
2022-05-01 08:36:55.814 INFO 1 --- [ gaia-runner-1] io.gaia_app.runner.StepRunner : Starting step bca9253a-1f4e-4836-beba-2753e47986db execution.
2022-05-01 08:36:55.815 INFO 1 --- [ gaia-runner-1] K8SExecutor : Creating pod gaia-job-bca9253a-1f4e-4836-beba-2753e47986db
2022-05-01 08:36:55.912 INFO 1 --- [ gaia-runner-1] K8SExecutor : Wait for the pod gaia-job-bca9253a-1f4e-4836-beba-2753e47986db to be running
2022-05-01 08:36:58.920 INFO 1 --- [ gaia-runner-1] K8SExecutor : Executing script in pod gaia-job-bca9253a-1f4e-4836-beba-2753e47986db
2022-05-01 08:36:59.117 INFO 1 --- [ gaia-runner-1] K8SExecutor : Getting pod gaia-job-bca9253a-1f4e-4836-beba-2753e47986db logs
2022-05-01 08:37:00.700 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Polling for pending steps
2022-05-01 08:37:00.732 INFO 1 --- [ gaia-runner-1] K8SExecutor : Wait for the pod gaia-job-bca9253a-1f4e-4836-beba-2753e47986db to be completed
2022-05-01 08:37:00.798 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : No steps to run
2022-05-01 08:37:01.745 INFO 1 --- [ gaia-runner-1] K8SExecutor : Getting exit code of pod gaia-job-bca9253a-1f4e-4836-beba-2753e47986db
2022-05-01 08:37:01.756 INFO 1 --- [ gaia-runner-1] K8SExecutor : Pod gaia-job-bca9253a-1f4e-4836-beba-2753e47986db exited with status code 7
2022-05-01 08:37:01.756 INFO 1 --- [ gaia-runner-1] K8SExecutor : Deleting the pod gaia-job-bca9253a-1f4e-4836-beba-2753e47986db
2022-05-01 08:37:01.773 INFO 1 --- [ gaia-runner-1] io.gaia_app.runner.StepRunner : Finished step bca9253a-1f4e-4836-beba-2753e47986db execution with result code 7.
2022-05-01 08:37:01.773 INFO 1 --- [ gaia-runner-1] io.gaia_app.runner.StepRunner : Sending result.
2022-05-01 08:37:05.798 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Polling for pending steps

2022-05-01 08:38:59.017 ERROR 1 --- [/10.100.0.1/...] i.k.client.util.WebSocketStreamHandler : Error on flush

java.io.IOException: Socket is closed!
at io.kubernetes.client.util.WebSocketStreamHandler$WebSocketOutputStream.flush(WebSocketStreamHandler.java:221) ~[client-java-14.0.0.jar!/:na]
at io.kubernetes.client.util.WebSocketStreamHandler.close(WebSocketStreamHandler.java:136) ~[client-java-14.0.0.jar!/:na]
at io.kubernetes.client.util.WebSockets$Listener.onFailure(WebSockets.java:158) ~[client-java-14.0.0.jar!/:na]
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:570) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.ws.RealWebSocket.writePingFrame(RealWebSocket.java:545) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.ws.RealWebSocket$PingRunnable.run(RealWebSocket.java:529) ~[okhttp-3.14.9.jar!/:na]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

2022-05-01 08:39:03.214 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Polling for pending steps
2022-05-01 08:39:03.317 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : No steps to run

What should I do?
Where's my mistake?

Screenshots
image

Thanks.

I'm having a very similar problem and I'm running locally.

[gaia] using image hashicorp/terraform:1.0.8
[gaia] installing curl
[gaia] cloning https://github.com/jbrardport/terraform-modules.git
Cloning into 'module'...
[gaia] generating backend configuration
[gaia] generating tfvars variable file

And then it shows plan failed.
This is my first time using this tool so I'm not sure what went wrong.

juwit commented

Hi @ramazulay,

There's an issue in the chart, similar to the one @jbrardport encountered.
I've just pushed a new version of the chart that fixes this issue.

Hi @juwit,
Thanks for your reply.
It works, but there is a new issue with the state file:
"""
[gaia] using image hashicorp/terraform:latest
[gaia] installing curl
[gaia] cloning http://test-source-code/test
Cloning into 'module'...
[gaia] generating backend configuration
[gaia] generating tfvars variable file
[gaia] running terraform init
Terraform v1.2.0
on linux_amd64

Initializing the backend...

Successfully configured the backend "http"! Terraform will automatically
use this backend unless the backend configuration changes.
Error refreshing state: 2 problems:

  • Unsupported state file format: The state file could not be parsed as JSON: syntax error at byte offset 1.
  • Unsupported state file format: The state file does not have a "version" attribute, which is required to identify the format
    """

Should the state file be saved in Mongo, in a PVC, or somewhere else?
Thanks.
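
For anyone hitting the same message, here is a minimal sketch of the two checks the error text describes, assuming the body returned by the state URL has been saved as a string (for example from a manual request). `check_state_body` is a hypothetical helper for illustration, not part of Gaia or Terraform, and it only mirrors the wording of the error above.

```python
import json
from typing import List


def check_state_body(body: str) -> List[str]:
    """Return the problems found in a state payload (hypothetical helper)."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError as exc:
        # Mirrors "The state file could not be parsed as JSON".
        return [f"not valid JSON (error at offset {exc.pos})"]

    problems = []
    # Mirrors 'The state file does not have a "version" attribute'.
    if not isinstance(doc, dict) or "version" not in doc:
        problems.append('missing "version" attribute')
    return problems
```

If the first check fails, the backend is probably returning something other than a state document (an error page, for instance), which would point at the URL or credentials rather than at where the state is stored.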

Hi @juwit I think I am having the same issue or similar.

I'm trying to get this up and running locally using docker desktop WSL2 (Ubuntu 20.04).

Versions:
gaiaapp/gaia:latest
gaiaapp/runner:latest
mongo: latest

docker-compose.yml

version: "3.9"
services:
  gaia:
    image: "gaiaapp/gaia:latest"
    ports:
      - "8080:8080"
    environment:
      - "GAIA_MONGODB_URI=mongodb://mongo/gaia"
      - "GAIA_RUNNER_API_PASSWORD=123456"
      - "GAIA_EXTERNAL_URL=http://localhost:8080"
  runner:
    image: "gaiaapp/runner:latest"
    environment:
      - "GAIA_URL=http://gaia:8080"
      - "GAIA_RUNNER_API_PASSWORD=123456"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
  mongo:
    image: "mongo:latest"

When I run docker-compose up -d, everything appears to work as expected.

image

docker container ls

image

When I navigate to localhost:8080, I can log in as administrator and perform the following steps.

  • Create some credentials for my azure subscription. (I have tested the service principal separately to this to ensure the issue isn't with the credentials)

image

image

  • Create a module using https://github.com/declan-whiting/tf.git. The module doesn't require any variables, other than the credentials which are passed in as environment variables. Again, I have tested this module separately to Gaia to ensure it is valid

image

image

  • Create the stack using the module and credentials created in previous steps.

image

  • Run the stack.

image

image

As you can see I hit a similar looking issue to that described above.

For completeness, here is my settings page.

image

Logs attached.

runner-logs.txt
gaia-logs.txt
mongo-logs.txt

Apologies if there is too much detail or I have missed something important, I've only had limited time using Gaia (and as you can see I am not doing well at the first hurdle)

Updating the external url to http://host.docker.internal:8080 has resolved my issue.

image

I'm wondering if it is worth adding a note to the quick start guide for this?

juwit commented

Hi,

The docker-compose in the quickstart guide in the docs is correct.
I have some ideas to be able to remove this parameter, so there will be no confusion for users. I'll keep this issue open until I've implemented something.

Hi @ramazulay,

There's an issue in the chart, similar to the one @jbrardport encountered. I've just pushed a new version of the chart that fixes this issue.

Hi @juwit, may I know the version number, or have a link to that version?

juwit commented

Hi,
The helm chart for gaia is located here : https://github.com/gaia-app/chart
The latest version is 0.1.2

Hi @juwit, thanks for your response.

I am already using the latest version of the Helm chart. I am pasting a screenshot for your reference.

image

But I am still getting a similar error when running the build. What could be an alternative way to fix this issue?

image

I have a similar problem running with docker-compose in Ubuntu 22.04.

I noticed that the problem is in the terraform container deployed by the runner: it requests a .tfvars file from the URL:
{{externalUrl}}/api/runner/stacks/{{stackId}}.tfvars
and the docker-compose.yaml of quickStart points externalUrl to "http://gaia:8080".

However, the runner deploys the terraform container in the default docker network and NOT in the docker network generated by docker-compose, so the container cannot resolve the "gaia" hostname.

I think the best solution is to make the runner deploy the terraform container in the same docker network as the gaia container, so that it can resolve the gaia hostname to the gaia container IP.

I would provide a PR, but I have no experience with Java or the gaia-app code.

For now I made it work by setting the external_url to the IP of my host.
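
To check this on your own setup, here is a rough sketch that builds the URL described above and tests whether its hostname resolves from wherever the script runs. The path pattern is the one quoted earlier in this comment; `tfvars_url` and `host_resolves` are hypothetical helper names, and the script assumes it is run from a container attached to the same network as the terraform job container.

```python
import socket
from urllib.parse import urlsplit


def tfvars_url(external_url: str, stack_id: str) -> str:
    """Build the .tfvars URL from the external URL and a stack id."""
    return f"{external_url.rstrip('/')}/api/runner/stacks/{stack_id}.tfvars"


def host_resolves(url: str) -> bool:
    """Return True if the hostname in `url` resolves from this machine."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Name resolution failed, e.g. "gaia" is unknown on this network.
        return False
```

Calling `host_resolves(tfvars_url("http://gaia:8080", "<stackId>"))` from the default docker network should return False if the diagnosis above is right, and True once the external URL points at something reachable, such as the host IP.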