kerberos-io/agent

[Docker compose] Docker image does not connect to Kerberos Hub with message: Something went wrong while verifying Kerberos Hub

lubikx opened this issue · 27 comments

lubikx commented

I was excited by how simple it was to get the agent up and running, so I went ahead and bought a Kerberos Hub plan. But it just does not connect.

This is the exact error message, copied from the UI:

Something went wrong while verifying Kerberos Hub :Something went wrong while reaching to the Kerberos Hub API: https://api.cloud.kerberos.io

The log shows a different message:

HandleHeartBeat: (400) Something went wrong while sending to Kerberos Hub.

I have found this log message in the code here:

https://github.com/kerberos-io/agent/blob/02f3e6a1e22dce2092c77ef50ef576ece1b19899/machinery/src/cloud/Cloud.go#LL350C21-L350C95

I ran wget https://api.cloud.kerberos.io from inside the container and it worked, so it should not be an SSL problem.

The support link in Kerberos Hub only leads to Google, where it is impossible to find anything because of the product's name.

Has anyone tried the latest Docker image and successfully connected to Kerberos Hub?

I really like the architecture of the enterprise solution, but this gives me doubts about whether the solution will be viable.

Hey @lubikx, yes, this should work out of the box! What happens if you press "Verify connection" in the UI?

lubikx commented

@cedricve The behaviour described above is exactly what happens when I press the Verify connection button. I have tried setting environment variables in docker-compose.yml and setting the values by hand in the UI; the result is the same.

lubikx commented

And just to be clear: permissions on the volumes are set properly, and recording works (MP4 files are saved in the recordings volume). The agent works as it should; it just does not connect to Hub (which is the main point for us :)).

lubikx commented

More investigation: I can manually send a heartbeat with curl, using the same credentials as in the config file and dummy camera information:

$ curl -v https://api.cloud.kerberos.io/devices/heartbeat -H 'Content-Type: application/json' -H 'Accept: application/json'  -d @test
*   Trying 34.78.207.35:443...
* Connected to api.cloud.kerberos.io (34.78.207.35) port 443 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=api.cloud.kerberos.io
*  start date: Apr 30 05:52:43 2023 GMT
*  expire date: Jul 29 05:52:42 2023 GMT
*  subjectAltName: host "api.cloud.kerberos.io" matched cert's "api.cloud.kerberos.io"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/2
* h2h3 [:method: POST]
* h2h3 [:path: /devices/heartbeat]
* h2h3 [:scheme: https]
* h2h3 [:authority: api.cloud.kerberos.io]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [content-type: application/json]
* h2h3 [accept: application/json]
* h2h3 [content-length: 743]
* Using Stream ID: 1 (easy handle 0x7f3d8617ba90)
> POST /devices/heartbeat HTTP/2
> Host: api.cloud.kerberos.io
> user-agent: curl/8.0.1
> content-type: application/json
> accept: application/json
> content-length: 743
> 
* We are completely uploaded and fine
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 200 
< date: Sun, 28 May 2023 10:29:27 GMT
< content-type: application/json; charset=utf-8
< content-length: 68
< strict-transport-security: max-age=15724800; includeSubDomains
< 
* Connection #0 to host api.cloud.kerberos.io left intact
{"data":"Heartbeat successfully received, updated existing device."}
lubikx commented

This works from the command line as well (I found that this is the request actually made by func VerifyHub in Cloud.go):

curl -v https://api.cloud.kerberos.io/queue/test -H 'Content-Type: application/json' -H 'Accept: application/json' -H 'X-Kerberos-Cloud-Key: MY_PUBLIC_KEY_FROM_KERBEROS_HUB'  -d '{"message": "fake-message"}'
*   Trying 34.78.207.35:443...
* Connected to api.cloud.kerberos.io (34.78.207.35) port 443 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=api.cloud.kerberos.io
*  start date: Apr 30 05:52:43 2023 GMT
*  expire date: Jul 29 05:52:42 2023 GMT
*  subjectAltName: host "api.cloud.kerberos.io" matched cert's "api.cloud.kerberos.io"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/2
* h2h3 [:method: POST]
* h2h3 [:path: /queue/test]
* h2h3 [:scheme: https]
* h2h3 [:authority: api.cloud.kerberos.io]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [content-type: application/json]
* h2h3 [accept: application/json]
* h2h3 [x-kerberos-cloud-key: MY_PUBLIC_KEY_FROM_KERBEROS_HUB]
* h2h3 [content-length: 27]
* Using Stream ID: 1 (easy handle 0x7fc520484a90)
> POST /queue/test HTTP/2
> Host: api.cloud.kerberos.io
> user-agent: curl/8.0.1
> content-type: application/json
> accept: application/json
> x-kerberos-cloud-key: MY_PUBLIC_KEY_FROM_KERBEROS_HUB
> content-length: 27
> 
* We are completely uploaded and fine
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 200 
< date: Sun, 28 May 2023 10:43:17 GMT
< content-type: application/json; charset=utf-8
< content-length: 32
< strict-transport-security: max-age=15724800; includeSubDomains
< 
* Connection #0 to host api.cloud.kerberos.io left intact
{"data":"Message send to queue"}
lubikx commented

So this is where the 400 is returned to the UI, but unfortunately the underlying error is not logged:

c.JSON(400, models.APIResponse{

lubikx commented

I cloned the git repo and ran machinery standalone with the same config.json, and the heartbeats work correctly there.
So I think the main problem is that the Docker image is probably not the latest build. Can you look into that, @cedricve?

Thanks for all your investigations, will verify that. Which docker image tag are you currently using?

lubikx commented

I used kerberos/agent:latest and tried it yesterday, so it should be the one pushed 12 days ago: kerberos/agent:02f3e6a

I've just pulled latest and cannot replicate the problem, so it might be related to your settings or your Hub account. I will look into your account, @lubikx.

@lubikx Can you share your username or email which you used to register? Send it to support@kerberos.io, and reference this GitHub issue.

lubikx commented

@cedricve I rebuilt the Docker image from source and got the same result. After adding the actual error message to the output, I finally got something useful. This is the full message:

Something went wrong while verifying Kerberos Hub :Something went wrong while reaching to the Kerberos Hub API: Post "https://api.cloud.kerberos.io/queue/test": dial tcp: lookup api.cloud.kerberos.io: device or resource busy URI: https://api.cloud.kerberos.io

I really don't understand why this happens: everything else in the container works, and I can connect everywhere. I suppose it has to be something in Go.

Thanks again for the detailed logging. It looks like you are indeed having issues connecting from the Kerberos Agent (in the Docker container) to Kerberos Hub (dial tcp: lookup api.cloud.kerberos.io: device or resource busy URI: https://api.cloud.kerberos.io/).

I verified this last night (after doing a docker pull), and sadly I cannot replicate it. Summarising your previous statements, it looks like something is different in the Docker runtime on your side, or it is related to Docker networking.

I'm currently researching the generic error and will come back on this!
dial tcp: lookup xxx: device or resource busy URI: https://xxx

@lubikx not sure if related. https://forums.docker.com/t/unable-to-login-to-docker-hub-cli/121762
Can you share your docker version as well?

lubikx commented

This is just so weird.

I cloned the current master, created a small Go test case and built a local Docker image. In the container, clicking Verify connection results in the error I sent in my previous message, but running the test program (compiled with exactly the same settings as main) works.

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// Minimal reproduction of the VerifyHub request: POST a fake message to
// the Kerberos Hub test queue with the default Go HTTP client.
func main() {
	content := []byte(`{"message": "fake-message"}`)
	req, err := http.NewRequest("POST", "https://api.cloud.kerberos.io/queue/test", bytes.NewReader(content))
	if err != nil {
		fmt.Println("ERROR: " + err.Error())
		return
	}
	req.Header.Set("X-Kerberos-Cloud-Key", "[PUBLIC_KEY]")

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("ERROR: " + err.Error())
		return
	}
	// Close the body once we are done reading it.
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("ERROR2: " + err.Error())
		return
	}
	fmt.Println("BODY: " + string(body))
}

This is how I modified the Dockerfile (test.go is the source above):

##################
# Build Machinery

RUN cd /go/src/github.com/kerberos-io/agent/machinery && \
        go mod download && \
        go build -tags timetzdata --ldflags '-s -w -extldflags "-static -latomic"' main.go && \
        go build -tags timetzdata --ldflags '-s -w -extldflags "-static -latomic"' test.go && \
        mkdir -p /agent && \
        mv main /agent && \
        mv test /agent && \
        mv version /agent && \
        mv data /agent && \
        mkdir -p /agent/data/cloud && \
        mkdir -p /agent/data/snapshots && \
        mkdir -p /agent/data/log && \
        mkdir -p /agent/data/recordings && \
        mkdir -p /agent/data/capture-test && \
        mkdir -p /agent/data/config && \
        rm -rf /go/src/gitlab.com/

And this is the output of running ./test:

docker compose exec kerberos-agent1 sh
~ $ ./test 
BODY: {"data":"Message send to queue"}

When you open a shell in the Docker container, can you ping or curl other HTTPS endpoints?

lubikx commented

@cedricve Yes, everything else works fine from inside the container (even the small Go test case above). It really seems to be something in the Go resolver as it runs in main.

What I do not understand is why I (and probably others) cannot replicate this. Do you have another machine on which you can try to replicate it?

lubikx commented

I've just run it on a completely different server and, again, got this during verification: Something went wrong while verifying Kerberos Hub :Something went wrong while reaching to the Kerberos Hub API: https://api.cloud.kerberos.io

lubikx commented

Ha! We have progress. I tested running the container with a plain docker run command and it worked! But when running it with docker compose, it ends up with the verification error.

This works:
docker run -p 8081:80 --name ka2 -d kagent

This does not:

version: "3.9"
services:
  ka1:
    image: "kagent"
    ports:
      - "8081:80"
docker compose up -d

Hmm, we got this error in Kerberos Vault (kerberos-io/vault#11). The issue is different, but the root cause looks similar (device busy). It also uses docker compose.

lubikx commented

@cedricve Found the culprit. Alpine is really not the best base image for Go, but if you build with -tags netgo (-tags timetzdata,netgo in this case), it works even under docker compose. Here's the diff:

@@ -32,7 +32,7 @@ RUN cat /go/src/github.com/kerberos-io/agent/machinery/version
 
 RUN cd /go/src/github.com/kerberos-io/agent/machinery && \
        go mod download && \
-       go build -tags timetzdata --ldflags '-s -w -extldflags "-static -latomic"' main.go && \
+       go build -tags timetzdata,netgo --ldflags '-s -w -extldflags "-static -latomic"' main.go && \
        mkdir -p /agent && \
        mv main /agent && \
        mv version /agent && \

Really happy to solve it in the end. I love the architecture of Kerberos and I'm looking forward to using it soon. :)

Hurraaay 🤩🤩 Alright, would you like to make a pull request? This is your work, so you should get the credit!

lubikx commented

Here you go, happy to help! Now come on and make the hub open source as well. :D

#102

Thank you, @lubikx! We try to open source as much as we can, but we still need our business model to keep supporting the product and future development. Maybe one day.... ;) ;)

lubikx commented

I actually think it could open more business opportunities with people who are not willing to pay 500 EUR/month. You could have a special licensing model (per camera, per user, etc., something like GitLab CE/EE), and it could accelerate development of the product. But of course I understand that it is the most valuable part of the business. :)

BTW, I've just tested the image you pushed to Docker Hub and everything works (the Kerberos Hub settings are successfully verified). Thanks for the support!

That's a great way to start a Monday, @lubikx! Thanks so much for your dedication, love it!