mantl/mesos-consul

Question about consul service registration with regards to ports and docker


I currently have an app running in a docker container on mesos, scheduled with marathon, alongside the mesos-consul bridge.

The marathon app configuration uses bridge networking and lets mesos/marathon pick any available host port, while the application inside the docker container listens on 8080:

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "sarlindo/wildfly-app",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0, "servicePort": 0, "protocol": "tcp" }
      ]
    }
  },
  "id": "wildfly",
  "cmd": "/opt/jboss/wildfly/bin/standalone.sh -b 0.0.0.0 -bmanagement=0.0.0.0",
  "instances": 1,
  "cpus": 0.3,
  "mem": 256
}

Now, when this service gets registered with consul by the mesos-consul bridge, I see it being registered to the following ip/port.

172.17.0.4:31657

The IP here is the internal docker IP, not the host IP, while the port number is the host port that mesos/marathon assigned.

The problem is that I can't reach the service at this address: the host port only exists on the host, and at the docker IP the app listens on 8080.

Is this the way it is supposed to work, or am I doing something wrong here?

Are you using the default mesos-ip-order? Is docker included in the list? I haven't seen a use for having docker in the search order since it returns the docker IP address which isn't particularly useful as far as I can tell. If it is in the search list, try removing it.

The port is probably correct. Mesos will assign a random port to the docker container and map from 31657->8080.
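To make the mismatch concrete, here is a minimal sketch of the two address/port pairs involved in a bridge-mode mapping. The class and its method names are hypothetical, and the host IP 192.168.33.11 is an assumed value, since the thread never states the agent's address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortMapping:
    """One bridge-mode port mapping (hypothetical helper, not part of mesos-consul)."""
    host_ip: str         # address of the Mesos agent (assumed value in the example)
    container_ip: str    # address on the docker bridge
    host_port: int       # port chosen by mesos/marathon
    container_port: int  # port the app listens on inside the container

    def host_endpoint(self) -> str:
        # Reachable from other machines: host IP paired with host port.
        return f"{self.host_ip}:{self.host_port}"

    def container_endpoint(self) -> str:
        # Reachable only where the docker bridge is routable.
        return f"{self.container_ip}:{self.container_port}"

    def is_consistent(self, ip: str, port: int) -> bool:
        # An address works only if IP and port come from the same side of the mapping.
        return (ip, port) in {
            (self.host_ip, self.host_port),
            (self.container_ip, self.container_port),
        }


mapping = PortMapping("192.168.33.11", "172.17.0.4", 31657, 8080)
print(mapping.host_endpoint())
print(mapping.container_endpoint())
print(mapping.is_consistent("172.17.0.4", 31657))  # the pair registered in consul
```

The pair registered in consul mixes the container IP with the host port, which is why it fails the consistency check.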

Yes, the port is correct. The problem is that the IP chosen and registered with consul was the docker IP address. I am running mesos-consul with defaults. This is the marathon JSON I am using to run the mesos-consul bridge:

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "ciscocloud/mesos-consul",
      "network": "BRIDGE",
      "parameters": [
        { "key": "rm", "value": "true" }
      ]
    }
  },
  "id": "mesos-consul",
  "args": ["--zk=zk://192.168.33.10:2181/mesos"],
  "instances": 1,
  "cpus": 0.1,
  "mem": 256,
  "constraints": [["hostname", "CLUSTER", "node1"]]
}

Hmm, I can't reproduce this. Can you post the task section from /master/state.json on the Mesos leader?

Here it is. I think I may be running into d2iq-archive/mesos-dns#334. I know it says mesos-dns, but further down that thread someone points to mesos itself as the potential cause. I will have to dig some more.

            {
                "executor_id": "",
                "framework_id": "13742ebd-7985-4898-b01e-6587d19b885d-0001",
                "id": "wildfly.88156cb6-925c-11e5-b212-02429beb943f",
                "name": "wildfly",
                "resources": {
                    "cpus": 0.3,
                    "disk": 0,
                    "mem": 256.0,
                    "ports": "[31268-31268]"
                },
                "slave_id": "a8f46f83-034d-459b-ac0e-e2effd094e4f-S1",
                "state": "TASK_RUNNING",
                "statuses": [
                    {
                        "container_status": {
                            "network_infos": [
                                {
                                    "ip_address": "172.17.0.3"
                                }
                            ]
                        },
                        "labels": [
                            {
                                "key": "Docker.NetworkSettings.IPAddress",
                                "value": "172.17.0.3"
                            }
                        ],
                        "state": "TASK_RUNNING",
                        "timestamp": 1448336183.15899
                    }
                ]
            },

That is exactly what you're running into. Ugh. The default search order is netinfo,mesos,host so it's using the ip address in the network_infos block. A workaround is to add "--mesos-ip-order=mesos,host" to your marathon job for mesos-consul.
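As a rough illustration of how a search order like this plays out against the task JSON above, here is a minimal sketch. It is not the mesos-consul implementation: the function names are hypothetical, the agent IP 192.168.33.11 is an assumed value, and the "Mesos.NetworkSettings.IPAddress" label used for the mesos source is an assumption, since only the Docker label appears in this thread.

```python
from typing import Callable, Dict, Optional


def netinfo_ip(task: dict) -> Optional[str]:
    # First ip_address found in any status's container_status.network_infos.
    for status in task.get("statuses", []):
        infos = status.get("container_status", {}).get("network_infos", [])
        for info in infos:
            if info.get("ip_address"):
                return info["ip_address"]
    return None


def label_ip(task: dict, key: str) -> Optional[str]:
    # Value of the first status label whose key matches.
    for status in task.get("statuses", []):
        for label in status.get("labels", []):
            if label.get("key") == key and label.get("value"):
                return label["value"]
    return None


# Source names follow the thread; the label key for "mesos" is an assumption.
SOURCES: Dict[str, Callable[[dict, str], Optional[str]]] = {
    "netinfo": lambda task, agent_ip: netinfo_ip(task),
    "docker": lambda task, agent_ip: label_ip(task, "Docker.NetworkSettings.IPAddress"),
    "mesos": lambda task, agent_ip: label_ip(task, "Mesos.NetworkSettings.IPAddress"),
    "host": lambda task, agent_ip: agent_ip,
}


def resolve_ip(task: dict, agent_ip: str, order: str) -> Optional[str]:
    # Walk the comma-separated order and return the first source that yields an IP.
    for name in order.split(","):
        ip = SOURCES[name.strip()](task, agent_ip)
        if ip:
            return ip
    return None


# Trimmed copy of the task posted above.
task = {
    "statuses": [{
        "container_status": {"network_infos": [{"ip_address": "172.17.0.3"}]},
        "labels": [{"key": "Docker.NetworkSettings.IPAddress", "value": "172.17.0.3"}],
    }]
}
agent_ip = "192.168.33.11"  # assumed agent address, not given in the thread

print(resolve_ip(task, agent_ip, "netinfo,mesos,host"))
print(resolve_ip(task, agent_ip, "mesos,host"))
```

With the default order the network_infos entry wins, so the docker bridge address is picked. Dropping netinfo from the order lets the lookup fall through to the host address.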

@ChrisAubuchon I have been trying this, but now the mesos-consul bridge doesn't seem to register any new services with consul at all. I created a new service in marathon, and it never shows up in the consul UI.

This is my new marathon JSON for mesos-consul:

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "ciscocloud/mesos-consul",
      "network": "BRIDGE",
      "parameters": [
        { "key": "rm", "value": "true" }
      ]
    }
  },
  "id": "mesos-consul",
  "args": ["--zk=zk://192.168.33.10:2181/mesos --mesos-ip-order=mesos,host"],
  "instances": 1,
  "cpus": 0.1,
  "mem": 256,
  "constraints": [["hostname", "CLUSTER", "node1"]]
}

These are the logs that I see for the mesos-consul bridge:

vagrant@node1:~/projects/consul$ sudo docker logs 5cd4ec4464d7
2015/11/24 16:26:45 Connected to 192.168.33.10:2181
2015/11/24 16:26:45 Authenticated: id=94921046598942733, timeout=40000

Any clue as to why adding this new flag would cause issues?

The command line arguments in the args list need to be separated:

"args": [
  "--zk=zk://192.168.33.10:2181/mesos",
  "--mesos-ip-order=mesos,host"
  ],
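Each element of the args list reaches the process as exactly one argument, with no shell in between to split on spaces. The sketch below uses Python's argparse as a stand-in to show the effect. mesos-consul itself is written in Go and its flag handling may differ in detail, so treat the parser here (and the name parse_flags) as hypothetical. The default order is the one quoted earlier in this thread.

```python
import argparse


def parse_flags(argv):
    # Stand-in parser with the two flags discussed in this thread.
    parser = argparse.ArgumentParser(prog="bridge-sketch")
    parser.add_argument("--zk", default="")
    parser.add_argument("--mesos-ip-order", default="netinfo,mesos,host")
    return parser.parse_args(argv)


# One list element: everything after the first "=" becomes the value of --zk.
merged = parse_flags(
    ["--zk=zk://192.168.33.10:2181/mesos --mesos-ip-order=mesos,host"]
)

# Two list elements: each flag is parsed on its own.
separate = parse_flags(
    ["--zk=zk://192.168.33.10:2181/mesos", "--mesos-ip-order=mesos,host"]
)

print(repr(merged.zk), merged.mesos_ip_order)
print(repr(separate.zk), separate.mesos_ip_order)
```

In the merged case the second flag is swallowed into the ZooKeeper URL, so the IP order silently stays at its default and the ZooKeeper path no longer points at /mesos.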

Oops! It's working now. Thanks, Chris.

Out of curiosity, do you work for Cisco? What does Cisco have to do with these projects?

Mesos-consul was developed as part of Cisco's Mantl project.

Thanks.