mesos-consul not registering all instances into Consul
Lax77 opened this issue · 10 comments
Here is what I am doing. I am using Mesos IP-per-task (through Calico), wherein each container is given a separate IP. I had a case where multiple instances of a task were deployed on the same node, each instance assigned an IP of its own (10.100.0.179, 10.100.0.180). So far so good, but when it comes to registering those instances in Consul, mesos-consul registers only one of the instances (10.100.0.179) and skips the other.
Is there a known limitation with mesos-consul?
Also, whenever I register an IP-per-container service through mesos-consul, I see two entries with the same IP in Consul: one with the port the service is listening on and one with port 0. Why?
E.g., if my container has the IP 10.100.0.155 and is listening on port 8000, Consul sees two entries:
10.100.0.155:8000
10.100.0.155:0
Why does mesos-consul register an entry with port 0? I tried with mesos-ip-order set to 'docker,mesos,host' and also to just 'docker'. Both times I got an entry with port 0.
Can you post the Mesos master state, from http://<mesos-master>:5050/master/state.json? I'm specifically looking for those tasks.
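For anyone gathering the same information, here is a minimal Python sketch of pulling the master state and keeping only the tasks of interest. It assumes the usual layout of the state document (a top-level "frameworks" list, each with a "tasks" list); the function names and the trimmed sample data are hypothetical and only there for illustration.

```python
import json
from urllib.request import urlopen


def fetch_master_state(master_host, port=5050, timeout=10):
    """Download and parse the master state JSON from the URL quoted above."""
    url = f"http://{master_host}:{port}/master/state.json"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def tasks_named(state, task_name):
    """Return every task whose "name" matches, across all frameworks."""
    return [
        task
        for framework in state.get("frameworks", [])
        for task in framework.get("tasks", [])
        if task.get("name") == task_name
    ]


# Offline example with a trimmed, hand-written state document (hypothetical).
sample_state = {
    "frameworks": [
        {
            "tasks": [
                {"id": "a", "name": "sensu-server.hulk"},
                {"id": "b", "name": "other"},
                {"id": "c", "name": "sensu-server.hulk"},
            ]
        }
    ]
}
matching = tasks_named(sample_state, "sensu-server.hulk")
print([t["id"] for t in matching])
```

Against a live cluster, the call would look like `tasks_named(fetch_master_state("<mesos-master>"), "sensu-server.hulk")`.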
Here is a sample of the state for one such container:
{
  "id": "hulk_sensu-server.2064fb31-3bef-11e6-bc00-424cfb1b27f9",
  "name": "sensu-server.hulk",
  "framework_id": "4b6fa780-f06f-4917-a10f-caaa57e7fbc6-0000",
  "executor_id": "",
  "slave_id": "aa0449f8-0c7c-4967-bcac-066d8c62d05b-S2",
  "state": "TASK_RUNNING",
  "resources": {
    "cpus": 0.1,
    "disk": 0,
    "mem": 256
  },
  "statuses": [
    {
      "state": "TASK_RUNNING",
      "timestamp": 1466980893.35396,
      "labels": [
        {
          "key": "Docker.NetworkSettings.IPAddress",
          "value": "10.100.0.179"
        }
      ],
      "container_status": {
        "network_infos": [
          {
            "ip_address": "10.100.0.179",
            "ip_addresses": [
              {
                "ip_address": "10.100.0.179"
              }
            ]
          }
        ]
      }
    }
  ],
  "labels": [
    {
      "key": "task_type",
      "value": "infra"
    }
  ],
  "discovery": {
    "visibility": "FRAMEWORK",
    "name": "sensu-server.hulk",
    "ports": {
      "ports": [
        {
          "number": 4567,
          "name": "sensuserverport",
          "protocol": "tcp"
        }
      ]
    }
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "docker.internal.com:5915/lkota/sensu-server",
      "network": "HOST",
      "privileged": false,
      "parameters": [
        {
          "key": "net",
          "value": "services"
        }
      ],
      "force_pull_image": true
    },
    "network_infos": [
      {
        "ip_addresses": [
          {}
        ],
        "labels": {}
      }
    ]
  }
},
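The fields in this state that matter for registration are the container IP under "statuses" and the port under "discovery". As a rough Python sketch (not mesos-consul's actual logic, which is written in Go), the address/port pairs could be pulled out as below. The function name is hypothetical, and skipping ports whose number is missing or 0 is a choice made for this sketch only.

```python
def task_endpoints(task):
    """Return (ip, port) pairs for one task entry from the master state."""
    ips = []
    for status in task.get("statuses", []):
        if status.get("state") != "TASK_RUNNING":
            continue
        infos = status.get("container_status", {}).get("network_infos", [])
        for info in infos:
            for addr in info.get("ip_addresses", []):
                ip = addr.get("ip_address")
                if ip and ip not in ips:
                    ips.append(ip)

    # Sketch-only choice: ignore discovery ports with a missing or zero number.
    ports = [
        p["number"]
        for p in task.get("discovery", {}).get("ports", {}).get("ports", [])
        if p.get("number")
    ]
    return [(ip, port) for ip in ips for port in ports]


# Trimmed copy of the task shown above.
sample_task = {
    "name": "sensu-server.hulk",
    "statuses": [
        {
            "state": "TASK_RUNNING",
            "container_status": {
                "network_infos": [
                    {
                        "ip_address": "10.100.0.179",
                        "ip_addresses": [{"ip_address": "10.100.0.179"}],
                    }
                ]
            },
        }
    ],
    "discovery": {
        "ports": {
            "ports": [
                {"number": 4567, "name": "sensuserverport", "protocol": "tcp"}
            ]
        }
    },
}
print(task_endpoints(sample_task))
```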
I am guessing the primary reason mesos-consul registers with port 0 is this code: https://github.com/CiscoCloud/mesos-consul/blob/c442e3dd053399dcc28742f82e4b20b7007d4d15/mesos/register.go#L167. I could be wrong, too.
That's what I'm thinking as well, re: port 0.
For the other case you mentioned, are the port numbers the same in the discovery info? If so, that would explain why only one instance is being registered, since mesos-consul uses the port number to create the Consul service ID.
Yeah, the port numbers are the same (4567). Here is the Mesos state response with both containers' info.
{
  "id": "hulk_sensu-server.2064fb31-3bef-11e6-bc00-424cfb1b27f9",
  "name": "sensu-server.hulk",
  "framework_id": "4b6fa780-f06f-4917-a10f-caaa57e7fbc6-0000",
  "executor_id": "",
  "slave_id": "aa0449f8-0c7c-4967-bcac-066d8c62d05b-S2",
  "state": "TASK_RUNNING",
  "resources": {
    "cpus": 0.1,
    "disk": 0,
    "mem": 256
  },
  "statuses": [
    {
      "state": "TASK_RUNNING",
      "timestamp": 1466980893.35396,
      "labels": [
        {
          "key": "Docker.NetworkSettings.IPAddress",
          "value": "10.100.0.179"
        }
      ],
      "container_status": {
        "network_infos": [
          {
            "ip_address": "10.100.0.179",
            "ip_addresses": [
              {
                "ip_address": "10.100.0.179"
              }
            ]
          }
        ]
      }
    }
  ],
  "labels": [
    {
      "key": "task_type",
      "value": "infra"
    }
  ],
  "discovery": {
    "visibility": "FRAMEWORK",
    "name": "sensu-server.hulk",
    "ports": {
      "ports": [
        {
          "number": 4567,
          "name": "sensuserverport",
          "protocol": "tcp"
        }
      ]
    }
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "docker.internal.com:5915/lkota/iot-sensu-server",
      "network": "HOST",
      "privileged": false,
      "parameters": [
        {
          "key": "net",
          "value": "services"
        }
      ],
      "force_pull_image": true
    },
    "network_infos": [
      {
        "ip_addresses": [
          {}
        ],
        "labels": {}
      }
    ]
  }
},
{
  "id": "hulk_sensu-server.20652242-3bef-11e6-bc00-424cfb1b27f9",
  "name": "sensu-server.hulk",
  "framework_id": "4b6fa780-f06f-4917-a10f-caaa57e7fbc6-0000",
  "executor_id": "",
  "slave_id": "aa0449f8-0c7c-4967-bcac-066d8c62d05b-S2",
  "state": "TASK_RUNNING",
  "resources": {
    "cpus": 0.1,
    "disk": 0,
    "mem": 256
  },
  "statuses": [
    {
      "state": "TASK_RUNNING",
      "timestamp": 1466980893.3635,
      "labels": [
        {
          "key": "Docker.NetworkSettings.IPAddress",
          "value": "10.100.0.180"
        }
      ],
      "container_status": {
        "network_infos": [
          {
            "ip_address": "10.100.0.180",
            "ip_addresses": [
              {
                "ip_address": "10.100.0.180"
              }
            ]
          }
        ]
      }
    }
  ],
  "labels": [
    {
      "key": "task_type",
      "value": "infra"
    }
  ],
  "discovery": {
    "visibility": "FRAMEWORK",
    "name": "sensu-server.hulk",
    "ports": {
      "ports": [
        {
          "number": 4567,
          "name": "sensuserverport",
          "protocol": "tcp"
        }
      ]
    }
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "docker.internal.com:5915/lkota/iot-sensu-server",
      "network": "HOST",
      "privileged": false,
      "parameters": [
        {
          "key": "net",
          "value": "services"
        }
      ],
      "force_pull_image": true
    },
    "network_infos": [
      {
        "ip_addresses": [
          {}
        ],
        "labels": {}
      }
    ]
  }
},
Yep. That's it. mesos-consul uses the Consul agent address and the port number in the service ID, which was fine before IP-per-container existed. PR #88 should take care of that.
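To illustrate why two instances on one node collapse into a single registration, here is a small Python sketch. The ID formats, the agent address 10.0.0.5, and the helper names are all hypothetical; they are not mesos-consul's real format, nor a description of what PR #88 changes. The only point is that an ID built from agent address plus port cannot tell apart two tasks that share both.

```python
def id_from_agent_and_port(agent_addr, task):
    # Hypothetical format: nothing task-specific except name and port.
    return f"mesos-consul:{agent_addr}:{task['name']}:{task['port']}"


def id_with_task_ip(agent_addr, task):
    # Hypothetical alternative: the task's own IP is part of the ID.
    return f"mesos-consul:{agent_addr}:{task['name']}:{task['ip']}:{task['port']}"


def register_all(tasks, make_id, agent_addr):
    """Simulate registration: a later entry with the same ID replaces the earlier one."""
    registry = {}
    for task in tasks:
        registry[make_id(agent_addr, task)] = (task["ip"], task["port"])
    return registry


# The two instances from this thread, both on the same node.
tasks = [
    {"name": "sensu-server.hulk", "ip": "10.100.0.179", "port": 4567},
    {"name": "sensu-server.hulk", "ip": "10.100.0.180", "port": 4567},
]
agent = "10.0.0.5"  # hypothetical Consul agent address

collapsed = register_all(tasks, id_from_agent_and_port, agent)
distinct = register_all(tasks, id_with_task_ip, agent)
print(len(collapsed), len(distinct))
```

Which of the two instances survives in the collapsed case depends on registration order, so the sketch only compares the number of entries.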
OK, thanks Chris. If I pull the latest Docker image, will it have the fix, or do I have to build one from the latest code?
I retried with the latest image. The issue with multiple instances on the same box is fixed: I can now see the IPs of all the instances on the same box registered in Consul.
As for port 0, I still see IP-per-task containers listed twice: once with the actual port (in my case 4567) and once with port 0.
PR #89 should take care of the port 0 service.
Thanks for the quick fix, Chris. I just verified with the latest image; the port 0 issue is gone now as well.
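For anyone who wants to check their own cluster for leftover port 0 registrations, a minimal Python sketch follows. It relies on Consul's catalog endpoint `/v1/catalog/service/<name>`, whose entries carry `ServiceID`, `ServiceAddress`, and `ServicePort`. The function names, the default agent port 8500, and the sample entries (including the ServiceID values) are assumptions for illustration, not output captured from this issue.

```python
import json
from urllib.request import urlopen


def fetch_catalog_service(consul_host, service, port=8500, timeout=10):
    """Fetch all catalog entries for one service name from a Consul agent."""
    url = f"http://{consul_host}:{port}/v1/catalog/service/{service}"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def port_zero_entries(entries):
    """Return catalog entries registered with a missing or zero ServicePort."""
    return [e for e in entries if e.get("ServicePort", 0) == 0]


# Offline example with hand-written entries (hypothetical IDs).
sample_entries = [
    {"ServiceID": "example-1", "ServiceAddress": "10.100.0.155", "ServicePort": 8000},
    {"ServiceID": "example-2", "ServiceAddress": "10.100.0.155", "ServicePort": 0},
]
suspects = port_zero_entries(sample_entries)
print([(e["ServiceAddress"], e["ServicePort"]) for e in suspects])
```

Against a live agent, the call would look like `port_zero_entries(fetch_catalog_service("<consul-agent>", "<service-name>"))`.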