Error: broker start timeout
Closed this issue · 9 comments
CoreOS 845, Docker 1.8.3, Mesos 0.25 and deploying in Marathon .11
So, i create the broker and can see that it was created via API:
$ curl http://kafka-mesos-scheduler.marathon.mesos:31700/api/broker/list
{"brokers" : [{"stickiness" : {"period" : "10m"}, "id" : "0", "port" : "31092", "mem" : 2048, "cpus" : 1.0, "heap" : 1024, "failover" : {"delay" : "1m", "maxDelay" : "10m"}, "active" : false}]}
and when i do a:
$ ./kafka-mesos.sh broker start 0
Error: broker start timeout
and I can see it "trying to start"..
$ ./kafka-mesos.sh broker list
broker:
id: 0
active: true
state: starting
resources: cpus:1.00, mem:2048, heap:1024, port:31092
failover: delay:1m, max-delay:10m
stickiness: period:10m
I must be missing something fundamental: the marathon JSON is:
{
"id": "kafka-mesos-scheduler",
"cpus": 1.0,
"mem": 1024.0,
"instances": 1,
"ports": [31700],
"cmd": "./kafka-mesos.sh scheduler --master=zk://zk01:2181,zk02:2181,zk03:2181,zk04:2181,zk05:2181/mesos --zk=zk01:2181,zk02:2181,zk03:2181,zk04:2181,zk05:2181 --api=http://kafka-mesos-scheduler.marathon.mesos:31700 --storage=zk:/kafka-mesos",
"container": {
"type": "DOCKER",
"docker": {
"image": "private-hub/kafka-mesos:0.9.3.0",
"network": "BRIDGE",
"portMappings": [
{ "containerPort": 31700, "hostPort": 31700, "servicePort": 31700, "protocol": "tcp" }
],
"parameters": [
{ "key": "name", "value": "kafka-mesos-scheduler" }
]
}
},
"maxLaunchDelaySeconds": 7200
}
@aglahe
Please look into stdout
and stderr
files in the sandbox from the mesos UI. It should probably contain the error, happening during broker start. If this would not help to identify the issue, please attach those files to the ticket.
So, i started the scheduler with --debut=true
stderr:
I1117 16:12:18.648713 2681 logging.cpp:172] INFO level logging started!
I1117 16:12:18.648988 2681 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"file:\/\/\/etc\/docker.tar.gz"}}],"sandbox_directory":"\/tmp\/slaves\/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23\/frameworks\/20150925-222237-83912970-5050-1-0001\/executors\/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733\/runs\/fba1d308-1c6f-479e-ba17-94133afb71fe"}
I1117 16:12:18.651567 2681 fetcher.cpp:369] Fetching URI 'file:///etc/docker.tar.gz'
I1117 16:12:18.651590 2681 fetcher.cpp:243] Fetching directly into the sandbox directory
I1117 16:12:18.651618 2681 fetcher.cpp:180] Fetching URI 'file:///etc/docker.tar.gz'
I1117 16:12:18.651638 2681 fetcher.cpp:160] Copying resource with command:cp '/etc/docker.tar.gz' '/tmp/slaves/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23/frameworks/20150925-222237-83912970-5050-1-0001/executors/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733/runs/fba1d308-1c6f-479e-ba17-94133afb71fe/docker.tar.gz'
I1117 16:12:18.655037 2681 fetcher.cpp:76] Extracting with command: tar -C '/tmp/slaves/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23/frameworks/20150925-222237-83912970-5050-1-0001/executors/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733/runs/fba1d308-1c6f-479e-ba17-94133afb71fe' -xf '/tmp/slaves/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23/frameworks/20150925-222237-83912970-5050-1-0001/executors/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733/runs/fba1d308-1c6f-479e-ba17-94133afb71fe/docker.tar.gz'
I1117 16:12:18.661016 2681 fetcher.cpp:84] Extracted '/tmp/slaves/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23/frameworks/20150925-222237-83912970-5050-1-0001/executors/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733/runs/fba1d308-1c6f-479e-ba17-94133afb71fe/docker.tar.gz' into '/tmp/slaves/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23/frameworks/20150925-222237-83912970-5050-1-0001/executors/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733/runs/fba1d308-1c6f-479e-ba17-94133afb71fe'
I1117 16:12:18.661087 2681 fetcher.cpp:446] Fetched 'file:///etc/docker.tar.gz' to '/tmp/slaves/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23/frameworks/20150925-222237-83912970-5050-1-0001/executors/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733/runs/fba1d308-1c6f-479e-ba17-94133afb71fe/docker.tar.gz'
I1117 16:12:18.876677 2693 exec.cpp:134] Version: 0.25.0
I1117 16:12:18.881270 2728 exec.cpp:208] Executor registered on slave 23febc46-85d6-471b-bdce-a46da8ccdbbc-S23
tput: No value for $TERM and no -T specified
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@log_env@716: Client environment:host.name=6be26e12cad9
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@log_env@724: Client environment:os.arch=4.2.2-coreos
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@log_env@725: Client environment:os.version=#2 SMP Wed Oct 14 22:58:02 UTC 2015
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@log_env@753: Client environment:user.dir=/opt/kafka-mesos
2015-11-17 16:12:22,768:12(0x7fa19affd700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=10.105.0.1:2181,10.105.0.3:2181,10.105.0.5:2181,10.105.0.7:2181,10.105.0.9:2181 sessionTimeout=10000 watcher=0x7fa1cb3ac600 sessionId=0 sessionPasswd=<null> context=0x7f9fa4000ad0 flags=0
2015-11-17 16:12:22,769:12(0x7fa191ffb700):ZOO_INFO@check_events@1703: initiated connection to server [10.105.0.5:2181]
I1117 16:12:22.771742 13 sched.cpp:164] Version: 0.25.0
2015-11-17 16:12:22,819:12(0x7fa191ffb700):ZOO_INFO@check_events@1750: session establishment complete on server [10.105.0.5:2181], sessionId=0x35071d22f9d0023, negotiated timeout=10000
I1117 16:12:22.819651 132 group.cpp:331] Group process (group(1)@172.17.0.40:42763) connected to ZooKeeper
I1117 16:12:22.819699 132 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1117 16:12:22.819712 132 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I1117 16:12:22.821861 152 detector.cpp:156] Detected a new leader: (id='72')
I1117 16:12:22.822012 135 group.cpp:674] Trying to get '/mesos/json.info_0000000072' in ZooKeeper
I1117 16:12:22.823520 153 detector.cpp:481] A new leading master (UPID=master@10.105.0.5:5050) is detected
I1117 16:12:22.823668 131 sched.cpp:262] New master detected at master@10.105.0.5:5050
I1117 16:12:22.823957 131 sched.cpp:272] No credentials provided. Attempting to register without authentication
stdout:
--container="mesos-23febc46-85d6-471b-bdce-a46da8ccdbbc-S23.fba1d308-1c6f-479e-ba17-94133afb71fe" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/slaves/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23/frameworks/20150925-222237-83912970-5050-1-0001/executors/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733/runs/fba1d308-1c6f-479e-ba17-94133afb71fe" --stop_timeout="0ns"
--container="mesos-23febc46-85d6-471b-bdce-a46da8ccdbbc-S23.fba1d308-1c6f-479e-ba17-94133afb71fe" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/slaves/23febc46-85d6-471b-bdce-a46da8ccdbbc-S23/frameworks/20150925-222237-83912970-5050-1-0001/executors/kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733/runs/fba1d308-1c6f-479e-ba17-94133afb71fe" --stop_timeout="0ns"
Registered docker executor on r105u33.dsra.local
Starting task kafka-mesos-scheduler.f96eb5ae-8d45-11e5-bfe4-46f423552733
2015-11-17 16:12:21,315 [main] INFO ly.stealth.mesos.kafka.Scheduler$ - Starting Scheduler$:
debug: true, storage: zk:/kafka-mesos
mesos: master=zk://10.105.0.1:2181,10.105.0.3:2181,10.105.0.5:2181,10.105.0.7:2181,10.105.0.9:2181/mesos, user=<default>, principal=<none>, secret=<none>
framework: name=kafka, role=*, timeout=30d
api: http://kafka-mesos-scheduler.marathon.mesos:31700, bind-address: <all>, zk: 10.105.0.1:2181,10.105.0.3:2181,10.105.0.5:2181,10.105.0.7:2181,10.105.0.9:2181, jre: <none>
2015-11-17 16:12:22,108 [main] INFO org.eclipse.jetty.server.Server - jetty-9.0.z-SNAPSHOTWrappedArray()
2015-11-17 16:12:22,191 [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started WrappedArray(o.e.j.s.ServletContextHandler@6c6fa0b5{/,null,AVAILABLE})
2015-11-17 16:12:22,318 [main] INFO org.eclipse.jetty.server.ServerConnector - Started WrappedArray(ServerConnector@c5f5deb{HTTP/1.1}{0.0.0.0:31700})
2015-11-17 16:12:22,319 [main] INFO ly.stealth.mesos.kafka.HttpServer$ - started on port 31700
2015-11-17 16:12:53,981 [Jetty-76] INFO ly.stealth.mesos.kafka.HttpServer$ - handling - http://kafka-mesos-scheduler.marathon.mesos:31700/api/broker/stop
2015-11-17 16:12:54,027 [Jetty-76] INFO ly.stealth.mesos.kafka.HttpServer$ - finished handling
2015-11-17 16:12:56,924 [Jetty-79] INFO ly.stealth.mesos.kafka.HttpServer$ - handling - http://kafka-mesos-scheduler.marathon.mesos:31700/api/broker/stop
2015-11-17 16:12:56,929 [Jetty-79] INFO ly.stealth.mesos.kafka.HttpServer$ - finished handling
2015-11-17 16:12:58,300 [Jetty-76] INFO ly.stealth.mesos.kafka.HttpServer$ - handling - http://kafka-mesos-scheduler.marathon.mesos:31700/api/broker/stop
2015-11-17 16:12:58,305 [Jetty-76] INFO ly.stealth.mesos.kafka.HttpServer$ - finished handling
2015-11-17 16:13:06,200 [Jetty-79] INFO ly.stealth.mesos.kafka.HttpServer$ - handling - http://kafka-mesos-scheduler.marathon.mesos:31700/api/broker/list
2015-11-17 16:13:06,210 [Jetty-79] INFO ly.stealth.mesos.kafka.HttpServer$ - finished handling
2015-11-17 16:13:11,899 [Jetty-79] INFO ly.stealth.mesos.kafka.HttpServer$ - handling - http://kafka-mesos-scheduler.marathon.mesos:31700/api/broker/start
2015-11-17 16:14:12,372 [Jetty-79] INFO ly.stealth.mesos.kafka.HttpServer$ - finished handling
I know there were a few "stops," but was me just making sure th "old" broker was stopped.
@aglahe
These are the out and err of the Scheduler. Could you please provide stdout and stderr of the failed broker?
Ok. The problem seems to be in the fact, that the scheduler doesn't receive offers. This in turn may be influenced by connectivity problem in direction from mesos master -> scheduler.
You can dig in master logs (accessible from web UI logs link) for more details. Also you may try to play with LIBPROCESS_IP settings, pointing to the accessible IP address of the Scheduler.
This problem isn't actually related the framework itself. It is related to some deployment problem.
Sounds good...thanks for the help..i'll keep digging, and update this when I figure it out.
Figured it out...i guess you cannot run the scheduler, inside a docker container, in bridge mode..it needs to be HOST mode. I now see brokers' starting up.
$ ./kafka-mesos.sh broker start 0
broker started:
id: 0
active: true
state: running
resources: cpus:1.00, mem:2048, heap:1024, port:auto
failover: delay:1m, max-delay:10m
stickiness: period:10m, hostname:r105u13.dsra.local
task:
id: broker-0-4ffad113-da45-499e-b374-a72100e99478
state: running
endpoint: r105u13.dsra.local:31000
@aglahe
Great! Closing the ticket then.