Queue capacity issue
coreyappleby opened this issue · 13 comments
I've got Polymur running in a Docker container on a Mesos/Marathon cluster, but I'm running into an issue with it reaching the carbon-cache backends. It looks like the queue capacity is set to 0, even though I've tried increasing it with the command-line flags. Here's what I'm using to start Polymur:
/go/bin/polymur -listen-addr "0.0.0.0:2003" -stat-addr "0.0.0.0:2020" -api-addr "0.0.0.0:2030" -distribution "hash-route" -console-out -outgoing-queue-cap 8192 -incoming-queue-cap 65536
It starts up fine, accepts metrics, and the carbon-cache instance is able to register with it and seems to be working correctly. However, Polymur's output shows the queue capacity is zero, and it drops all messages destined for the carbon-cache even before I send any metrics.
2017/12/12 03:11:44 Adding destination to connection pool: hostname.domain.com:31501
2017/12/12 03:11:44 Destination hostname.domain.com:31501 queue is at capacity (0) - further messages will be dropped
Has anyone run into this before? Am I doing something incorrectly?
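For context on what "at capacity (0)" implies: a minimal Go sketch, assuming (not confirmed from Polymur's source) that each destination queue is a buffered channel fed by a non-blocking send that drops the message when the buffer is full. A channel made with capacity 0 is unbuffered, so `len == cap == 0` from the start and a send only succeeds if a receiver happens to be waiting at that exact instant. `tryEnqueue` is a hypothetical helper.

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send and reports whether the message
// was accepted. It returns false (message dropped) when the buffer is full.
// With capacity 0 the buffer is always "full", so unless a receiver is
// already blocked on the channel, every send is dropped.
// Hypothetical helper for illustration; not Polymur's actual code.
func tryEnqueue(q chan string, msg string) bool {
	select {
	case q <- msg:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan string, 0) // capacity 0, as reported in the log
	buffered := make(chan string, 2)   // capacity 2

	// Prints: 0 0 false
	fmt.Println(len(unbuffered), cap(unbuffered), tryEnqueue(unbuffered, "a.b.c 1 1513048304"))
	// Prints: true true false (third send finds the buffer full)
	fmt.Println(tryEnqueue(buffered, "m1"), tryEnqueue(buffered, "m2"), tryEnqueue(buffered, "m3"))
}
```

This matches the reported symptom (drops before any real traffic) without saying why the capacity ended up as 0 in the first place.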
I haven't seen this issue but can help look into it. Are you building Polymur from the current master?
Also, one thing to check is whether the POLYMUR_OUTGOING_QUEUE_CAP env var is being overwritten somehow. Take note of the envy / env flag usage: https://github.com/jamiealquiza/polymur#usage
Yes, the Dockerfile pulls down the latest from GitHub to build the image.
I tried overriding the POLYMUR_OUTGOING_QUEUE_CAP directly (instead of using the runtime options) but it still insists it has a queue capacity of zero. I'm honestly stumped.
Also, just to confirm: I ran it locally (instead of on our Mesos cluster) and got the same results. Very strange.
This is weird. Also, what version of Go are you building this with? One thing I can try is cutting you a branch that reports some diagnostic info that could help figure out what's going on here.
I'm using 1.8.1 in the Docker image and running it outside Docker with 1.9.2. Both cases do the same thing.
I've created a branch that prints relevant diagnostic info prefixed with 'xxx' - if you could do a build from this and share the output (if any information is considered sensitive, only the 'xxx' entries are needed): https://github.com/jamiealquiza/polymur/tree/queue-cap-test
Thanks for that! Here's the output from "polymur -console-out"
XXX queue cap env var unset
2017/12/13 23:09:08 ::: Polymur :::
XXX queue cap config: 4096
2017/12/13 23:09:08 Runstats started: localhost:2020
2017/12/13 23:09:08 API started: localhost:2030
2017/12/13 23:09:08 Metrics listener started: 0.0.0.0:2003
2017/12/13 23:09:20 Registered destination hostname.domain.com:31890
2017/12/13 23:09:20 Adding destination to connection pool: hostname.domain.com:31890
XXX queue for hostname.domain.com:31890 set to 0
2017/12/13 23:09:23 Destination hostname.domain.com:31890 queue is at capacity (0) - further messages will be dropped
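The diagnostic output narrows things down: the parsed config holds 4096, yet the queue for the registered destination is set to 0, so the value is lost somewhere between the config and the per-destination queue. One common way that happens in Go is a struct that sizes the queue being built without the capacity field, which then keeps Go's zero value. This is shown purely as a hypothetical pattern; the `Config` and `Pool` types and the `AddDest` method are made up for illustration and are not Polymur's code.

```go
package main

import "fmt"

// Config holds parsed settings (hypothetical).
type Config struct {
	OutgoingQueueCap int
}

// Pool builds one queue per destination, sized from QueueCap (hypothetical).
type Pool struct {
	QueueCap int
	queues   map[string]chan []byte
}

// AddDest creates and registers a destination queue of capacity p.QueueCap.
func (p *Pool) AddDest(addr string) chan []byte {
	if p.queues == nil {
		p.queues = make(map[string]chan []byte)
	}
	q := make(chan []byte, p.QueueCap)
	p.queues[addr] = q
	return q
}

func main() {
	cfg := Config{OutgoingQueueCap: 4096}

	// Bug pattern: QueueCap is never assigned, so it keeps the zero value 0.
	broken := &Pool{}
	fmt.Println(cap(broken.AddDest("hostname.domain.com:31890"))) // 0

	// Fix pattern: thread the parsed value through explicitly.
	fixed := &Pool{QueueCap: cfg.OutgoingQueueCap}
	fmt.Println(cap(fixed.AddDest("hostname.domain.com:31890"))) // 4096
}
```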
UPDATE: I got it to work!
Something is going on with the arguments and their order (or I am missing something). At first I thought it was the "=" between the arg and the value, but eventually I found it has something to do with the positional order of the args. Notice the two invocations below: when I put -console-out at the end, some of the args don't get picked up. So with some trial and error I got it to work.
GOOD
$ /usr/local/go/polymur/bin/polymur-gateway -key "/usr/local/go/polymur/key.pem" -cert "/usr/local/go/polymur/cert.pem" -destinations "x.x.x.x:2203"
2018/03/16 13:39:07 ::: Polymur-gateway :::
2018/03/16 13:39:07 Registered destination x.x.x.x:2203
2018/03/16 13:39:07 Adding destination to connection pool: x.x.x.x:2203
2018/03/16 13:39:08 Running API key sync
2018/03/16 13:39:08 HTTP listening on 0.0.0.0:80
2018/03/16 13:39:08 API started: localhost:2030
2018/03/16 13:39:08 Runstats started: localhost:2020
2018/03/16 13:39:08 HTTPS listening on 0.0.0.0:443
2018/03/16 13:39:08 API keys refreshed: 2 new, 0 removed
2018/03/16 13:39:13 [client xx.xx.xx.xx:36067] Recieved batch from from test-api
BAD
$ /usr/local/go/polymur/bin/polymur-gateway -key "/usr/local/go/polymur/key.pem" -cert "/usr/local/go/polymur/cert.pem" -destinations "x.x.x.x:2203" -console-out
2018/03/16 13:39:20 ::: Polymur-gateway :::
2018/03/16 13:39:20 Running API key sync
2018/03/16 13:39:20 HTTP listening on 0.0.0.0:80
2018/03/16 13:39:20 API started: localhost:2030
2018/03/16 13:39:20 Runstats started: localhost:2020
2018/03/16 13:39:20 HTTPS listening on 0.0.0.0:443
2018/03/16 13:39:20 API keys refreshed: 2 new, 0 removed
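For comparison, here is how Go's standard `flag` package treats ordering, assuming Polymur's flags are ultimately parsed by it (an assumption; this is not polymur-gateway's code, and the flag set below only mirrors flag names from this thread). Flag order on its own does not change the result, and `-flag value` and `-flag=value` are equivalent for non-bool flags. The one ordering trap is that parsing stops at the first non-flag argument, e.g. a stray `true` after a bool flag, leaving every later flag unparsed in `fs.Args()`. The GOOD/BAD pair above contains no such stray argument, so this sketch does not explain that pair; it only shows what order alone does and does not do.

```go
package main

import (
	"flag"
	"fmt"
)

type opts struct {
	consoleOut   bool
	destinations string
	queueCap     int
}

// parse runs the standard flag parser over args. Order between flags does not
// matter, but parsing stops at the first non-flag argument; everything from
// there on is returned as leftover args and is NOT treated as flags.
func parse(args []string) (opts, []string, error) {
	var o opts
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.BoolVar(&o.consoleOut, "console-out", false, "log to stdout")
	fs.StringVar(&o.destinations, "destinations", "", "comma-separated destinations")
	fs.IntVar(&o.queueCap, "outgoing-queue-cap", 4096, "queue capacity")
	err := fs.Parse(args)
	return o, fs.Args(), err
}

func main() {
	a, _, _ := parse([]string{"-destinations", "x.x.x.x:2203", "-console-out"})
	b, _, _ := parse([]string{"-console-out", "-destinations=x.x.x.x:2203"})
	fmt.Println(a == b) // true: order and "=" alone do not change the result

	// A bool flag does not consume a following value, so "true" is a
	// positional argument and parsing stops there.
	c, rest, _ := parse([]string{"-console-out", "true", "-outgoing-queue-cap", "8192"})
	fmt.Println(c.queueCap, rest) // 4096 [true -outgoing-queue-cap 8192]
}
```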
OP:
Hi, I have the same issue. Any update on what the cause is?
I'm using polymur-gateway, FWIW.
$ /usr/local/go/polymur/bin/polymur-gateway -key=/usr/local/go/polymur/key.pem -cert=/usr/local/go/polymur/cert.pem -destinations=xxx.xxx.xxx.xxx:2203 -console-out
2018/03/16 12:39:07 ::: Polymur-gateway :::
2018/03/16 12:39:07 Running API key sync
2018/03/16 12:39:07 HTTP listening on 0.0.0.0:80
2018/03/16 12:39:07 API started: localhost:2030
2018/03/16 12:39:07 Runstats started: localhost:2020
2018/03/16 12:39:07 HTTPS listening on 0.0.0.0:443
2018/03/16 12:39:07 API keys refreshed: 2 new, 0 removed
2018/03/16 12:39:10 Registered destination xxx.xxx.xxx.xxx:2203
2018/03/16 12:39:10 Adding destination to connection pool: xxx.xxx.xxx.xxx:2203
2018/03/16 12:39:12 Destination xxx.xxx.xxx.xxx:2203 queue is at capacity (0) - further messages will be dropped
carbon-cache is running on the destination host, listening on 2203/tcp.
On the carbon-cache host I run tcpdump, and I can see the initial hit to the port, but the error above stops it from retrying. I also see the whisper files initially created, but obviously no data in them.
Also, I noticed the -destinations=xxx.xxx.xxx.xxx:2203 arg doesn't seem to do anything. Even after registering the destination with the API, a subsequent process restart doesn't re-establish the destination. Is there a step to make the destinations persistent?
In fact, there are a few args that don't seem to get picked up (at least, it is not reflected in the -console-out output). Neither of these seemed to have any effect on the process:
$ /usr/local/go/polymur/bin/polymur-gateway -outgoing-queue-cap=8192 -incoming-queue-cap=65535
$ POLYMUR_GW_OUTGOING_QUEUE_CAP=8192 POLYMUR_GW_INCOMING_QUEUE_CAP=65535 /usr/local/go/polymur/bin/polymur-gateway
for reference:
$ go version
go version go1.9.2 linux/amd64
thanks!!
Hey @jlytle-interactions, thanks for that extra info! The position of the args shouldn't matter, so there's definitely still something I need to look at. Glad you found a fix, apologies for my slow response today.
Hi,
I have the same issue.
The queue is reported as full although it has 0 messages.
Also, the -destinations parameter doesn't work; I always have to set it with putdest.
2018/10/02 17:57:30 Destination 127.0.0.1:20004 queue is at capacity (0) - further messages will be dropped
I start it like this:
$GOPATH/bin/polymur -console-out -destinations="localhost:20003" -listen-addr=0.0.0.0:20033 -metrics-flush=10 -outgoing-queue-cap=50000
I'm running golang-1.7
Also, are there other commands apart from "stats" for the runstats API?
The following metrics don't show up there:
polymur.incoming-queue.current-size 0 1538496248
polymur.incoming-queue.limit 50000 1538496248
Cheers,
Felix
I've been having this same issue, and I think I managed to get it running using Docker along with docker-compose:
docker-compose.yml
services:
  polymur:
    image: diceone/docker-polymur
    command: ["/go/bin/polymur"]
    environment:
      - POLYMUR_API_ADDR=0.0.0.0:2030
      - POLYMUR_DESTINATIONS=primary:2003,secondary:2003
      - POLYMUR_DISTRIBUTION=broadcast
      - POLYMUR_INCOMING_QUEUE_CAP=32768
      - POLYMUR_OUTGOING_QUEUE_CAP=4096
      - POLYMUR_METRICS_FLUSH=60
    ports:
      - 2222:2003
      - 2030:2030
    links:
      - graphite_primary:primary
      - graphite_secondary:secondary
  grafana:
    image: grafana/grafana
    network_mode: host
  graphite_primary:
    image: sitespeedio/graphite:1.1.3
    ports:
      - 9999:80
      - 2203:2003
  graphite_secondary:
    image: sitespeedio/graphite:1.1.3
    ports:
      - 8888:80
      - 2303:2003
By using this docker-compose file I was able to get things running smoothly: no more "Destination 127.0.0.1:2222 queue is at capacity (0) - further messages will be dropped".
- Log in to Grafana at http://localhost:3000 (user:pass admin/admin).
- Create the datasources http://localhost:9999 and http://localhost:8888.
- Create a dashboard, making sure to select the newly added datasources.
Hopefully this helps someone :)