Different "consistent hashing" results from Carbon and Carbonate
sw0x2A opened this issue · 12 comments
Carbon-sieve reports that some metrics belong to another node but these metrics are actually used and updated by carbon.
DESTINATIONS are defined identically in carbon.conf and carbonate.conf. All daemons have been restarted and are using this configuration.
Roughly 1/8 of all metrics are reported to belong to another host, the rest is fine.
The cluster has 6 nodes. One runs haproxy, which load-balances requests across 8 carbon-relays on the same machine. These carbon-relays use consistent hashing and DESTINATIONS = 172.22.5.14:2014:relay, 172.22.5.106:2014:relay, 172.22.5.107:2014:relay, 172.22.5.234:2014:relay, 172.22.5.235:2014:relay. Each entry in DESTINATIONS is a carbon-relay on one of the other 5 servers, and each of those has DESTINATIONS = 127.0.0.1:2004:cache, 127.0.0.1:2104:b, i.e. two carbon-cache instances using the same whisper storage path.
When I was using carbon-sieve to clean up some stuff, I noticed that its results are wrong.
Example: rg.community.pagespeed.publicationDetail.loggedOut.connect.median
File exists on host with IP 172.22.5.14:
$ stat /data/graphite/whisper/rg/community/pagespeed/publicationDetail/loggedOut/connect/median.wsp
File: ‘/data/graphite/whisper/rg/community/pagespeed/publicationDetail/loggedOut/connect/median.wsp’
Size: 325816 Blocks: 640 IO Block: 4096 regular file
[...]
Access: 2016-03-06 08:50:09.246530787 +0000
Modify: 2016-05-01 12:58:52.698880921 +0000
Change: 2016-05-01 12:58:52.698880921 +0000
Birth: -
But carbon-sieve wants it on 172.22.5.106:
$ echo "rg.community.pagespeed.publicationDetail.loggedOut.connect.median" | carbon-sieve -C main -n 172.22.5.106
rg.community.pagespeed.publicationDetail.loggedOut.connect.median
I have been troubleshooting this for hours but cannot find a reason. I also noticed that carbon-sieve uses the hashing method from the carbon library. Approximately 493,000 of 4.11 million metrics are wrong, between 9.5% and 13.5% of the metrics per node. This is close to 12.5% (1/8), which matches the 8 carbon-relays behind haproxy.
Any hints are highly appreciated. If you need more information, please do not hesitate to ask.
Forgot to mention environment. All servers are:
Ubuntu 14.04.2 LTS (trusty)
Python 2.7.6
carbon==0.9.15
carbonate==0.2.2
Hi @sw0x2A,
Did you change anything in the default carbon.conf? What is DIVERSE_REPLICAS set to?
Or better, publish the whole file somewhere.
Hi @deniszh,
DIVERSE_REPLICAS is not set in my carbon.conf, so I assume it defaults to False. Please find the full carbon.conf below.
carbon.conf used on relay host
carbon.conf used on cache hosts
Very strange then. As you can see in the code - https://github.com/graphite-project/carbonate/blob/master/carbonate/cluster.py#L9 - carbonate does not contain any hashing code; it uses the Graphite code from /opt/graphite/lib/carbon/routers.py
Maybe you have a different version of carbon installed there?
Checked that already. Same version of Python and carbon and carbonate on all servers.
BTW carbonate.conf for completeness:
[main]
DESTINATIONS = 172.22.5.14:2014:relay, 172.22.5.106:2014:relay, 172.22.5.107:2014:relay, 172.22.5.234:2014:relay, 172.22.5.235:2014:relay
REPLICATION_FACTOR = 1
SSH_USER = root
[old]
DESTINATIONS = 172.22.5.14:2014:relay, 172.22.5.106:2014:relay, 172.22.5.107:2014:relay
REPLICATION_FACTOR = 1
SSH_USER = root
Maybe worth mentioning: this does not only happen with carbon-sieve. The consistent hashing results of carbon-sieve and carbon-lookup agree with each other, but for around 12% of the metrics they differ from where the carbon-relays actually send the data.
Sorry, @sw0x2A, I have no more ideas. I would need to test it myself, but unfortunately I have no time right now.
BTW, if you have RF=1 you can try buckytools - https://github.com/jjneely/buckytools - and check whether you have the same problem there...
Hi @deniszh , thanks for the link to buckytools. They look quite useful and at least the results of the metrics I tested are the same on carbon-lookup and bucky.
Whisper file is updated on 172.22.5.14 but carbon-lookup and bucky want it on 172.22.5.106. Guess this means something on carbon-relays is wrong...
$ ./bucky locate rg.community.pagespeed.publicationDetail.loggedOut.connect.median
rg.community.pagespeed.publicationDetail.loggedOut.connect.median => 172.22.5.106
$ carbon-lookup rg.community.pagespeed.publicationDetail.loggedOut.connect.median
172.22.5.106:2014:relay
Yep, quite strange. I'm using https://github.com/grobian/carbon-c-relay as relay now. Could you please maybe check that too?
This is not something I can change right now, but I will keep it in mind. BTW, I added a value for that metric using carbon-client.py. It was sent to 172.22.5.106 and created a new Whisper file there.
Just ran into the same issue and after a slew of debugging it turned out that we were sending metrics that looked something like prefix-part-1.prefix-part-2..metric (notice the ..), but for carbonate it's impossible to know that a metric contains double dots due to the filesystem being "nice" and just disregarding them when Graphite is writing them to disk.
@sw0x2A, guessing you've already moved on from this (or worked it out somehow), or else you might want to check the same.
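To make the effect concrete, here is a minimal sketch of how the double dot gets lost. It assumes the metric name is turned into a path by replacing dots with path separators and appending .wsp; the helper names are hypothetical, and hash_key is only a stand-in for a ring position, not carbon's actual hashing code.

```python
import hashlib
import posixpath

# Storage root as seen in the stat output earlier in this thread.
WHISPER_ROOT = "/data/graphite/whisper"


def metric_to_path(metric):
    """Assumed mapping: dots become path separators, plus a .wsp suffix."""
    return posixpath.join(WHISPER_ROOT, metric.replace(".", "/") + ".wsp")


def path_to_metric(path):
    """What a tool that only sees the filesystem can recover from a path."""
    # normpath collapses the empty component ("//"), like the filesystem does.
    relative = posixpath.relpath(posixpath.normpath(path), WHISPER_ROOT)
    return relative[: -len(".wsp")].replace("/", ".")


def hash_key(metric):
    """Stand-in for a hash ring position; NOT carbon's exact function."""
    return hashlib.md5(metric.encode("utf-8")).hexdigest()


sent = "prefix-part-1.prefix-part-2..metric"   # name as received by the relay
on_disk = path_to_metric(metric_to_path(sent))  # name as seen by disk-based tools

print(sent, "->", on_disk)
print("same hash key:", hash_key(sent) == hash_key(on_disk))
```

The relay hashes the name it receives (with the ..), while anything that derives names from the .wsp files hashes the collapsed name, so the two can land on different nodes.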
@mthssdrbrg It is the same issue here too. Metrics contain .. which is really hard to find when you only check how the metrics are distributed and written to the filesystem. Thanks a lot for your comment!
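For anyone hitting the same thing: since the files on disk no longer show the problem, the names have to be checked before they reach the filesystem (for example from a capture of what the clients send). A rough sketch of such a check follows; the function names are hypothetical, and it flags leading and trailing dots as well as double dots.

```python
def has_empty_segment(metric):
    """True if the dotted name has an empty component (e.g. 'a..b', '.a', 'a.')."""
    return "" in metric.split(".")


def find_suspect_metrics(names):
    """Return the names from an iterable that contain an empty component."""
    return [name for name in names if has_empty_segment(name)]


sample = [
    "rg.community.pagespeed.publicationDetail.loggedOut.connect.median",
    "prefix-part-1.prefix-part-2..metric",
]
for name in find_suspect_metrics(sample):
    print("suspect:", name)
```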