Atom table limit hit if `riak admin` called regularly
Bob-The-Marauder opened this issue · 3 comments
One of our customers found an issue with KV 3.0.3 where the atom table slowly becomes exhausted if `riak admin` is called regularly, e.g. when polling `riak admin status` for monitoring purposes. This was traced to a problem with relx in pre-OTP 23 builds. We have filed the following PR: erlware/relx#868
Here is a brief example showing the atom count increasing:
```
[root@localhost riak]# riak start
[root@localhost riak]# riak attach
Attaching to /tmp/erl_pipes/riak@127.0.0.1/erlang.pipe.1 (^D to exit)
(riak@127.0.0.1)1> erlang:system_info(atom_count).
52654
(riak@127.0.0.1)2> [Quit]
[root@localhost riak]# riak admin cluster status
---- Cluster Status ----
Ring ready: true
+--------------------+------+-------+-----+-------+
|        node        |status| avail |ring |pending|
+--------------------+------+-------+-----+-------+
| (C) riak@127.0.0.1 |valid |  up   |100.0|  --   |
+--------------------+------+-------+-----+-------+
Key: (C) = Claimant; availability marked with '!' is unexpected
[root@localhost riak]# riak attach
Attaching to /tmp/erl_pipes/riak@127.0.0.1/erlang.pipe.1 (^D to exit)
(riak@127.0.0.1)2> erlang:system_info(atom_count).
52656
```
Although such a small increment should not cause any issues on its own, when `riak admin status` is polled regularly, 24 hours a day, it slowly adds up until you finally hit the 1 million atom limit (the default maximum atom table size is 1,048,576) and Riak crashes. The current workaround is to restart Riak before the atom count gets too high.
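A back-of-envelope sketch of how long the leak takes to crash a node, using the numbers observed in this issue and the default atom limit. The one-poll-per-minute rate is an assumption purely for illustration:

```python
# Back-of-envelope estimate of time until the atom table fills.
# Assumptions (illustrative, taken from the transcript above):
#   - each `riak admin` call leaks ~2 atoms (52654 -> 52656)
#   - monitoring polls once a minute, 24 hours/day (assumed rate)
#   - the default Erlang atom limit is 1,048,576 ("the 1 million mark")

ATOM_LIMIT = 1_048_576
BASELINE = 52_654          # atom count observed right after startup
LEAK_PER_CALL = 2          # atoms added per `riak admin` invocation
POLLS_PER_DAY = 24 * 60    # one poll per minute

headroom = ATOM_LIMIT - BASELINE
calls_until_crash = headroom // LEAK_PER_CALL
days = calls_until_crash / POLLS_PER_DAY
print(calls_until_crash)   # 497961 polls
print(round(days, 1))      # roughly 345.8 days
```

At a faster poll interval (say, every 10 seconds) the same arithmetic gives well under two months, which matches the "slowly adds up" behaviour described above.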
Ahh interesting. I'm sure I've heard this same problem talked about before.
Sounds like it might be something along the lines of using `list_to_atom/1` when creating a random maint shell name, which I think would occur every time `riak admin` is called.
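For illustration, here is a Python analogy (hypothetical names, not Riak's or relx's actual code) of why calling `list_to_atom/1` on a freshly generated name leaks: in Erlang, atoms are interned in a VM-global table and never garbage collected, whereas `list_to_existing_atom/1` refuses to create new entries:

```python
# Python analogy of Erlang's atom table (illustrative only).
# Erlang's list_to_atom/1 always interns the name; atoms are never
# garbage collected, so every unique name grows the table permanently.
# list_to_existing_atom/1 only reuses atoms that already exist.

ATOM_TABLE = set()  # stands in for the VM-global, never-GC'd atom table

def list_to_atom(name):
    # always interns: a fresh name grows the table permanently
    ATOM_TABLE.add(name)
    return name

def list_to_existing_atom(name):
    # safe variant: raises instead of creating a new atom
    if name not in ATOM_TABLE:
        raise ValueError("badarg: atom does not exist")
    return name

# A randomly named maint shell per admin call leaks one atom each time:
for n in range(3):
    list_to_atom(f"maint_{n}")
print(len(ATOM_TABLE))  # 3 -- grows by one on every call
```

This is why the usual advice is to avoid `list_to_atom/1` on dynamically generated strings in long-running systems, or to use a fixed name so the atom is only ever created once.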
Ignore me, didn't read properly - see that it's already been dug into and the guilty code found and fixed.
We made these changes locally and, although there does seem to be some improvement, they do not fully fix the issue. We're still trying to find the source of the problem.