If you are using Alert Manager to notify you of events and you have a rule for high RAM load, you can use alert manager to trigger an action to restart the node (or clear the cache) to free up memory whenever it exceeds a certain threshold.
I have this rule for high ram load:
- alert: HostHighRamLoad
expr: (node_memory_MemTotal_bytes - node_memory_MemFree_bytes - (node_memory_Cached_bytes + node_memory_Buffers_bytes)) / node_memory_MemTotal_bytes *100 > 80
for: 0m
labels:
severity: warning
annotations:
summary: Host high RAM usage
description: "RAM usage is > 80%\n VALUE = {{ $value }}"
which triggers whenever the RAM usage exceeds 80%. I'm going to use this rule to trigger a restart of the node.
- Install webhook with:
sudo apt-get install webhook
. There are other methods to install it mentioned in the repo, but I'll assume you're using the Ubuntu package manager. - Create a script that holds the command to restart the node:
nano restart-node.sh
The content of the script should be:
#!/bin/bash
<COMMAND TO RESTART THE NODE OR THE CONTAINER OR TO CLEAR THE CACHE>
- Make it executable:
chmod +x restart-node.sh
- Create the webhook YAML file (we'll call it
restart-hook.yml
and it's located in/home/ubuntu
):
- id: restart-node
execute-command: "/home/ubuntu/restart-node.sh"
command-working-directory: "/home/ubuntu"
- Create a service that will be running the webhook to listen for events:
sudo nano /etc/systemd/system/restart-webhook.service
It should look like this:
[Unit]
Description=Restart Node Webhook
[Service]
User=ubuntu
Group=ubuntu
ExecStart=webhook -verbose -debug -ip localhost -hooks /home/ubuntu/restart-hook.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
This will run the webhook and listen for our events on http://localhost:9000/hooks/restart-node
. If you want to change the port you can add the flag -port <PORT>
.
6. Start the service:
sudo systemctl daemon-reload
sudo systemctl enable restart-webhook
sudo systemctl start restart-webhook
sudo systemctl status restart-webhook
- Now it's time to add the proper triggers in Alert Manager. Open the manager's yaml file and add the following triggers (
...
are other rules you might have):
route:
...
routes:
- matchers:
- alertname="HostHighRamLoad"
receiver: "webhook"
receivers:
...
- name: 'webhook'
webhook_configs:
- url: 'http://localhost:9000/hooks/restart-node'
- Save the file and restart alert manager. Now whenever RAM usage exceeds 80% the script will run and trigger the specified action!