Proxmox-load-balancer Pro v0.6.4

Please take a look: cvk98#7

If you use this script and it works correctly, please take a moment to star the repository. It motivates me a great deal to keep developing the product. If you are missing a feature, write about it. I will try to add it if it fits the product concept.

Development progress:

~~Write a draft script~~
~~Put "break" and "continue" in their places~~
~~Arrange the functions in their places~~
Catch bugs
~~Correct variable names~~
~~Add comments~~
~~Add logging and sending notifications to the mail~~
~~Urgently translate into English~~
Add a VM selection algorithm for special cases when there is a need for migration, but there is no option that improves the balance
~~Test on three clusters~~

This script automatically balances RAM load across a Proxmox cluster (or part of a cluster). It does not balance CPU load. I consider that unnecessary for many reasons, which it would be inappropriate to list here. Compared with https://github.com/cvk98/Proxmox-memory-balancer, the algorithm of this script has been changed significantly.

In particular:

Added a list of exclusions for the VMs and nodes.
It is now possible to disable LXC migration.
You can set a spread range of node load within which no balancing takes place.
The VM selection algorithm for migration has been significantly redesigned (it uses different criteria to evaluate the proposed migration options).
The script runs continuously and does not exit once balance is reached. It simply sleeps for 5 minutes (configurable).
Now the script can be deployed automatically (via ansible) to all nodes of the cluster using HA. To do this, set only_on_master: ON in the config. Then it will run only on the master node.
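
As a rough illustration of how the exclusion lists and the LXC switch could narrow down migration candidates, here is a minimal sketch. The names `Guest`, `migration_candidates`, `excluded_vms`, `excluded_nodes` and `lxc_migration` are hypothetical and are not taken from plb.py or config.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guest:
    vmid: int
    node: str
    kind: str  # "qemu" for a VM, "lxc" for a container
    mem: int   # RAM in bytes


def migration_candidates(guests, excluded_vms, excluded_nodes, lxc_migration):
    """Return the guests that may be migrated under the given exclusions."""
    result = []
    for guest in guests:
        if guest.vmid in excluded_vms:
            continue  # explicitly excluded VM/LXC
        if guest.node in excluded_nodes:
            continue  # guest lives on an excluded node
        if guest.kind == "lxc" and not lxc_migration:
            continue  # LXC migration is switched off
        result.append(guest)
    return result


guests = [
    Guest(100, "pve1", "qemu", 4 * 1024**3),
    Guest(101, "pve1", "lxc", 1 * 1024**3),
    Guest(102, "pve2", "qemu", 8 * 1024**3),
    Guest(103, "pve3", "qemu", 2 * 1024**3),
]
candidates = migration_candidates(guests, {102}, {"pve3"}, lxc_migration=False)
print([g.vmid for g in candidates])
```

With VM 102 and node pve3 excluded and LXC migration disabled, only VM 100 remains a candidate in this example.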

Most likely, the script does not need a root PVE account. You can create a separate account with the necessary rights (not tested). For those who are worried that the script may harm the cluster: only one POST method is used, and it is used for VM/LXC migration.
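
To show what such a migration call looks like, the sketch below only builds the request and does not send anything. The endpoints `/nodes/{node}/qemu/{vmid}/migrate` and `/nodes/{node}/lxc/{vmid}/migrate` and the `target`, `online` and `restart` parameters belong to the Proxmox VE API. The function name, the host name and the choice to always set `online`/`restart` are assumptions and may differ from what plb.py does.

```python
def build_migration_request(host, source_node, vmid, target_node, kind="qemu"):
    """Return (url, payload) for a Proxmox migrate call without sending it."""
    if kind not in ("qemu", "lxc"):
        raise ValueError(f"unknown guest type: {kind!r}")
    # 8006 is the default port of the Proxmox VE API.
    url = (
        f"https://{host}:8006/api2/json"
        f"/nodes/{source_node}/{kind}/{vmid}/migrate"
    )
    payload = {"target": target_node}
    if kind == "qemu":
        payload["online"] = 1   # live-migrate a running VM
    else:
        payload["restart"] = 1  # a running container is restarted on the target
    return url, payload


url, payload = build_migration_request("pve.example.com", "pve1", 100, "pve2")
print("POST", url, payload)
```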

The script does not take HA recommendations into account!

Recommendations:

For the migration mechanism to work correctly, shared storage is required. This can be Ceph (or other distributed storage) or a storage system connected to all Proxmox nodes.
For a cluster similar in size and composition to the one in the screenshot, the normal value of "deviation" is 4%. This means that the RAM load of each node may differ from the average load of the cluster (or part of it) by at most 2% in either direction. Example: cluster load is 50%, the least loaded node is at 48%, the most loaded node is at 52%. It does not matter at all how much RAM a node has.
Do not set the "deviation" value to 0. This would cause continuous VM migration at the slightest change in VM["mem"]. The recommended minimum is 1% for large clusters with many different VMs, and 3-5% for medium and small clusters.
For the script to work correctly, it needs constant access to the Proxmox host. I therefore recommend running it on one of the Proxmox nodes, or creating a VM/LXC in the balanced cluster and configuring the script to start automatically.
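
The "deviation" rule described above can be sketched as follows. This is an illustration of the arithmetic only, assuming each node may differ from the cluster average by at most half of "deviation". The function `is_balanced` and the node data are hypothetical and not taken from plb.py. Exact fractions are used so that the 48%/52% boundary case is not distorted by floating-point rounding.

```python
from fractions import Fraction


def is_balanced(nodes, deviation):
    """Check whether every node is within deviation/2 of the cluster load.

    nodes maps a node name to (used_ram, total_ram) as integers;
    deviation is given in percent, e.g. 4 for 4%.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_ram = sum(total for _, total in nodes.values())
    cluster_load = Fraction(total_used * 100, total_ram)  # percent
    half_range = Fraction(deviation) / 2
    return all(
        abs(Fraction(used * 100, total) - cluster_load) <= half_range
        for used, total in nodes.values()
    )


# RAM in GiB; node sizes differ, only the percentage load matters.
nodes = {
    "pve1": (96, 200),   # 48%
    "pve2": (104, 200),  # 52%
    "pve3": (50, 100),   # 50%
}
print(is_balanced(nodes, 4))  # cluster load is 50%, allowed range is 48-52%
print(is_balanced(nodes, 3))  # allowed range shrinks to 48.5-51.5%
```

In this example the cluster counts as balanced at a deviation of 4% and as unbalanced at 3%.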

To autorun the script on Linux (Ubuntu):
touch /etc/systemd/system/load-balancer.service
chmod 664 /etc/systemd/system/load-balancer.service
Add the following lines to it, replacing USERNAME with the name of your Linux user:

 [Unit]  
 Description=Proxmox cluster load-balancer Service  
 After=network.target  

 [Service]  
 Type=simple  
 User=USERNAME  
 NoNewPrivileges=yes  
 ExecStart=/home/USERNAME/plb.py  
 WorkingDirectory=/home/USERNAME/  
 Restart=always  
 RestartSec=300  

 [Install]  
 WantedBy=multi-user.target

systemctl daemon-reload
systemctl start load-balancer.service
systemctl status load-balancer.service
systemctl enable load-balancer.service

Tested on Proxmox 7.1-10 virtual environment with more than 400 virtual machines
Before using the script, please read the Supplement to the license

Changelog:

0.6.4 (21.03.23)

Fixed an error that occurs when nodes are turned off (thanks to dmitry-ko) cvk98#14

0.6.3 (07.11.22)

Fixed a bug with LXC migration (thanks to MarcMocker) cvk98#11

0.6.2 (22.08.22)

Added range generation for VM exclusion (thanks to Azerothian) cvk98#9

0.6.1 (22.06.22)

Added the "resume" operation 10 seconds after VM migration. Since sometimes the following situation occurs:

0.6.0 (23.05.22)

Added a mechanism for checking the launch of the load balancer on the HA cluster master node (thanks to Cylindric) cvk98#3

0.5.2 (20.05.22)

Minor improvements suggested by Cylindric regarding cluster health check

0.5.1 (18.05.22)

If the cluster has been balanced for the last 10 attempts, the "operational_deviation" parameter is divided by 2, 4, or 8, chosen with some probability.

0.5.0 (04.05.22)

Added email notification about critical events

0.4.2 (29.04.22)

Removed bestconfig due to encoding issues
Added a check when opening the config

0.4.0 (28.04.22)

All settings are placed in the configuration file (config.yaml)

0.3.0 (22.04.2022)

Added logging based on the loguru library (don't forget pip3 install loguru). Now logs can be viewed in the console or /var/log/syslog
sys.exit() modes have been changed for the script to work correctly in daemon mode

0.2.0 (20.04.2022)

All comments and messages are translated into English
UTF-8 encoding throughout the document

Running the script is tested on:

PyCharm 2021+, Python 3.10+, Win10
Proxmox LXC Ubuntu 20.04 (1 core, 256 MB, 5GB HDD), Python 3.8+ ~~(0.4.0)~~

If you run into any exceptions, please report them at https://github.com/cvk98/Proxmox-load-balancer/issues. I'll try to help you.
