Maintains REST API endpoints for various monitoring tasks on the parallel computer
sudo apt install python3-pip
sudo apt install libffi-dev
git clone https://github.com/Ormly/ParallelNano_Lisa_Lighthouse.git
cd ParallelNano_Lisa_Lighthouse
python3 setup.py install --user
# start wsgi server with 2 workers as daemon
$ gunicorn -w 2 wsgi:app --daemon
To kill daemon(s):
$ ps -ef | grep gunicorn
mario 34034 1254 0 17:39 ? 00:00:00 python gunicorn -w 2 wsgi:app --daemon
mario 34036 34034 73 17:39 ? 00:00:00 python gunicorn -w 2 wsgi:app --daemon
mario 34039 34034 69 17:39 ? 00:00:00 python gunicorn -w 2 wsgi:app --daemon
$ kill 34034 34036 34039
Agent is configured using the config.json
file residing in the same library.
{
"ipc_rest_adapters":
[
{
"adapter_name": "computer_nodes",
"ipc_queue": "/compute_node_beacon",
"rest_route": "/compute_node_beacon",
"group_by_attrib": "ip_address"
}
]
}
ipc_rest_adapters
- a list of adapters, matching ipc_queue to a REST endpointadapter_name
- name describing adapter (only used for logging)ipc_queue
- id of the POSIX queue to get messages fromrest_route
- name of REST endpointgroup_by_attrib
- Optional, messages may be grouped according to this attribute inside the incoming message
Daemon should be restarted to apply changes to config file
Lighthouse can be extended to support additional monitoring sources by following the following workflow
- Implement a program that places monitoring messages onto an IPC queue similarly to the Beacon Server
- Add an adapter to the Lighthouse config file, with the appropriate ipc queue name, and the desired REST endpoint URL.
- Restart Lighthouse
By adding a new REST Action, Lighthouse can map a REST endpoint to a python script, sending over any arguments passed to the API.
The basic structure of such Python script is as follows
import sys
def power_off_node(number):
"""
The actual functionality of the script
"""
actually_power_off_node(number)
def main(node_number) -> dict:
"""
Entry point of the script (from external location)
"""
power_off_node(node_number)
response = {
"action": "reset",
"target": node_number
}
if successful:
response["result"] = "success"
else:
response["result"] = "failed"
return response
if __name__ == "__main__":
"""
When running the script manually (takes arguments from stdin)
"""
main(sys.argv[1])
Get nodes information
URL: /compute_node_beacon
Response
{
"127.0.1.1":{"cpu":"x86_64","cpu_usage":3.7,"hostname":"node01","ip_address":"127.0.1.1","mem_usage":8.5540755014172,"platform":"Linux-5.4.0-48-generic-x86_64-with-glibc2.29","system":"Linux"},
"127.0.1.2":{"cpu":"x86_64","cpu_usage":3.7,"hostname":"node02","ip_address":"127.0.1.2","mem_usage":8.5540755014172,"platform":"Linux-5.4.0-48-generic-x86_64-with-glibc2.29","system":"Linux"},
"127.0.1.3":{"cpu":"x86_64","cpu_usage":3.7,"hostname":"node03","ip_address":"127.0.1.3","mem_usage":8.5540755014172,"platform":"Linux-5.4.0-48-generic-x86_64-with-glibc2.29","system":"Linux"}
}
Get temperature and humidity
URL: /temp_humidity
Response
{"temperature": 36.6, "humidity": 80.33}