Metric reporters need checks before they start
dan-mm opened this issue · 2 comments
- Please check the current metric providers and how they already implement these startup checks.
- Please create a concept / plan for which metric reporters still need checks, which checks you plan to integrate, and how.
- I want documentation for each provider, on the provider's own page, covering what NEEDS to be installed to make it work.
Example: not like this: https://docs.green-coding.berlin/docs/installation/installation-linux/#xgboost
but here: https://docs.green-coding.berlin/docs/measuring/metric-providers/psu-energy-xgboost-system/
I went through all the metric providers and how they worked. For each one I've written down my proposed check, as well as any questions and/or notes:
Metric Providers
cpu/energy/RAPL/MSR/component/provider.py:
Proposed check:
- check that /dev/cpu/0/msr exists and can be read
- CPU 0 is chosen because it should always be present
Notes:
- check for a compatible CPU?
- source.c reads the CPU info from /proc/cpuinfo and has logic to compare it
- do we want to pull that logic out?
- docs
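A minimal sketch of the proposed MSR check, assuming a hypothetical helper `check_msr_readable()` that returns a bool (the real provider may prefer to raise an exception instead). It only opens the device file and does not reproduce the CPU compatibility logic from source.c.

```python
# Sketch of the proposed MSR check; the function name is hypothetical.
def check_msr_readable(path: str = "/dev/cpu/0/msr") -> bool:
    """Return True if the MSR device for CPU 0 exists and this process can open it."""
    try:
        # Opening is enough to prove existence and read permission;
        # real MSR reads need a pread() at a register offset, which we skip here.
        with open(path, "rb"):
            return True
    except OSError:
        # FileNotFoundError (msr module not loaded) and PermissionError both land here
        return False
```

Opening usually requires root (or CAP_SYS_RAWIO), so a False here can mean either "module not loaded" or "missing permissions"; the error message should probably mention both.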
cpu/frequency/sysfs/core/provider.py:
Proposed check:
- Already Done
Notes:
- system check already done (tries to open and read /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)
- docs
cpu/time/cgroup/container/provider.py:
Proposed check:
- check whether these paths exist and can be opened (%d is the user ID, %s is the container_id string):
- (rootless) /sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cpu.stat
- (rootfull) /sys/fs/cgroup/system.slice/docker-%s.scope/cpu.stat
- otherwise, see the notes re: alternate cgroup checks
Questions:
- can we use container_id for checks? If not, see notes below
Notes:
- the user ID can be obtained in Python with os.getuid() (double-check this)
- we pass container_id in, but I think only after check_system. If we can get container_id:
  - check whether these paths exist and can be opened (%d is the user ID, %s is the container_id string):
    - (rootless) /sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cpu.stat
    - (rootfull) /sys/fs/cgroup/system.slice/docker-%s.scope/cpu.stat
- if we cannot get access to container_id during check_system...
  - we could check that /sys/fs/cgroup/user.slice/XXX (cpu.stat, memory.current, etc.) exists?
- docs
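If container_id turns out to be available in check_system, the path logic could look roughly like this. The helper names are hypothetical, and the `name` parameter lets the same helper cover memory.current (memory/total) as well as cpu.stat.

```python
import os
from typing import List, Optional

# Path templates taken from the provider notes above; {name} is the cgroup file to probe.
ROOTLESS_TEMPLATE = (
    "/sys/fs/cgroup/user.slice/user-{uid}.slice/user@{uid}.service/"
    "user.slice/docker-{cid}.scope/{name}"
)
ROOTFULL_TEMPLATE = "/sys/fs/cgroup/system.slice/docker-{cid}.scope/{name}"


def candidate_cgroup_paths(container_id: str, uid: int, name: str = "cpu.stat") -> List[str]:
    """Build the rootless and rootfull paths for one container's cgroup file."""
    return [
        ROOTLESS_TEMPLATE.format(uid=uid, cid=container_id, name=name),
        ROOTFULL_TEMPLATE.format(cid=container_id, name=name),
    ]


def find_readable_cgroup_file(container_id: str, uid: Optional[int] = None,
                              name: str = "cpu.stat") -> Optional[str]:
    """Return the first candidate path the current user can read, or None."""
    if uid is None:
        uid = os.getuid()  # real user ID of the current process (Unix only)
    for path in candidate_cgroup_paths(container_id, uid, name):
        if os.access(path, os.R_OK):
            return path
    return None
```

Trying rootless first and falling back to rootfull mirrors the order in the notes; a None result would then become the "container cgroup not found" error.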
cpu/time/cgroup/system/provider.py:
Proposed check:
- find and open /sys/fs/cgroup/cpu.stat successfully
Notes:
cpu/time/procfs/system/provider.py:
Proposed check:
- None
Questions: is one needed here? See notes below.
Notes:
- find and open /proc/stat; this file is all the provider looks at
- isn't this file on every Linux system? Does this "check" even make sense?
- is there a better one / does this provider really need one?
- docs
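Even though /proc/stat should exist on every Linux system, a cheap check would still catch setups where procfs is missing or not mounted. A sketch with a hypothetical `check_proc_stat()`: it opens the file and verifies that the first line is the aggregate "cpu" line the provider relies on.

```python
def check_proc_stat(path: str = "/proc/stat") -> bool:
    """Return True if /proc/stat can be read and starts with the aggregate cpu line."""
    try:
        with open(path, "r") as stat_file:
            first_line = stat_file.readline()
    except OSError:
        # procfs not mounted, or the file is not readable
        return False
    # The first line of /proc/stat is "cpu" followed by the summed jiffy counters
    return first_line.startswith("cpu ")
```

The same helper would serve cpu/utilization/procfs/system, which reads the same file.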
cpu/utilization/cgroup/container/provider.py:
Proposed check:
- open/read the following:
- "/sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cpu.stat"
- "/sys/fs/cgroup/system.slice/docker-%s.scope/cpu.stat"
Questions:
- same cgroup questions as before
- same question re: /proc/stat as before
Notes:
- docs
cpu/utilization/procfs/system/provider.py:
Proposed check:
- None?
Questions: See notes
Notes:
- same questions as from cpu/time/procfs/system/provider.py
- this provider basically just looks in /proc/stat
- does checking /proc/stat make sense or is there a better check to be done here?
- docs
lm_sensors/abstract_providers:
Proposed check:
- check the "sensors" output and compare it with the values in the YML
Questions:
- check_systems should be in abstract_provider, correct?
- unsure if my proposed check is the best idea; any others?
Notes:
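If check_system sits in the abstract lm_sensors provider, every concrete provider inherits it. A rough sketch of the proposed "compare sensors output with the YML" idea; the class name, the chips list, and the exact-match comparison are assumptions (the real YML may use different keys or wildcard chip names).

```python
import shutil
import subprocess
from typing import Iterable, List, Set


def parse_sensor_chips(output: str) -> Set[str]:
    """Collect chip names from default `sensors` output.

    Assumes each chip block starts with the chip name on its own line,
    immediately followed by an 'Adapter:' line.
    """
    lines = output.splitlines()
    return {
        current.strip()
        for current, following in zip(lines, lines[1:])
        if following.startswith("Adapter:") and current.strip()
    }


def missing_chips(configured: Iterable[str], output: str) -> List[str]:
    """Return configured chips (e.g. from the provider's YML) absent from the output."""
    available = parse_sensor_chips(output)
    return [chip for chip in configured if chip not in available]


class LmSensorsCheckSketch:
    """Hypothetical stand-in for the shared abstract provider's check_system."""

    def __init__(self, chips: List[str]):
        self.chips = chips  # assumed to be read from the YML config

    def check_system(self) -> None:
        if shutil.which("sensors") is None:
            raise RuntimeError("'sensors' not found; is lm-sensors installed?")
        result = subprocess.run(["sensors"], capture_output=True, text=True, check=False)
        missing = missing_chips(self.chips, result.stdout)
        if missing:
            raise RuntimeError(f"chips from the config not found in 'sensors' output: {missing}")
```

Keeping the parsing in plain functions makes it testable with captured output, without lm-sensors installed.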
memory/energy/RAPL/MSR/component/provider.py:
Proposed check:
- if we have the container ID / user ID, check whether these paths can be opened/read:
- rootless: "/sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cpu.stat" (user ID, user ID, container ID)
- rootfull: "/sys/fs/cgroup/system.slice/docker-%s.scope/cpu.stat" (container ID)
Questions:
- cgroups again
Notes:
memory/total/cgroup/container/provider.py:
Proposed check:
- try to read/open files:
- "/sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/memory.current"
- "/sys/fs/cgroup/system.slice/docker-%s.scope/memory.current"
Questions:
- cgroups again
Notes:
network/connections/proxy/container/provider.py:
Proposed check:
- Already has a system check
- (checks that Tinyproxy is version 1.11 or greater)
Questions:
- is this enough?
Notes:
- docs
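For reference, a version check like the existing Tinyproxy one boils down to parsing the version string and comparing tuples. This sketch assumes the version output looks like "tinyproxy 1.11.1"; the helper names are hypothetical and this is not the provider's actual code.

```python
import re
from typing import Optional, Tuple


def parse_tinyproxy_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from version output; a missing patch becomes 0."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def tinyproxy_version_ok(output: str, minimum: Tuple[int, int] = (1, 11)) -> bool:
    """True if the parsed version is at least `minimum` (compared as tuples)."""
    version = parse_tinyproxy_version(output)
    return version is not None and version[:2] >= minimum
```

Tuple comparison avoids the classic string-compare bug where "1.9" would sort after "1.11".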
network/io/cgroup/container/provider.py:
Proposed check:
- open/read the following:
- "/sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cgroup.procs"
- "/sys/fs/cgroup/system.slice/docker-%s.scope/cgroup.procs"
Questions:
- cgroups
- also looks in /proc/%u/ns/net (%u is the container PID)
- but this is a general Linux thing again, so I think it makes sense to skip this one?
Notes:
network/io/docker/stats/container/provider.py:
Proposed check:
- simple check if "docker stats" returns values?
- this metric provider doesn't seem to be well supported?
Questions:
- no docs
- is that because this metric provider is for testing purposes only?
Notes:
- no docs at all?
- "Start profiling Overloaded for docker_stats. This provider is only for testing!
Never use in production!"
powermetrics/provider.py:
Proposed check:
- has system check already (pgrep -qx powermetrics, pass if success)
Questions:
- anything more need done?
- besides adding a section to the docs, I mean
Notes:
- no docs under metric providers, just a note in the macOS installation docs
psu/energy/ac/gude/machine/provider.py:
Proposed check:
- try to load "http://192.168.178.32/status.json" and see if anything is there?
Questions:
- docs?
Notes:
- needs a check
- "This script expects the GUDE Powermeter to be fixed on the IP 192.168.178.32"
- docs missing entirely?
psu/energy/ac/ipmi/machine/provider.py:
Proposed check:
- try to run "sudo /usr/sbin/ipmi-dcmi" command?
- "sudo /usr/sbin/ipmi-dcmi --get-system-power-statistics" ?
- "sudo /usr/sbin/ipmi-dcmi --help" ?
- is there a better command?
Questions:
- I don't have IPMI on my system, so I'm not 100% sure what to check. Suggestions?
- is running sudo in systems_check OK?
Notes:
- check based on "sudo /usr/sbin/ipmi-dcmi --get-system-power-statistics" command
- docs
psu/energy/ac/mcp/provider.py:
Proposed check:
- try to open "/dev/ttyACM0"?
Questions:
- no README info; add it?
Notes:
- docs: https://docs.green-coding.berlin/docs/measuring/metric-providers/psu-energy-ac-mcp-machine/
psu/energy/ac/powerspy2/machine/provider.py:
Proposed check:
- check if the device is readable: "/dev/rfcomm0"
Notes:
psu/energy/ac/sdia/machine/provider.py:
Proposed check:
- None
OR
- the same checks as in read_metrics (see notes below)
Notes:
- read_metrics makes a check at the beginning for cpu_util data
- read_metrics also checks that the config files are set up with TDP & cpu_chips
- I think that's all that's needed; move (duplicate?) these two checks to check_system?
- docs: https://docs.green-coding.berlin/docs/measuring/metric-providers/psu-energy-sdia-system/
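Moving (or duplicating) the two read_metrics checks into check_system could look like this. The config keys, the `active_providers` list and the substring match on provider names are all assumptions, not the provider's current code; the same pattern would apply to the xgboost provider's cpu_util check.

```python
from typing import Any, Dict, List

# Hypothetical key names, taken from the "TDP & cpu_chips" note above.
REQUIRED_KEYS = ("TDP", "cpu_chips")


def sdia_config_problems(provider_config: Dict[str, Any], active_providers: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means check_system would pass."""
    problems = []
    for key in REQUIRED_KEYS:
        if provider_config.get(key) in (None, ""):
            problems.append(f"'{key}' is not set in the provider config")
    # There is no cpu_util data before the run starts, so approximate the
    # read_metrics check by requiring some CPU utilization provider to be active.
    if not any("utilization" in name.lower() for name in active_providers):
        problems.append("no CPU utilization provider is active, but SDIA needs cpu_util data")
    return problems
```

Collecting all problems before raising lets the user fix the config in one pass instead of one error at a time.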
psu/energy/ac/xgboost/machine/provider.py:
Proposed check:
- None
OR
- the cpu_util check from read_metrics (see below)
Notes:
- read_metrics has a cpu_util check
- that should be all that's needed. Do we want to move this to check_systems as well?
- docs
info from phone call:
cgroups:
- we won't have the containers up (at check time)
- check "base"
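Since the containers won't be up at check time, one reading of "check base" is to verify that the parent cgroup directories exist before any container ID is known. A sketch under that assumption (helper names hypothetical), with the base paths derived from the rootless/rootfull templates listed earlier.

```python
import os
from typing import Dict, List, Optional


def cgroup_base_paths(uid: int) -> Dict[str, str]:
    """Container-independent cgroup parent directories for rootless and rootfull Docker."""
    return {
        "rootless": f"/sys/fs/cgroup/user.slice/user-{uid}.slice/user@{uid}.service/user.slice",
        "rootfull": "/sys/fs/cgroup/system.slice",
    }


def available_cgroup_bases(uid: Optional[int] = None) -> List[str]:
    """Return the modes whose base directory exists on this machine."""
    if uid is None:
        uid = os.getuid()
    return [mode for mode, path in cgroup_base_paths(uid).items() if os.path.isdir(path)]
```

An empty result would point to a missing or differently laid out cgroup hierarchy, which is exactly what the check should surface before the run starts.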
procs
- worth checking (e.g. it might not be loaded on Linux for Windows or similar)
models
- move checks from read_metrics into system_check
ipmi
- one check to see if it's installed (run a basic ipmi command)
- check the output; if it is null/empty, throw a warning (install ipmi to get a sample of the null output locally)
- throw the warning via a new class RuntimeConfigurationWarning (based on the exceptions in the base class); capture the error and log it in runner.py
- the warning should come last, of course
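A sketch of the IPMI plan from the call. RuntimeConfigurationWarning is defined here as a plain Exception subclass because the base-class exceptions aren't shown; `check_ipmi_output`, `check_ipmi_system` and `run_checks` are hypothetical names, and `run_checks` only stands in for the capture-and-log step in runner.py.

```python
import logging
import subprocess
from typing import Callable, List


class RuntimeConfigurationWarning(Exception):
    """Hypothetical: a non-fatal configuration issue detected during check_system."""


IPMI_COMMAND = ["sudo", "/usr/sbin/ipmi-dcmi", "--get-system-power-statistics"]


def check_ipmi_output(returncode: int, stdout: str) -> None:
    # Hard failure first: the command is missing or broken.
    if returncode != 0:
        raise RuntimeError("ipmi-dcmi failed; is IPMI installed and configured?")
    # The warning comes last: the tool runs but reports nothing.
    if not stdout.strip():
        raise RuntimeConfigurationWarning("ipmi-dcmi returned no output")


def check_ipmi_system() -> None:
    try:
        result = subprocess.run(IPMI_COMMAND, capture_output=True, text=True, check=False)
    except FileNotFoundError as err:  # sudo itself is not installed
        raise RuntimeError("could not run ipmi-dcmi via sudo") from err
    check_ipmi_output(result.returncode, result.stdout)


def run_checks(checks: List[Callable[[], None]]) -> List[str]:
    """Log warnings and keep going; any other exception propagates and aborts."""
    seen = []
    for check in checks:
        try:
            check()
        except RuntimeConfigurationWarning as warning:
            logging.warning("Runtime configuration warning: %s", warning)
            seen.append(str(warning))
    return seen
```

Splitting the output evaluation from the subprocess call means the null-output case can be tested with a captured sample instead of real IPMI hardware.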
Unsupported / do not write checks for:
- network/io/docker/stats/container/provider.py
- psu/energy/ac/gude/machine/provider.py
- but write documentation for this one at least
psu checks (general)
- check if the /dev/ file is present; do not try to open it
- for empty READMEs, link to the appropriate documentation
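Per the general PSU rule, the check only looks for the device node and never opens it. A sketch assuming a hypothetical provider-to-device mapping built from the paths noted above (/dev/ttyACM0 for MCP, /dev/rfcomm0 for PowerSpy2).

```python
import os
from typing import Dict, List

# Hypothetical mapping; device paths come from the provider notes above.
PSU_DEVICES: Dict[str, str] = {
    "psu/energy/ac/mcp": "/dev/ttyACM0",
    "psu/energy/ac/powerspy2/machine": "/dev/rfcomm0",
}


def missing_psu_devices(devices: Dict[str, str]) -> List[str]:
    """Return one message per provider whose device node is absent.

    Only presence is checked: os.path.exists stats the path and never opens the device.
    """
    return [
        f"{provider}: device {path} not found"
        for provider, path in devices.items()
        if not os.path.exists(path)
    ]
```

Not opening the device avoids side effects such as resetting a serial connection or grabbing an exclusive Bluetooth link before the provider itself starts.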
draft documentation for psu/energy/ac/gude/machine/provider.py:
- check for the specific model
- Blauer Engel documentation reference
- not officially maintained anymore; kept for backwards compatibility
- please find the technical specifications on the manufacturer's website