green-coding-solutions/green-metrics-tool

Metric reporters need checks before they start

dan-mm opened this issue · 2 comments

dan-mm commented
  1. Please check the current metric providers and how they already implement these starting checks

Example: https://github.com/green-coding-berlin/green-metrics-tool/blob/main/metric_providers/cpu/frequency/sysfs/core/provider.py

  2. Please create a concept / plan covering which metric reporters still need checks, which checks you plan to integrate, and how.

  3. I want the documentation for each provider to state what NEEDS to be installed to make it work, on the provider's page itself.

Example: Not like this: https://docs.green-coding.berlin/docs/installation/installation-linux/#xgboost
It should be here instead: https://docs.green-coding.berlin/docs/measuring/metric-providers/psu-energy-xgboost-system/

dan-mm commented

I went through all the metric providers and how they worked. For each one I've written down my proposed check, as well as any questions and/or notes:

Metric Providers

cpu/energy/RAPL/MSR/component/provider.py:

Proposed check:

  • check that /dev/cpu/0/msr exists and can be read
    • 0 is chosen as it should always be there

Notes:

  • check for a compatible CPU?
    • source.c reads CPU info from /proc/cpuinfo and has logic to compare it.
    • do we want to pull that logic out?
  • docs
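
A minimal sketch of the proposed MSR check, assuming that successfully opening the device node counts as "can be read". The function name `msr_device_readable` is hypothetical and not part of the existing provider code:

```python
import os


def msr_device_readable(path="/dev/cpu/0/msr"):
    """Return True if the MSR device node exists and can be opened for reading."""
    if not os.path.exists(path):
        return False
    try:
        # Opening the node surfaces permission problems without reading any register.
        with open(path, "rb"):
            return True
    except OSError:
        return False
```

This does not cover the CPU compatibility question from the notes; it only answers whether the node is there and accessible to the current user.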

cpu/frequency/sysfs/core/provider.py:

Proposed check:

  • Already Done

Notes:

  • system check done (tries to open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq and read it)
  • docs

cpu/time/cgroup/container/provider.py:

Proposed check:

  • check whether these paths exist / can be opened (%d is userid, %s is container_id string):
    • (rootless) /sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cpu.stat
    • (rootfull) /sys/fs/cgroup/system.slice/docker-%s.scope/cpu.stat
  • otherwise, see the notes below re: alternate cgroup checks

Questions:

  • can we use container_id for checks? if not, see notes below

Notes:

  • userid can be obtained in Python with os.getuid() (double check this)
  • container_id is passed in, but I think only after check_system runs. If we can get container_id:
    • check whether these paths exist / can be opened (%d is userid, %s is container_id string):
      • (rootless) /sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cpu.stat
      • (rootfull) /sys/fs/cgroup/system.slice/docker-%s.scope/cpu.stat
    • if we cannot get access to container_id during check_system...
      • we can check that /sys/fs/cgroup/user.slice/XXX (cpu.stat, memory.current, etc.) exists?
  • docs
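
To make the two path templates concrete, here is a rough sketch of how the candidate paths could be built once a container ID is available. The helper names and the `uid` override parameter are assumptions for illustration; only the path layouts come from the notes above:

```python
import os

ROOTLESS = (
    "/sys/fs/cgroup/user.slice/user-{uid}.slice/user@{uid}.service"
    "/user.slice/docker-{cid}.scope/{name}"
)
ROOTFULL = "/sys/fs/cgroup/system.slice/docker-{cid}.scope/{name}"


def container_cgroup_candidates(container_id, name="cpu.stat", uid=None):
    """Build the rootless and rootfull candidate paths for one cgroup file."""
    uid = os.getuid() if uid is None else uid
    return [
        ROOTLESS.format(uid=uid, cid=container_id, name=name),
        ROOTFULL.format(cid=container_id, name=name),
    ]


def first_readable(paths):
    """Return the first path the current user may read, or None."""
    for path in paths:
        if os.access(path, os.R_OK):
            return path
    return None
```

The same helper would serve the other cgroup-based providers by passing `memory.current` or `cgroup.procs` as `name`.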

cpu/time/cgroup/system/provider.py:

Proposed check:

  • find and open /sys/fs/cgroup/cpu.stat successfully

Notes:

cpu/time/procfs/system/provider.py:

Proposed check:

  • None

Questions: is one needed here? see notes below

Notes:

  • find and open /proc/stat - this file is all the provider looks at
    • isn't this on every linux system? does this "check" even make sense?
    • is there a better one / does this provider really need one?
  • docs
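
If a check is wanted here after all, it could be as small as the sketch below. It assumes that a usable /proc/stat starts with the aggregate `cpu` line followed by numeric counters; both function names are hypothetical:

```python
def looks_like_proc_stat(text):
    """Return True if the text starts with the aggregate 'cpu' line of /proc/stat."""
    first_line = text.splitlines()[0] if text else ""
    fields = first_line.split()
    # The aggregate line is 'cpu' followed by numeric counters.
    return len(fields) > 1 and fields[0] == "cpu" and all(f.isdigit() for f in fields[1:])


def procfs_stat_available(path="/proc/stat"):
    """Return True if the file can be opened and its first line looks right."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return looks_like_proc_stat(handle.readline())
    except OSError:
        return False
```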

cpu/utilization/cgroup/container/provider.py:

Proposed check:

  • open/read the following:
    • "/sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cpu.stat"
    • "/sys/fs/cgroup/system.slice/docker-%s.scope/cpu.stat"

Questions:

  • same cgroup questions as before
  • same question re: /proc/stat as before

Notes:

  • docs

cpu/utilization/procfs/system/provider.py:

Proposed check:

  • None?

Questions: See notes

Notes:

  • same questions as from cpu/time/procfs/system/provider.py
    • this provider basically just looks in /proc/stat
    • does checking /proc/stat make sense or is there a better check to be done here?
  • docs

lm_sensors/abstract_providers:

Proposed check:

  • Check sensors output, compare with values in YML

Questions:

  • check_systems should be in abstract_provider, correct?
  • unsure if my proposed check is best idea - any others?

Notes:
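
One way the proposed comparison could look, as a sketch only: it assumes the YML values are plain names (chips or features) that must appear somewhere in the text printed by the `sensors` command. The function names and the simple substring match are assumptions, not the existing provider logic:

```python
import subprocess


def missing_from_sensors(sensors_output, expected_names):
    """Return the configured names that do not appear in the sensors output."""
    return [name for name in expected_names if name not in sensors_output]


def check_lm_sensors(expected_names):
    # Runs the lm-sensors CLI; raises if the binary is missing or exits non-zero.
    result = subprocess.run(["sensors"], capture_output=True, text=True, check=True)
    missing = missing_from_sensors(result.stdout, expected_names)
    if missing:
        raise RuntimeError(f"Not found in sensors output: {', '.join(missing)}")
```

Keeping the comparison separate from the subprocess call makes it testable on machines without lm-sensors installed.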

memory/energy/RAPL/MSR/component/provider.py:

Proposed check:

  • if we have containerID/userID, then check whether these paths can be opened/read:
    • rootless: "/sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cpu.stat" (userid/userid/containerid)
    • rootfull: "/sys/fs/cgroup/system.slice/docker-%s.scope/cpu.stat" (containerID)

Questions:

  • cgroups again

Notes:

memory/total/cgroup/container/provider.py:

Proposed check:

  • try to read/open files:
    • "/sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/memory.current"
    • "/sys/fs/cgroup/system.slice/docker-%s.scope/memory.current"

Questions:

  • cgroups again

Notes:

network/connections/proxy/container/provider.py:

Proposed check:

  • Already has system check
    • (checks that Tinyproxy is version 1.11 or greater)

Questions:

  • is this enough?

Notes:

  • docs
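
For reference, a version comparison of this kind can be sketched as follows. This is not the existing check; it assumes the version text contains a dotted version number such as "1.11.1", and the function name is hypothetical:

```python
import re


def tinyproxy_version_ok(version_output, minimum=(1, 11)):
    """Return True if the reported major.minor version is at least `minimum`."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        return False
    # Compare as integer tuples so that 1.8 sorts below 1.11.
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

Comparing integer tuples rather than strings matters here, because a string comparison would rank "1.8" above "1.11".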

network/io/cgroup/container/provider.py:

Proposed check:

  • "/sys/fs/cgroup/user.slice/user-%d.slice/user@%d.service/user.slice/docker-%s.scope/cgroup.procs"
  • "/sys/fs/cgroup/system.slice/docker-%s.scope/cgroup.procs"

Questions:

  • cgroups
  • also looks in /proc/%u/ns/net
    • %u is container PID
    • but this is a general linux thing, again. I think it makes sense to skip this one?

Notes:

network/io/docker/stats/container/provider.py:

Proposed check:

  • simple check if "docker stats" returns values?
    • this metric provider doesn't seem super supported?

Questions:

  • no docs
    • this metric provider is for testing purposes only, is that why?

Notes:

  • no docs at all?
    • "Start profiling Overloaded for docker_stats. This provider is only for testing!
      Never use in production!"

powermetrics/provider.py:

Proposed check:

  • has system check already (pgrep -qx powermetrics, pass if success)

Questions:

  • does anything more need to be done?
    • besides adding a section to the docs, I mean

Notes:

  • no docs under metric providers, just a note under the macOS installation docs

psu/energy/ac/gude/machine/provider.py:

Proposed check:

Questions:

  • docs?

Notes:

  • needs check
  • docs missing entirely?

psu/energy/ac/ipmi/machine/provider.py:

Proposed check:

  • try to run "sudo /usr/sbin/ipmi-dcmi" command?
    • "sudo /usr/sbin/ipmi-dcmi --get-system-power-statistics" ?
    • "sudo /usr/sbin/ipmi-dcmi --help" ?
    • is there a better command?

Questions:

  • I don't have ipmi on my system, not 100% sure what to check. suggestions?
  • running sudo in systems_check ?

Notes:

  • check based on "sudo /usr/sbin/ipmi-dcmi --get-system-power-statistics" command
  • docs

psu/energy/ac/mcp/provider.py:

Proposed check:

  • try to open "/dev/ttyACM0" ?

Questions:

  • no readme info - add?

Notes:

psu/energy/ac/powerspy2/machine/provider.py:

Proposed check:

  • check if dev is readable: "/dev/rfcomm0"

Notes:

psu/energy/ac/sdia/machine/provider.py:

Proposed check:

  • None;
    OR
  • same checks from read_metrics (see notes below)

Notes:

psu/energy/ac/xgboost/machine/provider.py:

Proposed check:

  • None;
    OR
  • cpu_util check from read_metrics (see below)

Notes:

  • read_metrics has cpu_util check
    • that should be all that's needed. do we want to move this to a check_systems as well?
  • docs

dan-mm commented

info from phone call:

cgroups:

  • containers won't be up when the check runs
  • so check the "base" paths instead

procfs

  • worth checking (e.g. it might not be loaded on Linux for Windows or something)

models

  • move checks from read_metrics into system_check

ipmi

  • one check to see if it's installed (run a basic ipmi command)
  • check the output; if null, throw a warning (install ipmi to get sample null output locally)
    • to throw the warning, introduce a new class RuntimeConfigurationWarning (based on the exceptions in the base class), then capture the error and log it in runner.py
    • the warning should come last, of course
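
The ipmi plan above could be sketched roughly like this. `RuntimeConfigurationWarning` is the class name from the call; `ProviderConfigurationError`, `evaluate_ipmi_result`, `check_ipmi` and `run_checks` are hypothetical stand-ins, and "null" is interpreted here as empty output, which is an assumption:

```python
import subprocess


class ProviderConfigurationError(Exception):
    """Hypothetical stand-in for the existing hard-failure exception."""


class RuntimeConfigurationWarning(Warning):
    """Non-fatal finding that the runner should log instead of aborting."""


def evaluate_ipmi_result(returncode, stdout):
    # Hard failure first: the command could not be run successfully.
    if returncode != 0:
        raise ProviderConfigurationError("ipmi-dcmi failed; is it installed?")
    # Warning last: the command ran but produced no output (assumed "null").
    if not stdout.strip():
        raise RuntimeConfigurationWarning("ipmi-dcmi returned no power statistics")


def check_ipmi():
    cmd = ["sudo", "/usr/sbin/ipmi-dcmi", "--get-system-power-statistics"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError as exc:
        raise ProviderConfigurationError("sudo not found") from exc
    evaluate_ipmi_result(result.returncode, result.stdout)


def run_checks(checks, log=print):
    """Runner-side sketch: log warnings, let hard errors propagate."""
    for check in checks:
        try:
            check()
        except RuntimeConfigurationWarning as warning:
            log(f"Warning: {warning}")
```

Deriving the warning from `Warning` keeps it catchable separately from hard errors, so the runner can continue after logging it.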

Unsupported / do not write checks for:

  • network/io/docker/stats/container/provider.py
  • psu/energy/ac/gude/machine/provider.py:
    • write documentation for this one at least

psu checks (general)

  • check if /dev/ file is present, do not try to open
  • for empty readmes, link to appropriate documentation
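
A presence-only check for the PSU device files might look like the sketch below. The helper names are hypothetical; the device is deliberately not opened, as agreed on the call:

```python
import os


def device_node_present(path):
    """Return True if the path exists; the device is deliberately not opened."""
    return os.path.exists(path)


def missing_devices(paths):
    """Return the subset of paths that are not present on this system."""
    return [path for path in paths if not device_node_present(path)]
```

For the providers listed earlier, the paths would be "/dev/ttyACM0" (mcp) and "/dev/rfcomm0" (powerspy2).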

draft documentation for psu/energy/ac/gude/machine/provider.py:
- check for specific model
- Blauer Engel (Blue Angel) documentation reference
- not officially maintained anymore, backwards compatibility
- please find technical specifications on the manufacturer's website