SUSE/phoebe

collect_stats tries to access sysfs

dcermak opened this issue · 5 comments

The collect_stats.py script tries to query the current CPU frequency and governor from sysfs:

with open(SYSFS_CPU_PATH + 'cpu0/cpufreq/scaling_governor') as f:

Unfortunately, this fails in GitHub Actions with:

Traceback (most recent call last):
  File "/__w/phoebe/phoebe/scripts/collect_stats.py", line 306, in <module>
    main(sys.argv[1], settings, count)
  File "/__w/phoebe/phoebe/scripts/collect_stats.py", line 271, in main
    collect_stats(
  File "/__w/phoebe/phoebe/scripts/collect_stats.py", line 187, in collect_stats
    with open(SYSFS_CPU_PATH + 'cpu0/cpufreq/scaling_governor') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'

I suspect that this is caused by the GitHub Actions runner not allowing the CI action to query this information, to prevent it from modifying the CPU behavior.

@shunghsiyu I was told you introduced this, can we remove it in the meantime?

Yes, but I suspect removing just scaling_governor may not be enough.

I think the main problem is that a different kernel seems to expose different sysfs files and directories inside the container. (Correct me if I'm wrong here.)

Previously I used a workaround where I maintained a set of sysfs entries, SYSCTL_NOT_IN_CONTAINER, that I knew were not present in our previous CI runner's environment (on GitLab). Entries were added to SYSCTL_NOT_IN_CONTAINER (painstakingly) through trial and error, until the script finally ran.

That wasn't a great workaround anyway.
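
For readers unfamiliar with it, the skip-list workaround described above could look roughly like the sketch below. Only the name SYSCTL_NOT_IN_CONTAINER comes from the script; the entry in the set and the helper `entries_to_read` are hypothetical and chosen purely for illustration.

```python
# Sketch of a hand-maintained skip list. The single entry below is a
# placeholder taken from the traceback, not the script's real list.
SYSCTL_NOT_IN_CONTAINER = {
    'cpu0/cpufreq/scaling_governor',
}


def entries_to_read(all_entries, skip=SYSCTL_NOT_IN_CONTAINER):
    """Return the entries that are not on the skip list, preserving order.

    Hypothetical helper: every entry missing in a new environment has to
    be added to the set by hand, which is what made this approach painful.
    """
    return [entry for entry in all_entries if entry not in skip]
```

The drawback is visible in the shape of the code: the set encodes one specific environment, so moving to a different runner means discovering the missing entries all over again.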

I think a better way forward is perhaps to detect that we're inside a container and, if so, be more relaxed about missing sysfs entries, using a value of 0 instead (or some other value, TBD). @mvarlese what do you think?
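
A minimal sketch of that idea, not the script's actual code: the helper names `in_container` and `read_sysfs`, the `relaxed` parameter, and the default of '0' are all assumptions. The container check is only a heuristic that looks for the marker files Docker (`/.dockerenv`) and Podman (`/run/.containerenv`) create; other runtimes would need a different test.

```python
import os

# Same base path as in the traceback above.
SYSFS_CPU_PATH = '/sys/devices/system/cpu/'


def in_container():
    """Heuristic check: Docker and Podman each leave a marker file."""
    return os.path.exists('/.dockerenv') or os.path.exists('/run/.containerenv')


def read_sysfs(path, default='0', relaxed=None):
    """Read one sysfs entry, tolerating a missing file when relaxed.

    relaxed=None means "decide by the container heuristic". Outside a
    container a missing entry is still raised as an error.
    """
    if relaxed is None:
        relaxed = in_container()
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        if relaxed:
            return default  # placeholder value, as suggested above (TBD)
        raise


if __name__ == '__main__':
    # Always relaxed here so the demo runs on any machine.
    path = SYSFS_CPU_PATH + 'cpu0/cpufreq/scaling_governor'
    print(read_sysfs(path, relaxed=True))
```

Catching only FileNotFoundError keeps other failures, such as permission errors, visible instead of silently replacing them with the default.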

Frankly speaking, I don't see Phoebe being deployed into a container, so I am not sure that running the .py script within a container and considering its results (whether pass or fail) pays off.

Okay, then let's remove it in the meantime; I'll open a PR.

Can't this issue be closed now?

I think so, the script now runs on the CI.