posix: aws_system_info_processor_count over-estimates the number of CPUs on k8s / in containers
grrtrr opened this issue · 3 comments
The POSIX implementation of aws_system_info_processor_count() returns the number of online processors via sysconf(_SC_NPROCESSORS_ONLN) wherever sysconf is available (as on many Linux systems).
Within a container, a k8s pod, or any other resource-constrained environment (cpusets, cgroups), this over-estimates the number of available CPUs.
For instance, on a k8s job with 4 requested CPUs on a 64-core node, the reported value is 64, not 4.
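A minimal Linux-only sketch of the gap, assuming glibc and `_GNU_SOURCE` for `sched_getaffinity`/`CPU_COUNT`. The helper names `online_cpus` and `affinity_cpus` are hypothetical. The affinity mask reflects cpuset restrictions, but not CFS quota or CPU shares, which is how k8s CPU limits and requests are usually enforced, so in a typical pod both numbers can still read 64.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* What sysconf(_SC_NPROCESSORS_ONLN) reports: processors online on the
 * host. It does not consult cgroups or the process affinity mask. */
static long online_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* CPUs this process may run on according to its affinity mask.
 * On Linux this honours cpusets, but not CFS quota or shares. */
static long affinity_cpus(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return online_cpus(); /* fall back if the mask cannot be read */
    }
    return CPU_COUNT(&set);
}
```

Calling both functions inside a pod pinned to a cpuset would show the affinity count smaller than the sysconf count.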
To avoid this over-estimation, additional queries are necessary, looking at:
- CPU shares,
- CPU period/quota (CFS values),
- CPU sets.
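A sketch of two of these queries, assuming the cgroup v2 unified hierarchy mounted at /sys/fs/cgroup, where the limits live in `cpu.max` (format `$QUOTA $PERIOD`, quota may be `max`) and `cpuset.cpus.effective` (a list such as `0-3,8`). The helper names are hypothetical. cgroup v1 instead exposes `cpu.cfs_quota_us` / `cpu.cfs_period_us` and `cpuset.cpus` under per-controller directories, and v2 replaces v1 `cpu.shares` with `cpu.weight` on a different scale; neither variant is handled here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the first line of a small file such as /sys/fs/cgroup/cpu.max.
 * Returns 0 on success, -1 if the file is missing or empty. */
static int read_first_line(const char *path, char *buf, size_t size) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char *ok = fgets(buf, (int)size, f);
    fclose(f);
    return ok != NULL ? 0 : -1;
}

/* Count CPUs in a cpuset list like "0-3,8,10-11". Returns -1 on error. */
static long count_cpuset_list(const char *list) {
    long count = 0;
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p || lo < 0) return -1;
        long hi = lo;
        p = end;
        if (*p == '-') { /* range "lo-hi" */
            hi = strtol(p + 1, &end, 10);
            if (end == p + 1 || hi < lo) return -1;
            p = end;
        }
        count += hi - lo + 1;
        if (*p == ',') p++;
    }
    return count;
}

/* Parse cpu.max and return the quota in CPUs rounded up,
 * 0 if unlimited ("max"), or -1 on parse error. */
static long cpus_from_cpu_max(const char *text) {
    char quota[32];
    long period;
    if (sscanf(text, "%31s %ld", quota, &period) != 2 || period <= 0) return -1;
    if (strcmp(quota, "max") == 0) return 0;
    char *end;
    long q = strtol(quota, &end, 10);
    if (end == quota || *end != '\0' || q <= 0) return -1;
    return (q + period - 1) / period; /* e.g. 150000/100000 -> 2 */
}
```

For example, `read_first_line("/sys/fs/cgroup/cpu.max", buf, sizeof buf)` followed by `cpus_from_cpu_max(buf)` would give the quota-derived count, with a missing file meaning "not in cgroup v2".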
There is a good write-up on how the JDK estimates this, which arrives at the following formula taking the above into account:
total_available_cpus = min(cpuset_cpus, cpu_shares/1024, physical_number_of_cpus)
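A direct transcription of that formula as a sketch. The function name and the convention that a non-positive input means "not constrained" are assumptions, as is rounding `cpu_shares / 1024` up so that any positive share value yields at least one CPU.

```c
/* total_available_cpus = min(cpuset_cpus, cpu_shares / 1024, physical_cpus)
 * Inputs <= 0 are treated as "no constraint" and skipped. */
static long total_available_cpus(long physical_cpus, long cpuset_cpus, long cpu_shares) {
    long result = physical_cpus > 0 ? physical_cpus : 1;
    if (cpuset_cpus > 0 && cpuset_cpus < result) {
        result = cpuset_cpus;
    }
    if (cpu_shares > 0) {
        long share_cpus = (cpu_shares + 1023) / 1024; /* round up */
        if (share_cpus < result) {
            result = share_cpus;
        }
    }
    return result;
}
```

The formula as quoted does not include the CFS quota from the list above; a caller could additionally take the minimum with a quota-derived count. Taken literally, it also turns the default cgroup v1 value of 1024 shares into a single CPU, so a caller may prefer to treat the default as unset.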
I'm unsure this rises to the level of an actual issue, and if it did, it would belong in aws-c-io, with respect to how event_loop_group_new_default uses the API. Personally, I feel that no one knows the destination execution environment better than the developer, so when the brain-dead default isn't appropriate (as in the cases you list above), it may be best to defer pool size calculations to the library consumer.
On a side note, I'm not ignoring all of your aws-c-* PRs. Many of them are excellent diagnoses and updates; I just haven't had the time or priority to review, integrate, and give feedback on them.
You should be able to set the pool size on your end, as bretambrose discussed above. No changes are needed in this repo; closing this issue.
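One way a consumer might set the pool size on their end, sketched under assumptions: the environment variable name passed in (e.g. `MYAPP_EVENT_LOOP_THREADS`) and the helper `choose_thread_count` are hypothetical, and the result would be passed as the thread count when creating the event loop group (check the aws-c-io header for the exact parameter in your version).

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: a positive integer in the environment variable
 * named by env_name overrides the detected CPU count. */
static long choose_thread_count(const char *env_name, long detected_cpus) {
    const char *value = getenv(env_name);
    if (value != NULL && *value != '\0') {
        char *end;
        errno = 0;
        long n = strtol(value, &end, 10);
        if (errno == 0 && *end == '\0' && n > 0) {
            return n; /* explicit override from the deployment */
        }
    }
    /* Fall back to the detected count, but never below one thread. */
    return detected_cpus > 0 ? detected_cpus : 1;
}
```

Keeping the override outside the library matches the point above: the deployment (for example a k8s manifest setting the variable) knows its CPU budget better than any default.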
It might be a good idea to warn the developer about the limitations of the defaults. std::thread::hardware_concurrency() has the same problem (it looks as if it uses sysconf underneath).
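A sketch of such a warning, assuming the caller already has a container-aware estimate (for instance from cpuset or quota queries). The function name and message text are hypothetical; the same check would apply to a value obtained from std::thread::hardware_concurrency() in C++ code.

```c
#include <stdio.h>

/* Print a warning when the default CPU count exceeds a container-aware
 * estimate. constrained_cpus <= 0 means "unknown". Returns 1 if warned. */
static int warn_if_default_overestimates(FILE *out, long default_cpus, long constrained_cpus) {
    if (constrained_cpus > 0 && default_cpus > constrained_cpus) {
        fprintf(out,
                "warning: default thread count %ld exceeds the %ld CPU(s) "
                "available to this process; consider setting the pool size "
                "explicitly\n",
                default_cpus, constrained_cpus);
        return 1;
    }
    return 0;
}
```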