awslabs/aws-c-common

posix: aws_system_info_processor_count over-estimates the number of CPUs on k8s / in container

grrtrr opened this issue · 3 comments

The posix aws_system_info_processor_count() function returns the number of processors online on the host, via sysconf(_SC_NPROCESSORS_ONLN), wherever sysconf is available (as on many Linux systems).

Within a container, k8s pod, or any other resource-constrained environment (cpusets, cgroups), this causes the
number of CPUs actually available to the process to be over-estimated.

For instance, on a k8s job with 4 requested CPUs on a 64-core node, the reported value is 64, not 4.

To avoid this over-estimation, additional queries are necessary, looking at

  • CPU shares,
  • CPU period/quota (CFS values),
  • CPU sets.
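As a rough illustration of the CPU-set part: Linux reports the CPUs a process may run on as a list string such as "0-3,8" (for example in the Cpus_allowed_list field of /proc/self/status). The helper below is only a sketch; the name count_cpus_in_list is hypothetical and not part of aws-c-common. It counts the CPUs named in such a string.

```c
#include <stdlib.h>

/* Hypothetical helper (not an aws-c-common API): count the CPUs named in a
 * Linux CPU-list string such as "0-3,8". A trailing newline is tolerated.
 * Returns -1 if the string is malformed. */
static long count_cpus_in_list(const char *list) {
    long total = 0;
    const char *p = list;

    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p || first < 0) {
            return -1; /* no digits, or a negative index */
        }
        long last = first;
        p = end;

        if (*p == '-') { /* a range "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p || last < first) {
                return -1;
            }
            p = end;
        }
        total += last - first + 1;

        if (*p == ',') {
            p++; /* move on to the next entry */
        } else if (*p != '\0' && *p != '\n') {
            return -1; /* unexpected character */
        }
    }
    return total;
}
```

For a process pinned to CPUs 0-3 and 8, count_cpus_in_list("0-3,8") yields 5, regardless of how many processors sysconf reports for the host.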

Here is a good write-up on how this is estimated in the JDK, producing this formula that takes the above into account:

  total_available_cpus = min(cpuset_cpus, cpu_shares/1024, physical_number_of_cpus)
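To make the formula concrete, here is a minimal sketch as a pure function. Everything in it is an assumption for illustration: the name effective_cpu_count is hypothetical, a value <= 0 is taken to mean "limit not set", and fractional results are rounded up. The CFS quota/period term is not in the formula above; it is included only because it appears in the list of things to query. The inputs would have to be read from the cgroup files by the caller.

```c
/* Return the smaller of `current` and `candidate`, ignoring a candidate
 * that is <= 0 (treated here as "limit not set"). */
static long min_positive(long current, long candidate) {
    return (candidate > 0 && candidate < current) ? candidate : current;
}

/* Hypothetical sketch of
 *   min(cpuset_cpus, cpu_shares/1024, physical_number_of_cpus)
 * plus an optional CFS quota/period term. Rounding up and the "<= 0 means
 * unset" convention are assumptions of this sketch. Note that the cgroup v1
 * default of cpu_shares = 1024 would yield 1, so a caller may prefer to pass
 * 0 (unset) in that case. */
static long effective_cpu_count(long cpuset_cpus, long cpu_shares, long physical_cpus,
                                long cfs_quota_us, long cfs_period_us) {
    long result = physical_cpus > 0 ? physical_cpus : 1;

    result = min_positive(result, cpuset_cpus);

    if (cpu_shares > 0) {
        /* ceil(cpu_shares / 1024) */
        result = min_positive(result, (cpu_shares + 1023) / 1024);
    }
    if (cfs_quota_us > 0 && cfs_period_us > 0) {
        /* ceil(quota / period) */
        result = min_positive(result, (cfs_quota_us + cfs_period_us - 1) / cfs_period_us);
    }
    return result;
}
```

With the k8s example from above (64-core node, 4 requested CPUs, which corresponds to cpu_shares = 4096), effective_cpu_count(64, 4096, 64, -1, -1) gives 4 rather than 64.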

bretambrose commented

I'm unsure this rises to the level of an actual issue, and if it did it would belong in aws-c-io relative to the event_loop_group_new_default usage of the API. Personally, I feel like no one knows the destination execution environment better than the developer, and so when the brain-dead default isn't appropriate (like the cases you list above), it may be best to defer pool size calculations to the library consumer.

On a side note, I'm not ignoring all of your aws-c-* PRs. Many of them are excellent diagnoses and updates; I just haven't had a chance time/priority-wise to massage/integrate/feedback them.

jmklix commented

You should be able to set the pool size on your end, as bretambrose discussed above. No changes are needed to this repo, so I'm closing this issue.
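One way a consumer might do that, sketched under assumptions: take an explicit override supplied by the deployment (for instance an environment variable; the name APP_EVENT_LOOP_THREADS in the usage note is made up) and fall back to the sysconf value only when no valid override is given. choose_thread_count is a hypothetical helper, not an aws-c-* function.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: prefer an explicit, positive integer override;
 * otherwise fall back to the number of online processors (which may
 * over-estimate inside a container). Always returns at least 1. */
static long choose_thread_count(const char *override_value) {
    if (override_value != NULL && *override_value != '\0') {
        char *end;
        long n = strtol(override_value, &end, 10);
        if (*end == '\0' && n > 0) {
            return n; /* the whole string was a positive integer */
        }
    }
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? online : 1;
}
```

An application could call choose_thread_count(getenv("APP_EVENT_LOOP_THREADS")) and pass the result as the thread count when it creates its event loop group, instead of relying on the default.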

grrtrr commented

It might be a good idea to warn the developer about the limitations of defaults. std::thread::hardware_concurrency() has the same problem (it looks as if it uses sysconf underneath).