Sage-Bionetworks/synapsePythonClient

Let the user specify the number of allowed threads

moskalenko opened this issue · 1 comment

Operating system

Any

Client version

2.4.0

Description of the problem

synapseclient spawns too many computational threads.

Relevant lines of the code
synapseclient/client.py:from synapseclient.core.pool_provider import DEFAULT_NUM_THREADS
synapseclient/client.py: 'max_threads': DEFAULT_NUM_THREADS,
synapseclient/core/upload/multipart_upload.py: max_threads = pool_provider.DEFAULT_NUM_THREADS
synapseclient/core/pool_provider.py:DEFAULT_NUM_THREADS = multiprocessing.cpu_count() + 4

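multiprocessing.cpu_count() reports every core on the machine, not the cores available to the current process. On a cluster compute node, cpu_count() + 4 can therefore create hundreds of threads even when the code is running in an environment with a single CPU core available to it. Those threads are time-sliced on that one core, so most of them are blocked or run on a fraction of a percent of a CPU core.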
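
To see the mismatch on a given node, the machine-wide count can be compared with the set of cores the process is actually allowed to run on. This is only a minimal sketch: `usable_cpu_count` is a hypothetical helper, not part of synapseclient, and `os.sched_getaffinity` is not available on every platform, so the helper falls back to the machine-wide count when it is missing.

```python
import multiprocessing
import os


def usable_cpu_count():
    """Return the number of CPU cores this process may run on (hypothetical helper)."""
    # os.sched_getaffinity exists only on some platforms (for example Linux).
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: the machine-wide count, which may overestimate.
    return multiprocessing.cpu_count()


print("cores on the machine:", multiprocessing.cpu_count())
print("cores usable by this process:", usable_cpu_count())
```

The affinity mask reflects CPU pinning only. Limits enforced through a CPU time quota would not show up here, so an explicit setting would still be needed in that case.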

Expected behavior

Two changes would help with this issue: a synapseclient.Synapse attribute for setting the number of threads, and having pool_provider read the number of threads from an environment variable.
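
The environment-variable part of this request could look roughly like the sketch below. It is not the library's implementation: the variable name `SYNAPSE_NUM_THREADS` and the function `default_num_threads` are made up for illustration, and the fallback simply mirrors the cpu_count() + 4 default quoted above.

```python
import multiprocessing
import os

# Hypothetical variable name, chosen for this sketch only.
ENV_VAR = "SYNAPSE_NUM_THREADS"


def default_num_threads(environ=os.environ):
    """Return the thread count from the environment, else cpu_count() + 4."""
    raw = environ.get(ENV_VAR)
    if raw is not None:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # treat non-numeric input as unset
        if value >= 1:
            return value
    # Same formula as the current DEFAULT_NUM_THREADS in pool_provider.py.
    return multiprocessing.cpu_count() + 4


# Pass a plain dict to try it without touching the real environment.
print(default_num_threads({ENV_VAR: "2"}))
```

Invalid or non-positive values fall back to the existing default rather than raising, so a mistyped variable would not break existing scripts. Whether to fail loudly instead is a design choice for the maintainers.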

Actual behavior

The default thread count is cpu_count() + 4, which can lead to time slicing with hundreds of threads on a cluster compute node even if the code is running in an environment with a single CPU core available to it. As a result, most threads are blocked or run on a fraction of a percent of a CPU core.

Thanks for reporting this, @moskalenko. I think you can currently do something like this:

import synapseclient
synapseclient.client.DEFAULT_NUM_THREADS = 4

Tagging @jkiang13 to confirm