fsspec/s3fs

How to increase the async HTTP connection limit?

ion-elgreco opened this issue · 7 comments

I want to increase the HTTP connection limit to see if I can saturate my network more, but I don't see a way to pass this through the FileSystem. I went through the code, and aiobotocore as well, but no luck yet. Increasing max_pool_connections already helps a bit, though: it roughly doubles IO throughput.

Any suggestions on how to increase the concurrency?

There are many levers to pull, actually. How are you setting the pool, what kind of benchmark are you running, and do you have an idea of what your current bottleneck may be caused by? Since fsspec generally maintains its own IO thread/loop, a significant increase in performance is something I'd be happy to bake in.

@martindurant I am currently passing this to the S3FileSystem: config_kwargs={"max_pool_connections": 50}.
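For readers following along, this is roughly how that setting reaches the filesystem. It is a minimal sketch, not a recommendation: `build_s3fs_kwargs` and `make_filesystem` are hypothetical helper names, and the optional `batch_size` argument (which, as far as I know, fsspec's async filesystems accept to cap how many coroutines are gathered at once) is an assumption that was not discussed in this thread.

```python
from typing import Any, Dict, Optional


def build_s3fs_kwargs(
    max_pool_connections: int = 50,
    batch_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for s3fs.S3FileSystem (hypothetical helper)."""
    # config_kwargs is forwarded to the botocore/aiobotocore client config.
    kwargs: Dict[str, Any] = {
        "config_kwargs": {"max_pool_connections": max_pool_connections}
    }
    # Assumption: batch_size limits the number of coroutines fsspec runs at once.
    if batch_size is not None:
        kwargs["batch_size"] = batch_size
    return kwargs


def make_filesystem(**options: Any):
    """Create the filesystem; s3fs is imported lazily so the helper above
    can be inspected without s3fs installed."""
    import s3fs

    return s3fs.S3FileSystem(**build_s3fs_kwargs(**options))
```

Calling `make_filesystem(max_pool_connections=50)` should be equivalent to the call described above; whether raising `batch_size` alongside it helps would need to be measured.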

I was checking the peak transfer rate with iftop: it was just 50 Mb out of 1 Gbps of network capacity (AKS -> LakeFS on AKS -> Azure Blob). It took around 15 seconds to read 6000 txt files. I think it could go faster, but I'm not sure :)

Would you mind making a graph of max_pool versus throughput? How many files (~ coroutines) are in flight?
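One way to collect the numbers for such a graph is sketched below. The reader function is injected so the timing loop can be tried without any S3 access; `sweep_pool_sizes` and `read_all_s3` are hypothetical names, and `skip_instance_cache=True` is used on the assumption that each pool size should get a fresh filesystem instance rather than a cached one.

```python
import time
from typing import Callable, Dict, Iterable, List


def sweep_pool_sizes(
    read_all: Callable[[int, List[str]], Dict[str, bytes]],
    paths: List[str],
    pool_sizes: Iterable[int],
) -> Dict[int, Dict[str, float]]:
    """Time read_all(pool_size, paths) once per pool size."""
    results: Dict[int, Dict[str, float]] = {}
    for size in pool_sizes:
        start = time.perf_counter()
        blobs = read_all(size, paths)
        # Guard against a zero interval when the reader is trivially fast.
        elapsed = max(time.perf_counter() - start, 1e-9)
        total_bytes = sum(len(b) for b in blobs.values())
        results[size] = {
            "files": len(blobs),
            "bytes": total_bytes,
            "seconds": elapsed,
            "MB_per_s": total_bytes / 1e6 / elapsed,
        }
    return results


def read_all_s3(pool_size: int, paths: List[str]) -> Dict[str, bytes]:
    """Read every path with a fresh S3FileSystem using the given pool size."""
    import s3fs  # imported lazily; only needed for the real benchmark

    fs = s3fs.S3FileSystem(
        config_kwargs={"max_pool_connections": pool_size},
        skip_instance_cache=True,
    )
    return fs.cat(paths)  # a list of paths yields a {path: bytes} dict
```

Something like `sweep_pool_sizes(read_all_s3, paths, [10, 50, 100, 200])` would then give one row per pool size to plot. A single run per size is noisy, so repeating each measurement is probably worthwhile.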

@martindurant do you have some examples on how to access these things during execution?

  • I thought throughput was exactly what you were already measuring
  • The number of files you should be able to get from a normal glob or expand_path call.
  • You could maybe use callbacks to count the coroutines in flight, but you would probably need to hack something into fsspec.asyn._runner.
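The counting idea from the last point can be illustrated in isolation. The sketch below does not touch fsspec.asyn._runner or any fsspec internals: `InFlightCounter`, `fake_fetch` and `run_batch` are hypothetical names, and the fetch is simulated with `asyncio.sleep`. It only shows how wrapping each coroutine lets you record how many are running at the same moment.

```python
import asyncio
from typing import Any, Awaitable, List, Tuple


class InFlightCounter:
    """Tracks how many wrapped coroutines are currently running."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    async def track(self, coro: Awaitable[Any]) -> Any:
        self.current += 1
        self.peak = max(self.peak, self.current)
        try:
            return await coro
        finally:
            self.current -= 1


async def fake_fetch(i: int, delay: float = 0.01) -> int:
    """Stand-in for a real download; just sleeps and returns its index."""
    await asyncio.sleep(delay)
    return i


async def run_batch(n_files: int, limit: int) -> Tuple[List[int], int]:
    """Run n_files simulated fetches with at most `limit` in flight."""
    counter = InFlightCounter()
    sem = asyncio.Semaphore(limit)

    async def bounded(i: int) -> int:
        async with sem:
            return await counter.track(fake_fetch(i))

    results = await asyncio.gather(*(bounded(i) for i in range(n_files)))
    return list(results), counter.peak


if __name__ == "__main__":
    _, peak = asyncio.run(run_batch(n_files=100, limit=10))
    print("peak coroutines in flight:", peak)
```

Applying the same wrapper to real reads would mean patching it in wherever fsspec schedules its coroutines, which is the "hack" mentioned above.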

ping, since this just came up on another thread. @ion-elgreco, have you had a chance to do any more benchmarking or testing?

@martindurant hey, I parked further improvements since it worked "good enough".