awslabs/mountpoint-s3

Slow throughput when reading many small files

Opened this issue · 6 comments

Mountpoint for Amazon S3 version

mount-s3 1.7.2

AWS Region

us-west-2

Describe the running environment

Running on a p4de EC2 instance on Amazon Linux 2, using instance profile credentials, against an S3 bucket in the same account.

Mountpoint options

mount-s3 \
  --transfer-acceleration \
  --read-only \
  --max-threads 2048 \
  --cache /media/ramdisk/cache \
  --prefix <PREFIX> \
  <BUCKET> /media/ramdisk/data

What happened?

I am trying to read 16,000 small (8 MB) files in parallel from S3. I have been comparing Mountpoint for Amazon S3 and goofys, and I am seeing a large difference in performance when using Mountpoint: with goofys I can read all the files in 44 s, while with Mountpoint it takes 478 s. These timings are averaged over 5 test runs. Both goofys and Mountpoint are mounted on tmpfs file systems.

Relevant log output

No response

I think Mountpoint may not be correctly configuring the default network throughput on p4de instances. Could you try the argument --maximum-throughput-gbps 100 and see if that helps? For small-ish files like yours, setting it even higher (say to 400) might help more, but 100 would be a good place to start (it's what we configure for p4d).

Also, since you're going from EC2 to S3, --transfer-acceleration probably isn't needed — Transfer Acceleration is S3's feature for clients outside EC2 to route their traffic onto the Amazon network from as close to the customer as possible.
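
For reference, the invocation might look something like this (just a sketch reusing the placeholders from above, with --transfer-acceleration dropped and the throughput flag added):

mount-s3 \
  --maximum-throughput-gbps 100 \
  --read-only \
  --max-threads 2048 \
  --cache /media/ramdisk/cache \
  --prefix <PREFIX> \
  <BUCKET> /media/ramdisk/data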

Thanks for the reply. I tried your suggestions; with max throughput set to 400 I am seeing mixed results, shown below.

[screenshot: benchmark results with --maximum-throughput-gbps 400]

Here is the modified mount command, with some extra context on my environment:

sudo mount -t tmpfs -o size=140G tmpfs /media/ramdisk
sudo mkdir -p /media/ramdisk/input
sudo mkdir -p /media/ramdisk/cache

echo "Starting Mountpoint-S3..."
mount-s3 \
  --maximum-throughput-gbps 400 \
  --read-only \
  --max-threads 2048 \
  --cache /media/ramdisk/cache \
  --prefix $S3_PREFIX \
  $S3_BUCKET /media/ramdisk/input

Here is also a minimal reproducible example I used to generate the performance numbers:

// Reads every .bin file in the given directory in parallel and reports
// the aggregate read throughput.
use std::env;
use std::fs;
use std::io::Read;
use std::path::Path;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        println!("Usage: main <INPUT_DIR>");
        return;
    }
    println!("Reading .bin files from {}", &args[1]);
    let dir_path = Path::new(&args[1]);
    let files = fs::read_dir(dir_path).unwrap();

    let mut handles = vec![];
    let total_bytes = Arc::new(Mutex::new(0));

    let start_time = Instant::now();

    for file in files {
        let file = file.unwrap();
        let file_path = file.path();
        if file_path.extension().unwrap_or_default() == "bin" {
            let total_bytes_clone = total_bytes.clone();
            // Each file is read to the end on its own thread; the byte count
            // is accumulated under a shared mutex.
            let handle = thread::spawn(move || {
                let mut file = fs::File::open(file_path).unwrap();
                let mut contents = vec![];
                let bytes_read = file.read_to_end(&mut contents).unwrap();
                *total_bytes_clone.lock().unwrap() += bytes_read;
            });
            handles.push(handle);
        }
    }
    println!("Spawned {} threads", handles.len());

    for handle in handles {
        handle.join().unwrap();
    }

    let elapsed_time = start_time.elapsed();
    let total_gb = *total_bytes.lock().unwrap() as f64 / (1024_f64 * 1024_f64 * 1024_f64);
    let throughput_gb_s = total_gb / elapsed_time.as_secs_f64();
    println!("Overall throughput: {:.2} GB/s", throughput_gb_s);
}
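
(For completeness, the example can be built and run against the mount point roughly as follows; <BINARY> is a placeholder for whatever the crate's binary is named, assuming a standard Cargo release build.)

cargo build --release
./target/release/<BINARY> /media/ramdisk/input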
arsh commented

@pkasravi I noticed that you configured Mountpoint to cache the file's content locally. In my tests, I observed an improvement when caching was disabled, which seems more in line with how Goofys operates. It might be good to compare the performance by disabling caching, to ensure we're comparing apples to apples.
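
Something like this, for example (a sketch reusing your flags, just without --cache):

mount-s3 \
  --maximum-throughput-gbps 400 \
  --read-only \
  --max-threads 2048 \
  --prefix $S3_PREFIX \
  $S3_BUCKET /media/ramdisk/input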

Hi @arsh, I tried your suggestion, but I'm not seeing much of a difference. Here are the results using the same code I shared above; I've also included the results from goofys (again, same code) for comparison:

[screenshot: benchmark results with caching disabled, compared against goofys]

Goofys command:

goofys \
  $S3_BUCKET:$S3_PREFIX \
  /media/ramdisk/input
arsh commented

I'm going to read 16,000 8MB files to observe the performance I get and will report back. Previously, I tested with 5,000 files and noticed some improvement by disabling caching.

Could you run your test with caching disabled and logging enabled in Mountpoint? Please share the log file afterward.

You can do this by running Mountpoint as follows:

MOUNTPOINT_LOG=trace,awscrt=error \
mount-s3 \
  --read-only \
  --max-threads 2048 \
  --maximum-throughput-gbps 400 \
  --log-directory <a local directory> \
  --prefix <PREFIX> \
  <BUCKET> /media/ramdisk/data

More details on logging are here: https://github.com/awslabs/mountpoint-s3/blob/main/doc/LOGGING.md#logging-to-a-file

@arsh were you able to reproduce these results, or did you see something different?

I ran my test with logging enabled; the log file was 1 GB. I've attached as much as GitHub will allow me; let me know if it would be useful to add the rest.

log-parts.zip