Project-MONAI/monai-deploy-informatics-gateway

Investigation: Performance with uploading instances to MinIO

Closed this issue · 4 comments

Description

In one environment, saving a study to MinIO is taking around 1 second per slice. This ticket tracks the investigation; it is not yet clear where the bottleneck lies.

Steps to reproduce

  1. Deploy MIG and MinIO to an environment
  2. Send a study to benchmark

Expected behavior

The study is uploaded to storage within an acceptable amount of time.

Actual behavior

In some cases a study takes more than 10 minutes to save.

Please share the environment running IG & MinIO.

  • CPU cores
  • RAM
  • Disk size/speed
  • Network speed

Hi @mocsharp, here are the details of the environment.

DGX box
1 GPU, 8 vCPU, 32 GB RAM, up to 25 Gbps network, 225 GB NVMe SSD
Both head nodes
2 vCPU, 8 GB RAM, network seems to be about 300 Mbps

All boxes are attached to an EFS instance.
https://docs.aws.amazon.com/efs/latest/ug/performance.html

I ran 5 studies using MONAI Deploy Lite, and each study completed within 1-2 minutes (upload took no longer than a minute each). The MIG container was limited to 2 CPUs and 8 GB of RAM.

Container stats after all 5 studies are completed.

CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
148bd3484592   mdl-orthanc    1.74%     295.1MiB / 31.05GiB   0.93%     1.07MB / 964MB    1.22GB / 156MB    66
c35bf2a24db7   mdl-minio      0.03%     244.4MiB / 31.05GiB   0.77%     1.76GB / 1.04GB   4.2GB / 5.52GB    20
d7ab6113b075   mdl-rabbitmq   2.40%     154.9MiB / 31.05GiB   0.49%     915kB / 897kB     42.5MB / 3.71MB   45
46899c9fcd8a   mdl-mongodb    0.58%     116.9MiB / 31.05GiB   0.37%     162kB / 1.37MB    173MB / 12.2MB    39
ea62b3f26676   mdl-ig         0.63%     673MiB / 8GiB         8.21%     1.01GB / 980MB    21.1MB / 234MB    21
29f205cb59d5   mdl-tm         0.00%     156.6MiB / 31.05GiB   0.49%     974MB / 776MB     964MB / 90.1kB    21
3eadb0315746   mdl-wm         0.10%     79.47MiB / 31.05GiB   0.25%     23.6MB / 6.41MB   17.7MB / 0B       23

PR #166 adds the ability to store incoming data on disk, instead of in memory, before uploading it to the storage service.

Time is measured from when the first instance is received to when the workflow request is sent.
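To make the measured interval concrete, here is a minimal Python sketch of a timer that starts on the first received instance and stops when the workflow request is sent. `PayloadTimer` and its method names are hypothetical and are not taken from the MIG code base; the sketch only illustrates that the reported duration covers receiving the instances as well as uploading them.

```python
import time


class PayloadTimer:
    """Hypothetical helper: times a payload from first instance to workflow request."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the timer can be tested
        self._first_received = None  # set when the first instance arrives

    def instance_received(self):
        # Only the first instance starts the clock; later ones do not reset it.
        if self._first_received is None:
            self._first_received = self._clock()

    def workflow_request_sent(self):
        # Elapsed seconds between the first instance and the workflow request.
        if self._first_received is None:
            raise RuntimeError("no instance was received for this payload")
        return self._clock() - self._first_received
```

Because the clock starts at the first instance, a slow sender would lengthen the reported time just as a slow upload would.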

When 10 studies (588 instances per study) are sent to IG continuously, using the disk is much faster than memory:

# Memory
Workflow request published to md.workflow.request, message ID=2cff618e-d1a3-4115-ae80-e5e6b4f411b7. Payload took 00:02:25.1398701 to complete.
Workflow request published to md.workflow.request, message ID=a766b539-2a89-41f4-8b13-9ba27a3bac6d. Payload took 00:05:46.5375501 to complete.
Workflow request published to md.workflow.request, message ID=f407ac25-5b54-4e40-b97f-fac16e914874. Payload took 00:09:04.7963726 to complete.
Workflow request published to md.workflow.request, message ID=05c44068-b0e2-446c-b63a-fbedaf7585c4. Payload took 00:11:53.2644454 to complete.
Workflow request published to md.workflow.request, message ID=1a0cf1ec-2cdc-458e-b3ae-c2845cb1daee. Payload took 00:14:17.0859691 to complete.
Workflow request published to md.workflow.request, message ID=60dd9547-2482-4559-9ac5-50d4bd5bbef4. Payload took 00:16:23.0448137 to complete.
Workflow request published to md.workflow.request, message ID=ebedd87f-68ba-4161-9dbc-8304045c75d0. Payload took 00:18:01.6260370 to complete.
Workflow request published to md.workflow.request, message ID=0b8dd992-98de-4ea2-9c5a-9d941f3d0c78. Payload took 00:19:19.5292454 to complete.
Workflow request published to md.workflow.request, message ID=9b0038bd-dbd1-4e2b-9a1d-864f9c0c5c94. Payload took 00:20:02.9085077 to complete.
Workflow request published to md.workflow.request, message ID=d5d00669-c9d8-482a-8c71-a9b975f179d7. Payload took 00:20:11.4483895 to complete.


# Disk
Workflow request published to md.workflow.request, message ID=c09cf45a-9cb9-4ea5-8236-cbe77ff374e7. Payload took 00:00:51.2866232 to complete.
Workflow request published to md.workflow.request, message ID=2196c06f-54d6-401e-9263-869bf13f7741. Payload took 00:01:27.5621902 to complete.
Workflow request published to md.workflow.request, message ID=fda8e1ba-3f37-4a6e-a141-48e3b1fc0e84. Payload took 00:01:57.8332724 to complete.
Workflow request published to md.workflow.request, message ID=42a61e75-fe33-4be6-824b-5352f886a1cf. Payload took 00:02:37.3700811 to complete.
Workflow request published to md.workflow.request, message ID=d5511b17-3909-4034-82a8-5abc7ebc06fc. Payload took 00:03:19.2335252 to complete.
Workflow request published to md.workflow.request, message ID=d058139a-7b7d-4cf1-a832-e712a1767fa5. Payload took 00:04:23.0854715 to complete.
Workflow request published to md.workflow.request, message ID=651af33d-845f-491f-8fdc-a6f0a544d74a. Payload took 00:05:07.1626852 to complete.
Workflow request published to md.workflow.request, message ID=d1e7e27d-9b15-4580-978b-ba0067d08eef. Payload took 00:05:56.3368932 to complete.
Workflow request published to md.workflow.request, message ID=050ca6c7-5daf-4a86-a608-8af9fc748600. Payload took 00:06:26.4512770 to complete.
Workflow request published to md.workflow.request, message ID=63477934-7244-4a11-8c5b-681b125b0ebb. Payload took 00:07:03.9482112 to complete.
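For a quick comparison of the two runs, the durations can be pulled out of the log lines with a short Python script. This is only a sketch: the regular expression and the `payload_seconds` helper are my own, written against the log format shown above, and the two sample lines are the tenth payload of each run.

```python
import re

# Matches the "hh:mm:ss.fffffff" duration embedded in each log line.
DURATION_RE = re.compile(r"Payload took (\d+):(\d{2}):(\d{2})\.(\d+) to complete")


def payload_seconds(line):
    """Return the payload duration in seconds, or None if the line has none."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, fraction = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float("0." + fraction)


# Last (tenth) payload of each run, copied from the logs above.
memory_last = ("Workflow request published to md.workflow.request, "
               "message ID=d5d00669-c9d8-482a-8c71-a9b975f179d7. "
               "Payload took 00:20:11.4483895 to complete.")
disk_last = ("Workflow request published to md.workflow.request, "
             "message ID=63477934-7244-4a11-8c5b-681b125b0ebb. "
             "Payload took 00:07:03.9482112 to complete.")

memory_s = payload_seconds(memory_last)
disk_s = payload_seconds(disk_last)
print(f"memory: {memory_s:.1f} s, disk: {disk_s:.1f} s, ratio: {memory_s / disk_s:.2f}x")
```

For the tenth study, the memory run took a little under three times as long as the disk run.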

Similarly, for sending a single study:

# Memory
Workflow request published to md.workflow.request, message ID=382d5c48-48fd-44cd-b059-d19dd907f91a. Payload took 00:01:11.5303211 to complete.

# Disk
Workflow request published to md.workflow.request, message ID=3adc8dd6-2dcb-42a9-89b7-3ec021866f87. Payload took 00:00:46.1229609 to complete.
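To relate these numbers to the roughly 1 second per slice reported in the description, the single-study durations can be converted to a per-instance time. The sketch below assumes the single study also contains 588 instances, as in the 10-study test; that is not stated explicitly, so treat the result as an estimate.

```python
INSTANCES_PER_STUDY = 588  # assumption: same study size as the 10-study test

# Single-study durations from the logs above, in seconds.
single_study_seconds = {
    "memory": 1 * 60 + 11.5303211,  # 00:01:11.5303211
    "disk": 46.1229609,             # 00:00:46.1229609
}

for mode, total in single_study_seconds.items():
    per_instance = total / INSTANCES_PER_STUDY
    print(f"{mode}: {per_instance:.3f} s per instance")
```

Under that assumption both modes come out well below 1 second per instance in this environment, and since the measured time also includes receiving the instances, the upload itself would be no slower than these figures.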