Do we understand the performance of UDA?
I have been looking at the performance of UDA in getting an entire IDS -- in this case the summary IDS.
I have been using uda.py (https://gitlab.eufus.psnc.pl/data-access-tools/uda-test) for these tests.
The data resides at ITER and I have accessed it (1) directly from ITER using the UDA bypass feature, (2) using UDA at ITER talking to the ITER UDA server, and (3) using UDA on the EUROfusion Gateway talking to the ITER UDA server. The results are summarized below:
In all cases the command executed is
./uda.py -u 'imas://uda.iter.org:56565/uda?path=/work/imas/shared/imasdb/ITER/3/134174/117&backend=mdsplus' -c summary
At ITER
With bypass enabled ($IMAS_LOCAL_HOSTS == uda.iter.org)
          run 1     run 2     run 3
DBentry   1.012     0.967     0.972
get       0.488     0.125     0.135
close     0.002     0.001     0.002
With bypass disabled ($IMAS_LOCAL_HOSTS == '')
          run 1     run 2     run 3
DBentry   1.421     1.325     1.370
get       94.286    83.453    80.465
close     0.005     0.004     0.003
At Gateway
          run 1     run 2     run 3
DBentry   1.345     1.384     1.271
get       96.223    95.684    100.322
close     0.038     0.036     0.037
Concentrating on the data "get" operation, we see:
- Direct access is more than 300 times faster than going through the UDA server (comparing the means of the three runs)
- Remote access incurs a penalty of about 13% compared to local UDA access
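The two ratios above can be reproduced from the reported "get" timings. The sketch below assumes the three values per configuration are repeated runs of the same command and compares their means; the variable and function names are illustrative only and are not part of uda.py.

```python
from statistics import mean

# "get" timings copied from the results above (three runs each)
get_bypass = [0.488, 0.125, 0.135]        # at ITER, bypass enabled
get_local_uda = [94.286, 83.453, 80.465]  # at ITER, bypass disabled
get_gateway = [96.223, 95.684, 100.322]   # at Gateway


def ratio_of_means(slow, fast):
    """Return mean(slow) / mean(fast) for two lists of timings."""
    return mean(slow) / mean(fast)


# How much faster direct access is than going through the local UDA server
speedup_direct = ratio_of_means(get_local_uda, get_bypass)

# Extra cost of going through the Gateway, relative to local UDA access
remote_penalty = ratio_of_means(get_gateway, get_local_uda) - 1.0

print(f"UDA (local) / direct : {speedup_direct:.0f}x")
print(f"Gateway vs local UDA : +{remote_penalty:.1%}")
```

Comparing means rather than single runs smooths over the first bypass run, which is noticeably slower than the other two.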
Do we understand why accessing the data over the network is so much slower?
I also looked at a bigger IDS, and saw
./uda.py -u 'imas://uda.iter.org:56565/uda?path=/work/imas/shared/imasdb/ITER/3/134174/117&backend=mdsplus' -c equilibrium
From ITER
with $IMAS_LOCAL_HOSTS == uda.iter.org
DBentry = 1.029
get = 0.794
close = 0.000
with $IMAS_LOCAL_HOSTS == ''
DBentry = 1.591
get = 16284.980
close = 0.003
From Gateway
Timing information
DBentry = 2.039
get = 24115.763
close = 0.036
Which gives:
- UDA (local) / local : 20510.05
- UDA (Gateway) / UDA (local) : 1.481
Out of curiosity, I checked the same on my local IMAS/UDA server. The difference is about a factor of 8 (0.487 vs 0.059).
yildiz@spcimas: ~ $ ./uda.py -u 'imas:mdsplus?path=/data/imas/public/imasdb/tcv/3/71222/999/' -c summary 2>/dev/null |grep "get ="
get = 0.059
yildiz@spcimas: ~ $ IMAS_LOCAL_HOSTS="localhost" ./uda.py -u 'imas://localhost:56565/uda?path=/data/imas/public/imasdb/tcv/3/71222/999/&backend=mdsplus' -c summary 2>/dev/null |grep "get ="
get = 0.059
yildiz@spcimas: ~ $ ./uda.py -u 'imas://localhost:56565/uda?path=/data/imas/public/imasdb/tcv/3/71222/999/&backend=mdsplus' -c summary 2>/dev/null |grep "get ="
get = 0.487
software versions:
yildiz@spcimas: ~ $ module list
Currently Loaded Modulefiles:
1) IMAS/3.39.0-5.0.0 2) idstools/1.14.1 3) uda/2.7.4
For those who haven't checked what IMAS_LOCAL_HOSTS does: when the host in the URI is listed there, the access layer drops the UDA backend and uses the backend named in the UDA arguments instead. That's why the first and second calls are in fact identical. (access-layer/lowlevel/al_context.cpp#192)
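To illustrate the effect described above, here is a rough Python sketch of the kind of URI rewrite that the bypass amounts to. It is not the access-layer code: `bypass_uda` is a hypothetical helper, and passing the local hosts as a Python collection is an assumption made for the example (the real variable is read from the environment by the access layer).

```python
from urllib.parse import urlsplit, parse_qsl


def bypass_uda(uri, local_hosts):
    """Hypothetical helper: rewrite a UDA URI into a direct-backend URI
    when its host is in local_hosts; otherwise return it unchanged."""
    parts = urlsplit(uri)
    if parts.hostname not in local_hosts:
        return uri  # remote host: keep going through the UDA server

    query = parse_qsl(parts.query, keep_blank_values=True)
    backend = next((v for k, v in query if k == "backend"), None)
    if backend is None:
        return uri  # no backend argument to fall back on

    # Keep every argument except "backend", which moves into the URI head
    rest = "&".join(f"{k}={v}" for k, v in query if k != "backend")
    return f"imas:{backend}?{rest}"


uda_uri = ("imas://localhost:56565/uda"
           "?path=/data/imas/public/imasdb/tcv/3/71222/999/&backend=mdsplus")

print(bypass_uda(uda_uri, {"localhost"}))  # host is local: direct URI
print(bypass_uda(uda_uri, set()))          # no local hosts: unchanged
```

With `localhost` treated as local, the second command in the listing above reduces to the same URI as the first, which is consistent with both reporting the same "get" time.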
@cenkoloji -- How big were the IDSes you accessed?
In my case
Reading 48.138 MB of data for equilibrium/0 took 2.01 seconds
Reading 0.061 MB of data for summary/0 took 0.14 seconds
using the "idssize" command of the soon to be released idstools (release/2.0.0)
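For comparison with the UDA timings, the idssize figures can be turned into an effective read throughput. This is a minimal sketch using only the two readings quoted above; the function name is illustrative.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Effective throughput in MB/s for a read of size_mb taking seconds."""
    return size_mb / seconds


# (size in MB, time in seconds) as reported by idssize above
readings = {
    "equilibrium/0": (48.138, 2.01),
    "summary/0": (0.061, 0.14),
}

for name, (size_mb, seconds) in readings.items():
    rate = throughput_mb_per_s(size_mb, seconds)
    print(f"{name}: {rate:.2f} MB/s")
```

The much lower rate for the small summary IDS suggests that fixed per-read overhead, rather than data volume, dominates there.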
This example was just summary. I can't run idssize because the IMAS version I have isn't compatible yet, but the summary and equilibrium IDS HDF5 files are around:
summary: 0.16 MB
equilibrium: 20 MB
Repeating the same exercise with both backends, using equilibrium (I couldn't get the MDSplus backend to work via UDA):
yildiz@spcimas: ~ $ IMAS_LOCAL_HOSTS="localhost" ./uda.py -u 'imas://localhost:56565/uda?path=/tmp/yildiz/imasdb/tcv/3/71222/999/&backend=hdf5' -c equilibrium 2>/dev/null |grep "get ="
get = 8.013
yildiz@spcimas: ~ $ ./uda.py -u 'imas://localhost:56565/uda?path=/tmp/yildiz/imasdb/tcv/3/71222/999/&backend=hdf5' -c equilibrium 2>/dev/null |grep "get ="
get = 33.998
yildiz@spcimas: ~ $ IMAS_LOCAL_HOSTS="localhost" ./uda.py -u 'imas://localhost:56565/uda?path=/tmp/yildiz/imasdb/tcv/3/71222/999/&backend=mdsplus' -c equilibrium 2>/dev/null |grep "get ="
get = 9.584