nci/gsky

WCS issues of GSKY 1.1 at staging

Closed this issue · 3 comments

metyr commented

GSKY 1.1 at staging is able to process WCS requests with large length & resolution with satisfactory performance. However, some WCS requests of various lengths and resolutions fail at random:

Length\Res 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792
2 1.11 0.89 0.99 1.04 1.07 1.17 1.27 2.29 2.95 3.83 4.04 4.62 5.08
4 2.09 1.92 1.94 2.22 SRE SRE SRE SRE SRE SRE SRE SRE SRE
6 SRE SRE SRE SRE SRE SRE SRE SRE SRE SRE SRE SRE 6.21
8 4.35 4.50 4.52 4.61 SRE SRE SRE SRE SRE SRE SRE SRE SRE
10 5.84 5.74 5.98 5.84 6.02 SRE SRE SRE SRE SRE SRE SRE SRE
12 SRE SRE SRE SRE SRE SRE SRE SRE SRE SRE SRE 10.65 10.64
14 8.77 8.90 8.78 9.09 8.79 9.08 9.33 12.68 SRE SRE SRE SRE SRE
18 13.57 13.69 13.61 13.52 14.14 14.10 14.16 17.45 17.68 16.87 17.91 17.81 17.86
20 16.59 16.41 16.25 SRE SRE SRE SRE SRE SRE SRE SRE 20.17 20.70
22 19.73 20.48 19.94 19.49 20.86 21.46 22.22 26.34 25.05 24.06 24.01 24.16 25.31
24 23.84 23.58 23.75 23.62 SRE SRE SRE SRE SRE SRE SRE 28.21 29.41
26 27.39 27.91 27.03 27.42 27.82 29.11 29.60 34.16 32.63 31.57 31.56 32.67 33.05
28 31.46 31.66 31.25 31.32 SRE SRE SRE SRE 38.30 36.03 36.81 37.86 36.96
30 32.98 33.52 33.58 33.67 34.55 35.61 PTO 40.44 38.16 39.48 38.74 39.52 39.62
32 35.10 35.85 34.64 35.66 36.35 PTO PTO 40.81 41.21 40.17 39.98 40.34 40.96
34 35.42 36.59 35.93 PTO PTO PTO PTO 42.10 42.35 40.57 44.86 42.65 42.50
36 36.49 36.67 36.03 36.92 PTO PTO PTO PTO 42.59 41.84 42.83 43.97 42.71
38 36.85 36.87 PTO PTO PTO PTO PTO PTO 42.28 42.66 42.54 42.61 43.58
40 36.34 PTO PTO PTO PTO PTO PTO PTO 42.94 44.00 44.53 44.57 44.10
42 36.70 PTO PTO PTO PTO PTO PTO PTO 43.04 43.87 42.34 44.29 43.78
44 37.13 36.15 PTO PTO PTO PTO PTO PTO PTO 45.00 42.33 42.80 44.05
46 36.82 PTO 36.94 PTO PTO PTO PTO PTO PTO 43.03 43.59 43.19 44.31
48 36.93 PTO 36.80 36.50 PTO PE SRE SRE SRE SRE SRE SRE SRE
50 PTO PTO PTO PTO PTO PTO PTO PTO PTO 41.76 43.71 42.91 46.35

Error messages:
SRE: Server resources exhausted
PTO: Processing timed out
PE: 502 Proxy Error
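
To see how the failures are distributed, the table above can be tallied by outcome. This is only a rough sketch: `tally_outcomes` is a hypothetical helper, and it assumes every cell that is not one of the three codes in the legend is a successful timing.

```python
from collections import Counter

ERROR_CODES = {"SRE", "PTO", "PE"}  # codes from the legend above


def tally_outcomes(table_text):
    """Count successful cells and error codes in whitespace-separated result rows."""
    counts = Counter()
    for line in table_text.strip().splitlines():
        # The first column is the length; the remaining cells are results.
        for cell in line.split()[1:]:
            counts[cell if cell in ERROR_CODES else "OK"] += 1
    return counts


rows = """
4 2.09 1.92 1.94 2.22 SRE SRE SRE SRE SRE SRE SRE SRE SRE
48 36.93 PTO 36.80 36.50 PTO PE SRE SRE SRE SRE SRE SRE SRE
"""
counts = tally_outcomes(rows)
```

Pass only the data rows, not the `Length\Res` header line, since the header cells would otherwise be counted as successes.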

An example of a failing request:
http://gsky-test.nci.org.au/ows?WIDTH=1024&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=40&REQUEST=GetCoverage&bbox=110.0%2C-50.0%2C160.0%2C0.0&HEIGHT=1024

The raijin gRPC cluster seems too busy to respond to these requests.
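
For anyone who wants to reproduce this, a request like the example above can be built and timed from Python. This is a minimal sketch, not the actual benchmark script: `build_getcoverage_url` and `timed_get` are hypothetical names, and the query parameters come out in a different order than in the example URL, which should not matter for a key-value request.

```python
import time
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://gsky-test.nci.org.au/ows"  # staging endpoint from the example request


def build_getcoverage_url(res, bbox, fmt="GeoTIFF", time_value="40",
                          coverage="LS5:NBAR:TRUE"):
    """Build a WCS 1.0.0 GetCoverage URL with WIDTH == HEIGHT == res."""
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "Coverage": coverage,
        "FORMAT": fmt,
        "crs": "EPSG:4326",
        "time": time_value,
        "bbox": ",".join(str(v) for v in bbox),
        "WIDTH": res,
        "HEIGHT": res,
    }
    # urlencode percent-encodes ':' and ',' as in the example URL.
    return BASE_URL + "?" + urlencode(params)


def timed_get(url, timeout_s):
    """Fetch url; return (elapsed_seconds, body). Raises on HTTP errors or timeout."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        body = resp.read()
    return time.monotonic() - start, body


url = build_getcoverage_url(1024, (110.0, -50.0, 160.0, 0.0))
```

`timed_get` is deliberately not called here, since it would hit the staging server.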

metyr commented

The above issues have been resolved by increasing the timeout values to 180 seconds.

New test runs revealed some further issues, listed below:

  1. Failed to reopen existing dataset: /tmp/raster_*******
    This happens on WCS requests that save data in NetCDF format rather than GeoTIFF format; example requests are listed below. The client may see only a 'proxy error', since the response time approaches 180 seconds. The error only occurs for requests with resolutions > 17480.

2018-09-27 23:21:49,070 WARNING ======WCS REQUEST WARNING=================
2018-09-27 23:21:49,071 WARNING Failed to reopen existing dataset: /tmp/raster_100694687
Error writing raster band: 0, xOff: 6144, yOff:6144
2018-09-27 23:21:49,071 WARNING http://gsky-test.nci.org.au/ows?WIDTH=21504&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=NetCDF&crs=EPSG%3A4326&time=40&REQUEST=GetCoverage&bbox=110.0%2C-50.0%2C160.0%2C0.0&HEIGHT=21504
2018-09-27 23:21:49,071 WARNING Failed mode=center nscal=0 ,res=21504,area=[110.0, -50.0, 160.0, 0.0],time=166.336482 sec.

2018-09-27 23:24:40,230 WARNING ======WCS REQUEST WARNING=================
2018-09-27 23:24:40,230 WARNING Failed to reopen existing dataset: /tmp/raster_034023794
Error writing raster band: 0, xOff: 3072, yOff:4096
2018-09-27 23:24:40,230 WARNING http://gsky-test.nci.org.au/ows?WIDTH=20480&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=NetCDF&crs=EPSG%3A4326&time=40&REQUEST=GetCoverage&bbox=110.0%2C-50.0%2C160.0%2C0.0&HEIGHT=20480
2018-09-27 23:24:40,230 WARNING Failed mode=center nscal=0 ,res=20480,area=[110.0, -50.0, 160.0, 0.0],time=169.156354 sec.

2018-09-27 23:27:35,270 WARNING ======WCS REQUEST WARNING=================
2018-09-27 23:27:35,270 WARNING Failed to reopen existing dataset: /tmp/raster_239381545
2018-09-27 23:27:35,270 WARNING http://gsky-test.nci.org.au/ows?WIDTH=19456&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=NetCDF&crs=EPSG%3A4326&time=40&REQUEST=GetCoverage&bbox=110.0%2C-50.0%2C160.0%2C0.0&HEIGHT=19456
2018-09-27 23:27:35,271 WARNING Failed mode=center nscal=0 ,res=19456,area=[110.0, -50.0, 160.0, 0.0],time=173.038857 sec.

2018-09-27 23:30:35,515 WARNING ======WCS REQUEST WARNING=================
2018-09-27 23:30:35,516 WARNING Failed to reopen existing dataset: /tmp/raster_248144756
2018-09-27 23:30:35,516 WARNING http://gsky-test.nci.org.au/ows?WIDTH=18432&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=NetCDF&crs=EPSG%3A4326&time=40&REQUEST=GetCoverage&bbox=110.0%2C-50.0%2C160.0%2C0.0&HEIGHT=18432
2018-09-27 23:30:35,516 WARNING Failed mode=center nscal=0 ,res=18432,area=[110.0, -50.0, 160.0, 0.0],time=178.244045 sec

  2. /tmp: 'no space left on device'
    This happened in the concurrent request benchmarks. It seems /tmp is too small to hold the temporary files of multiple WCS requests at the same time.

2018-09-27 20:17:59,803 WARNING REQUEST#55#URL#http://gsky-test.nci.org.au/ows?WIDTH=2048&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=1988-05-16T00%3A00%3A00.000Z&REQUEST=GetCoverage&bbox=153.75%2C-12.5%2C160.0%2C-6.25&HEIGHT=2048
2018-09-27 20:17:59,803 WARNING REQUEST#56#MSG#EncodeGdalOpen() failed: failed to create raster temp file: open /tmp/raster_917093108: no space left on device

2018-09-27 20:17:59,803 WARNING REQUEST#56#URL#http://gsky-test.nci.org.au/ows?WIDTH=2048&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=1988-05-16T00%3A00%3A00.000Z&REQUEST=GetCoverage&bbox=110.0%2C-6.25%2C116.25%2C0.0&HEIGHT=2048
2018-09-27 20:17:59,804 WARNING REQUEST#57#MSG#EncodeGdalOpen() failed: failed to create raster temp file: open /tmp/raster_222220227: no space left on device

2018-09-27 20:17:59,804 WARNING REQUEST#57#URL#http://gsky-test.nci.org.au/ows?WIDTH=2048&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=1988-05-16T00%3A00%3A00.000Z&REQUEST=GetCoverage&bbox=116.25%2C-6.25%2C122.5%2C0.0&HEIGHT=2048
2018-09-27 20:17:59,804 WARNING REQUEST#58#MSG#EncodeGdalOpen() failed: failed to create raster temp file: open /tmp/raster_225407725: no space left on device

2018-09-27 20:17:59,804 WARNING REQUEST#58#URL#http://gsky-test.nci.org.au/ows?WIDTH=2048&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=1988-05-16T00%3A00%3A00.000Z&REQUEST=GetCoverage&bbox=122.5%2C-6.25%2C128.75%2C0.0&HEIGHT=2048
2018-09-27 20:17:59,804 WARNING REQUEST#59#MSG#EncodeGdalOpen() failed: failed to create raster temp file: open /tmp/raster_349283176: no space left on device

2018-09-27 20:17:59,804 WARNING REQUEST#59#URL#http://gsky-test.nci.org.au/ows?WIDTH=2048&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=1988-05-16T00%3A00%3A00.000Z&REQUEST=GetCoverage&bbox=128.75%2C-6.25%2C135.0%2C0.0&HEIGHT=2048
2018-09-27 20:17:59,804 WARNING REQUEST#60#MSG#EncodeGdalOpen() failed: failed to create raster temp file: open /tmp/raster_718446503: no space left on device

2018-09-27 20:17:59,804 WARNING REQUEST#60#URL#http://gsky-test.nci.org.au/ows?WIDTH=2048&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=1988-05-16T00%3A00%3A00.000Z&REQUEST=GetCoverage&bbox=135.0%2C-6.25%2C141.25%2C0.0&HEIGHT=2048
2018-09-27 20:17:59,804 WARNING REQUEST#61#MSG#EncodeGdalOpen() failed: failed to create raster temp file: open /tmp/raster_334282458: no space left on device

  3. Proxy Error
    This happened frequently in the concurrent request benchmark when the concurrency was > 16.
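
The bounding boxes in the concurrent-request log above are 6.25-degree tiles of the full 110..160 / -50..0 extent, which suggests an 8 x 8 grid of tiles requested in parallel. The sketch below reproduces that tiling and caps the number of in-flight requests. It is an assumption about how such a benchmark could be written, not the actual script; `split_bbox` and `run_concurrently` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_bbox(bbox, n):
    """Split (xmin, ymin, xmax, ymax) into an n x n grid of tiles, south row first."""
    xmin, ymin, xmax, ymax = bbox
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    tiles = []
    for j in range(n):          # rows, south to north
        for i in range(n):      # columns, west to east
            tiles.append((xmin + i * dx, ymin + j * dy,
                          xmin + (i + 1) * dx, ymin + (j + 1) * dy))
    return tiles


def run_concurrently(fetch, tiles, concurrency):
    """Call fetch(tile) for every tile using at most `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch, tiles))


tiles = split_bbox((110.0, -50.0, 160.0, 0.0), 8)
```

With this ordering, `tiles[55]` has the same bounding box as the `REQUEST#55` URL in the log. Keeping `concurrency` low also limits how many requests are being served at once, which may help with the /tmp pressure described in item 2.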
metyr commented

New issue for the WCS benchmark, which loops over:

for length in [30, 34, 38, 42, 46]:
    for res in [1024, 6144, 11264, 16384, 21504]:
        ...  # send one WCS GetCoverage request and record the response time

GeoTiff&Center        
Res/Length 30 34 38 42
1024 37 39 38 40
6144 61 61 62 62
11264 109 103 -2 -2
16384 -1 178 159 137
21504 -1 -1 -1 -1
         
GeoTiff&Corner        
Res/Length 30 34 38 42
1024 16 24 35 40
6144 -2 -2 -2 56
11264 -2 63 -2 -2
16384 85 110 128 134
21504 136 176 -1 -1
         
NC4&Center        
Res/Length 30 34 38 42
1024 36 38 -2 41
6144 -2 -2 -2 -2
11264 124 -2 -2 -2
16384 -1 -2 -2 -2
21504 -1 -1 -2 -2
         
NC4&Corner        
Res/Length 30 34 38 42
1024 -2 -2 -2 -2
6144 -2 -2 -2 -2
11264 -2 -2 -2 -2
16384 -2 -2 -2 -2
21504 157 -2 -2 -2

-1: Proxy error
-2: Server resources exhausted
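
The Center and Corner variants are not defined anywhere in this thread. Judging from the `mode=` and `area=` fields of the failure logs in this issue, a square window of side `length` degrees is either centred in the full 110..160 / -50..0 extent or anchored at its south-west corner. A sketch under that assumption (`window_bbox` is a hypothetical name):

```python
FULL_EXTENT = (110.0, -50.0, 160.0, 0.0)  # xmin, ymin, xmax, ymax from the example requests


def window_bbox(length, mode, extent=FULL_EXTENT):
    """Return a square bbox of side `length` degrees inside `extent`."""
    xmin, ymin, xmax, ymax = extent
    if mode == "corner":
        # Anchor the window at the south-west corner of the extent.
        return (xmin, ymin, xmin + length, ymin + length)
    if mode == "center":
        # Place the window symmetrically around the centre of the extent.
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        half = length / 2
        return (cx - half, cy - half, cx + half, cy + half)
    raise ValueError(f"unknown mode: {mode!r}")


center_40 = window_bbox(40, "center")
corner_46 = window_bbox(46, "corner")
```

These two results match the `area=` values logged for a `mode=center` and a `mode=corner` failure elsewhere in this issue, which is the only evidence for the interpretation.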

metyr commented

WCS benchmark results after the "Fixed goroutine leak #227" change above:
GeoTiff&Centre (unit: seconds; error codes: -1, -2, -3):

Res/Length 32 34 36 38 40 42 44 46 48 50
2048 41 42 42 43 44 46 44 46 45 42
3072 44 45 47 48 49 47 49 53 47 48
4096 50 52 52 52 56 52 52 50 50 50
5120 56 55 55 57 59 57 58 56 54 53
6144 60 62 62 64 67 65 62 61 58 57
7168 72 70 69 71 70 70 64 62 61 59
8192 78 79 77 78 75 72 70 67 67 65
9216 87 95 86 84 81 78 77 73 72 72
10240 94 93 94 93 89 89 88 82 80 78
11264 105 103 106 100 96 94 96 92 83 78
12288 117 116 115 107 102 105 106 102 90 88
13312 133 130 125 119 117 117 115 106 95 91
14336 150 144 137 133 129 131 120 118 99 97
15360 168 158 150 147 139 143 127 115 109 108
16384 -1 180 167 160 152 150 137 126 118 110
17408 -1 -1 -1 177 167 160 144 133 124 119
18432 -1 -1 -1 -1 -1 167 162 142 135 129
19456 -1 -1 -1 -1 -1 -1 166 154 147 139
20480 -1 -1 -1 -1 -1 -1 -1 169 159 148
21504 -1 -1 -1 -1 -1 -1 -1 -1 169 158
22528 -1 -1 -1 -1 -3 -1 -1 -1 -1 176
23552 -1 -1 -1 -1 -3 -1 -1 -1 -1 -1
24576 -1 -1 -1 -3 -1 -1 -1 -1 -2 -1
25600 -1 -1 -3 -1 -1 -1 -1 -1 -1 -1

GeoTiff&Corner:

Res/Length 32 34 36 38 40 42 44 46 48 50
2048 21 27 32 37 41 42 45 43 43 43
3072 23 29 35 40 46 46 46 46 47 49
4096 26 32 38 44 48 50 51 52 50 51
5120 28 33 41 46 51 54 54 55 55 54
6144 32 38 47 51 55 56 58 57 57 58
7168 35 42 50 58 62 63 -2 63 61 60
8192 41 46 54 60 67 71 71 69 67 65
9216 46 51 63 67 71 74 75 73 73 71
10240 49 56 66 72 81 82 78 78 76 77
11264 56 62 72 81 84 88 89 -2 83 80
12288 60 70 79 86 95 95 92 91 92 90
13312 66 78 87 94 101 100 104 99 94 93
14336 76 85 97 103 110 112 109 108 104 99
15360 87 96 107 115 121 122 -2 115 107 107
16384 95 107 117 124 133 133 129 124 117 111
17408 104 118 132 140 144 144 -2 131 128 118
18432 119 130 144 156 157 156 149 142 138 131
19456 133 149 157 168 175 172 162 155 146 141
20480 146 159 176 -1 -1 -1 179 169 158 151
21504 154 177 -1 -1 -1 -1 -1 -1 173 160
22528 169 -1 -1 -1 -1 -1 -1 -1 -1 174
23552 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
24576 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
25600 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

Error:

-1: Proxy Error.

This can be regarded as expected, since it is caused by the timeout.

-2: rpc error: code = Internal desc = unexpected EOF

This error happens occasionally and is not reproducible.
example case:
2018-10-28 03:26:09,645 WARNING rpc error: code = Internal desc = unexpected EOF

2018-10-28 03:26:09,645 WARNING http://gsky-test.nci.org.au/ows?WIDTH=11264&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=40&REQUEST=GetCoverage&bbox=110.0%2C-50.0%2C156.0%2C-4.0&HEIGHT=11264
2018-10-28 03:26:09,645 WARNING Failed mode=corner nscal=2 ,res=11264,area=[110.0, -50.0, 156.0, -4.0],time=55.846614 sec.

-3: WCS pipeline timed out

These requests take 180 seconds, so this error is probably also related to the timeout settings.
example case:
2018-10-27 21:21:01,899 WARNING ======WCS REQUEST WARNING=================
2018-10-27 21:21:01,936 WARNING WCS pipeline timed out

2018-10-27 21:21:01,936 WARNING http://gsky-test.nci.org.au/ows?WIDTH=23552&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=40&REQUEST=GetCoverage&bbox=115.0%2C-45.0%2C155.0%2C-5.0&HEIGHT=23552
2018-10-27 21:21:01,936 WARNING Failed mode=center nscal=5 ,res=23552,area=[115.0, -45.0, 155.0, -5.0],time=180.102923 sec.
2018-10-27 21:24:03,981 WARNING ======WCS REQUEST WARNING=================
2018-10-27 21:24:03,981 WARNING WCS pipeline timed out

2018-10-27 21:24:03,981 WARNING http://gsky-test.nci.org.au/ows?WIDTH=22528&VERSION=1.0.0&Coverage=LS5%3ANBAR%3ATRUE&SERVICE=WCS&FORMAT=GeoTIFF&crs=EPSG%3A4326&time=40&REQUEST=GetCoverage&bbox=115.0%2C-45.0%2C155.0%2C-5.0&HEIGHT=22528
2018-10-27 21:24:03,981 WARNING Failed mode=center nscal=5 ,res=22528,area=[115.0, -45.0, 155.0, -5.0],time=180.037185 sec.
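
The `Failed mode=...` lines in these logs are regular enough to parse mechanically, which helps when matching log entries to table cells. A minimal sketch, assuming the line layout shown in the examples above (`parse_failed_line` is a hypothetical helper):

```python
import re

# Matches e.g. "Failed mode=center nscal=5 ,res=22528,area=[...],time=180.037185 sec."
FAILED_RE = re.compile(
    r"Failed mode=(?P<mode>\w+) nscal=(?P<nscal>\d+)\s*,"
    r"res=(?P<res>\d+),area=\[(?P<area>[^\]]*)\],"
    r"time=(?P<time>[\d.]+) sec"
)


def parse_failed_line(line):
    """Return a dict for a 'Failed mode=...' log line, or None if it does not match."""
    m = FAILED_RE.search(line)
    if m is None:
        return None
    return {
        "mode": m.group("mode"),
        "nscal": int(m.group("nscal")),
        "res": int(m.group("res")),
        "area": tuple(float(v) for v in m.group("area").split(",")),
        "time_s": float(m.group("time")),
    }


rec = parse_failed_line(
    "2018-10-27 21:24:03,981 WARNING Failed mode=center nscal=5 ,"
    "res=22528,area=[115.0, -45.0, 155.0, -5.0],time=180.037185 sec."
)
```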

NetCDF&Centre:

Res/Length 32 34 36 38 40 42 44 46 48 50
2048 41 42 43 44 44 44 44 44 44 44
3072 47 47 49 47 48 49 50 49 49 49
4096 53 54 54 54 56 52 52 53 52 52
5120 57 56 57 57 59 58 58 59 56 57
6144 63 64 66 66 65 65 65 64 61 60
7168 71 74 72 72 71 70 67 66 63 61
8192 83 84 82 81 78 75 71 74 70 68
9216 92 92 90 88 83 80 79 84 76 75
10240 99 98 98 94 -2 93 87 152 83 80
11264 112 110 108 106 101 97 93 -1 88 84
12288 128 123 119 111 106 107 102 -1 96 92
13312 143 138 130 122 119 116 112 -1 102 98
14336 162 153 142 135 130 125 115 -1 107 105
15360 -1 172 159 150 143 134 126 -1 119 115
16384 -1 -1 178 166 157 146 -2 -1 128 119
17408 -1 -1 -1 -1 174 160 151 -1 136 128
18432 -1 -1 -1 -1 -1 174 166 -1 146 140
19456 -1 -1 -1 -1 -1 -1 177 -1 159 151
20480 -1 -1 -1 -1 -1 -1 -1 -1 173 160
21504 -1 -1 -1 -1 -1 -1 -1 -1 -1 173
22528 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
23552 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
24576 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
25600 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

NetCDF&Corner:

Res/Length 32 34 36 38 40 42 44 46 48 50
2048 24 27 33 39 43 44 46 45 44 45
3072 25 31 36 43 46 47 49 48 48 49
4096 28 34 39 46 50 53 53 55 54 52
5120 31 37 42 49 53 57 57 56 56 56
6144 34 41 47 56 58 60 62 61 60 -2
7168 37 43 53 61 65 67 70 66 64 61
8192 45 50 57 66 71 75 76 73 69 69
9216 48 54 66 72 76 81 83 78 76 75
10240 54 62 70 79 -2 89 86 82 83 80
11264 60 67 77 87 90 98 96 91 86 86
12288 65 76 85 93 101 105 101 98 98 92
13312 74 85 95 102 110 115 114 106 99 99
14336 83 92 106 110 120 122 122 114 110 105
15360 96 103 115 123 128 135 125 121 116 116
16384 107 117 128 137 146 146 139 132 125 120
17408 113 129 143 150 154 160 150 141 136 128
18432 130 140 156 167 173 173 163 154 146 142
19456 146 162 172 -1 -1 -1 175 165 159 150
20480 158 175 -1 -1 -1 -1 -1 -1 171 163
21504 172 -1 -1 -1 -1 -1 -1 -1 -1 177
22528 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
23552 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
24576 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
25600 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

The performance of NetCDF and GeoTiff is not exactly consistent, probably because server utilisation differed between the runs, so the same Res/Length request can take a different processing time.
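
To put a number on that difference, the tables can be parsed and compared cell by cell, skipping any cell where either run returned an error code. A rough sketch, assuming the whitespace-separated layout used above and that negative values are always error codes; `parse_table` and `mean_difference` are hypothetical helpers, and the sample rows are a small excerpt of the Centre tables.

```python
def parse_table(text):
    """Parse a whitespace-separated result table into {(res, length): value}."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    lengths = [int(v) for v in lines[0][1:]]  # header: 'Res/Length 32 34 ...'
    table = {}
    for row in lines[1:]:
        res = int(row[0])
        for length, cell in zip(lengths, row[1:]):
            table[(res, length)] = int(cell)
    return table


def mean_difference(a, b):
    """Mean of b - a over cells where both runs succeeded (non-negative values)."""
    diffs = [b[k] - a[k] for k in a if k in b and a[k] >= 0 and b[k] >= 0]
    return sum(diffs) / len(diffs) if diffs else None


geotiff = parse_table("""
Res/Length 32 34
2048 41 42
15360 168 158
""")
netcdf = parse_table("""
Res/Length 32 34
2048 41 42
15360 -1 172
""")
delta = mean_difference(geotiff, netcdf)  # seconds, NetCDF minus GeoTiff
```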