Towards Distributed WCS
The current WCS functionality does not scale beyond a single `ows` node. Clients who use WCS often want high or even full resolution data, which can result in output images far larger than the memory limit of a single node. Therefore, we need to build a facility that coordinates a cluster of `ows` nodes to compute large WCS requests. One viable implementation is as follows:
- On the master `ows` node, we split the requested bounding box into a series of smaller bounding boxes. Doing so both reduces memory pressure and boosts performance through concurrent processing of the splits on the gRPC nodes.
- We then distribute portions of the splits to the worker `ows` nodes.
- The master `ows` node waits for the worker nodes to complete and merges the results.
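The split-and-merge strategy can be sketched in Go, GSKY's implementation language. The `BBox` type and the `splitBBox`, `processSplits`, and `renderSplit` names below are hypothetical, chosen for illustration rather than taken from GSKY's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// BBox is a lon/lat bounding box in EPSG:4326 (hypothetical type for this sketch).
type BBox struct {
	MinX, MinY, MaxX, MaxY float64
}

// splitBBox divides b into an nx-by-ny grid of smaller bounding boxes,
// as the master node would before distributing work.
func splitBBox(b BBox, nx, ny int) []BBox {
	dx := (b.MaxX - b.MinX) / float64(nx)
	dy := (b.MaxY - b.MinY) / float64(ny)
	splits := make([]BBox, 0, nx*ny)
	for iy := 0; iy < ny; iy++ {
		for ix := 0; ix < nx; ix++ {
			splits = append(splits, BBox{
				MinX: b.MinX + float64(ix)*dx,
				MinY: b.MinY + float64(iy)*dy,
				MaxX: b.MinX + float64(ix+1)*dx,
				MaxY: b.MinY + float64(iy+1)*dy,
			})
		}
	}
	return splits
}

// processSplits fans the splits out concurrently and collects the partial
// results in split order, as the master node would before merging them into
// the final image. renderSplit stands in for the per-split WCS computation
// performed on a worker node.
func processSplits(splits []BBox, renderSplit func(BBox) []byte) [][]byte {
	results := make([][]byte, len(splits))
	var wg sync.WaitGroup
	for i, s := range splits {
		wg.Add(1)
		go func(i int, s BBox) {
			defer wg.Done()
			results[i] = renderSplit(s)
		}(i, s)
	}
	wg.Wait()
	return results
}

func main() {
	full := BBox{MinX: -179, MinY: -80, MaxX: 180, MaxY: 80}
	splits := splitBBox(full, 2, 2)
	fmt.Println(len(splits)) // 4 splits of the near-global bounding box
}
```

Indexing the results slice by split position keeps the merge deterministic regardless of the order in which workers finish.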
The following experimental results demonstrate the effectiveness of the proposed strategy. The experiment ran on a single node with 8 physical cores and 16GB of memory. The `ows` cluster consists of 4 `ows` processes running on that node, which also hosts 8 gRPC workers. Our baseline `ows` is built in the same environment so that all experimental conditions are identical, ensuring a fair comparison. The test data is Geoglam monthly fractional cover. The WCS request is as follows:
http://<gsky server ip>/ows/?SERVICE=WCS&service=WCS&crs=EPSG:4326&format=geotiff&request=GetCoverage&height=<height>&width=<width>&version=1.0.0&bbox=-179,-80,180,80&coverage=global:c6:monthly_frac_cover&time=2018-01-01T00:00:00.000Z
The bounding box `-179, -80, 180, 80` virtually covers the entire planet. The resultant number of data files for such a bounding box is 816.
The following table compares the processing times of the baseline and the proposed solution. Processing time is the round-trip time of the WCS request.
Width | Height | Baseline (secs) | Proposed (secs) |
---|---|---|---|
2000 | 2000 | 12.191 | 10.822 |
5000 | 5000 | 44.216 | 13.364 |
10000 | 10000 | OOM | 19.650 |
20000 | 20000 | OOM | 38.755 |
40000 | 40000 | OOM | 104.930 |
80000 | 80000 | OOM | 336.439 |
121717 | 54247 | OOM | 345.774 |
Note:
- OOM stands for out of memory.
- Given the `bbox`, 121717x54247 is the full resolution of the dataset, i.e. the theoretical maximum width and height.