aerobulk-python is SLOWWW
jbusecke opened this issue · 4 comments
I am currently struggling with what I perceive as very slow performance with aerobulk-python.
For some of the work we are doing over at https://github.com/ocean-transport/scale-aware-air-sea, it seems that a single timestep of CM2.6 data takes about 45-50s to execute.
This does not seem to be a particularity of the data wer are using, since I am getting similar times for synthetic data:
from typing import Dict, Tuple
import time
import numpy as np
import pytest
import xarray as xr
from aerobulk import noskin
"""Tests for the xarray wrapper"""
def create_data(
shape: Tuple[int, ...],
chunks: Dict[str, int] = {},
skin_correction: bool = False,
order: str = "F",
):
def _arr(value, chunks, order):
arr = xr.DataArray(np.full(shape, value, order=order))
# adds random noise scaled by a percentage of the value
randomize_factor = 0.001
randomize_range = value * randomize_factor
arr = arr + np.random.rand(*shape) + randomize_range
if chunks:
arr = arr.chunk(chunks)
return arr
sst = _arr(290.0, chunks, order)
t_zt = _arr(280.0, chunks, order)
hum_zt = _arr(0.001, chunks, order)
u_zu = _arr(1.0, chunks, order)
v_zu = _arr(-1.0, chunks, order)
slp = _arr(101000.0, chunks, order)
rad_sw = _arr(0.000001, chunks, order)
rad_lw = _arr(350, chunks, order)
if skin_correction:
return sst, t_zt, hum_zt, u_zu, v_zu, rad_sw, rad_lw, slp
else:
return sst, t_zt, hum_zt, u_zu, v_zu, slp
data = create_data((3600, 2700, 2), chunks=None)
When I time this execution
tic = time.time()
out_data = noskin(*data)
toc = time.time() - tic
I get a runtime of ~100sec, so about 50s per timestep.
Am I expecting too much of the fortran code? Or is this slow for computing on ~1e7 data points.
I wonder if there is some obvious issue with our compiler flags?
I would make a plot of how long the execution time takes as a function of domain size. Just 4-5 points should be enough to see whether it is scaling linearly. If so, we can focus on optimizing with just a small piece of data.
Good idea! Ill do that in a bit.
For results see here: https://github.com/ocean-transport/scale-aware-air-sea/issues/28#issuecomment-1158136723
So from this I conclude that we can optimize on a small domain size e.g. 100x100.