Update data conversion functions to use `ee.data` methods.

Question

jdbcode opened this issue a year ago · 8 comments

Use ee.data.computeFeatures and ee.data.computePixels in the data conversion functions. When given a compatible fileFormat, these methods return data in Python-native formats: structured NumPy arrays for rasters, and Pandas DataFrames or GeoPandas GeoDataFrames for vectors. For vectors, computeFeatures makes several network requests to fetch all the pages of the table before returning the Python object.
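A minimal sketch of what such a vector request could look like, assuming the fileFormat values 'PANDAS_DATAFRAME' and 'GEOPANDAS_GEODATAFRAME'. The helper names features_request and fetch_table are hypothetical (not part of geemap or the Earth Engine API), and fetch_table assumes an already authenticated and initialized Earth Engine session:

```python
def features_request(collection, file_format='PANDAS_DATAFRAME'):
    """Build a request dict for ee.data.computeFeatures (hypothetical helper)."""
    # Assumption: only the two Python-native table formats are accepted here.
    allowed = {'PANDAS_DATAFRAME', 'GEOPANDAS_GEODATAFRAME'}
    if file_format not in allowed:
        raise ValueError(f'Unsupported fileFormat: {file_format!r}')
    return {'expression': collection, 'fileFormat': file_format}


def fetch_table(collection, file_format='PANDAS_DATAFRAME'):
    """Fetch an ee.FeatureCollection as a DataFrame or GeoDataFrame."""
    # Imported lazily so the request can be built and inspected offline.
    import ee  # Assumes ee.Initialize() has already been called.
    return ee.data.computeFeatures(features_request(collection, file_format))
```

Keeping the request construction separate from the network call makes it possible to check the request without credentials.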

Here is a demo

ee_to_df could benefit - what others?

Answer 1 · 2023-11-06T21:34:43.000Z

These functions can benefit from the ee.data methods. I will work on this sometime next week.

Answer 2 · 2023-11-06T23:50:55.000Z

Thanks, Qiusheng. I'm happy to edit too, or to look, answer questions, test, and review when it's ready. Let me know!

A bug was found with ee.data.computeFeatures - it currently won't return more than 1001 features. The PR should probably not be merged until the bug is fixed.

Answer 3 · 2023-11-07T02:53:33.000Z

Sure. I will wait for the fix.

Answer 4 · 2023-11-07T18:35:55.000Z

These functions can benefit from the ee.data methods. I will work on this sometime next week.

Oh my gosh! YES PLEASE!!

Answer 5 · 2023-12-17T22:40:05.000Z

@jdbcode It seems ee.data.computePixels ignores the bandIds parameter. It always outputs a 2D array without bands.

https://developers.google.com/earth-engine/apidocs/ee-data-computepixels

import ee  # Assumes ee.Initialize() has been called for this session.

# Region of interest.
coords = [
    -121.58626826832939,
    38.059141484827485,
]
region = ee.Geometry.Point(coords)

# Sentinel-2 median composite.
image = (ee.ImageCollection('COPERNICUS/S2')
              .filterBounds(region)
              .filterDate('2020-04-01', '2020-09-01')
              .median())

# Make a projection to discover the scale in degrees.
proj = ee.Projection('EPSG:4326').atScale(10).getInfo()

# Get scales out of the transform.
scale_x = proj['transform'][0]
scale_y = -proj['transform'][4]

# Make a request object.
request = {
    'expression': image,
    'fileFormat': 'NUMPY_NDARRAY',
    'bandIds': ['B4', 'B3', 'B2'],
    'grid': {
        'dimensions': {
            'width': 640,
            'height': 640
        },
        'affineTransform': {
            'scaleX': scale_x,
            'shearX': 0,
            'translateX': coords[0],
            'shearY': 0,
            'scaleY': scale_y,
            'translateY': coords[1]
        },
        'crsCode': proj['crs'],
    },
    'visualizationOptions': {'ranges': [{'min': 0, 'max': 3000}]},
}

image_arr = ee.data.computePixels(request)
print(image_arr.shape)
# (640, 640) instead of (640, 640, 3)

Answer 6 · 2023-12-19T00:44:33.000Z

It returns a NumPy structured array, which is a little different from the 3D array you might expect.

The structured array stores the third dimension as a tuple at each row/column position, and each band is accessible by name, e.g. image_arr['vis-red']. The band is named 'vis-red' because the request used the visualizationOptions parameter to scale the output to an 8-bit RGB image.

in: image_arr

out: array([[(50, 69, 87), (50, 69, 87), (51, 68, 88), ..., (71, 78, 86),
        (65, 75, 86), (68, 81, 84)],
       [(50, 70, 87), (50, 69, 88), (49, 69, 89), ..., (73, 77, 88),
        (65, 76, 85), (64, 81, 83)],
       [(49, 69, 88), (50, 70, 88), (50, 70, 89), ..., (64, 75, 87),
        (63, 79, 84), (62, 80, 82)],
       ...,
       [(73, 80, 89), (66, 72, 84), (61, 69, 82), ..., (56, 67, 80),
        (55, 65, 79), (55, 65, 79)],
       [(76, 81, 90), (62, 72, 84), (57, 70, 81), ..., (54, 66, 80),
        (54, 66, 80), (54, 64, 80)],
       [(77, 82, 92), (62, 73, 83), (56, 70, 81), ..., (54, 66, 79),
        (54, 66, 79), (54, 64, 79)]],
      dtype=[('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')])

in: image_arr.dtype

out: dtype([('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')])

in: image_arr['vis-red']

out: array([[50, 50, 51, ..., 71, 65, 68],
       [50, 50, 49, ..., 73, 65, 64],
       [49, 50, 50, ..., 64, 63, 62],
       ...,
       [73, 66, 61, ..., 56, 55, 55],
       [76, 62, 57, ..., 54, 54, 54],
       [77, 62, 56, ..., 54, 54, 54]], dtype=uint8)
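If a plain 3D array is what you need, the structured array can be converted. Below is a small sketch using a synthetic 2x2 stand-in for the array above (sample_arr is made up for illustration, not real computePixels output) and NumPy's structured_to_unstructured:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Synthetic stand-in with the same dtype as the computePixels result above.
rgb_dtype = [('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')]
sample_arr = np.array(
    [[(50, 69, 87), (50, 69, 87)],
     [(50, 70, 87), (50, 69, 88)]],
    dtype=rgb_dtype,
)

print(sample_arr.shape)        # (2, 2): the bands live in the dtype, not the shape
print(sample_arr.dtype.names)  # ('vis-red', 'vis-green', 'vis-blue')

# Convert the named fields into a trailing band axis.
image_3d = rfn.structured_to_unstructured(sample_arr)
print(image_3d.shape)  # (2, 2, 3)
```

Checking dtype.names first is a quick way to confirm which bands actually came back.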

Answer 7 · 2023-12-19T00:48:20.000Z

Also, the example for ee.data.computePixels is a bit strange: it uses the same params/args as the ee.data.getPixels example, which is an awkward way to specify the region to export. The advantage of ee.data.computePixels is that you can specify the area to download using an ee.Geometry object with ee.Image.clipToBoundsAndScale:

knoxville = ee.Geometry.BBox(-84.07, 35.87, -83.79, 36.06) # some ROI
image1 = (ee.ImageCollection('COPERNICUS/S2')
              .filterBounds(knoxville)
              .filterDate('2020-07-01', '2020-09-01')
              .median()
              .setDefaultProjection('EPSG:4326', None, 20) # Or some other CRS/scale
              .clipToBoundsAndScale(geometry=knoxville, scale=20)) # Clip to ROI and scale

image_arr1 = ee.data.computePixels({
    'expression': image1,
    'fileFormat': 'NUMPY_NDARRAY',
    'bandIds': ['B4', 'B3', 'B2'],
    'visualizationOptions': {'ranges': [{'min': 0, 'max': 3000}]} # Optional; only needed to scale to 8-bit RGB
})

display(image_arr1)


import matplotlib.pyplot as plt
import numpy as np

# Stack the named bands into a (rows, cols, bands) array for plotting.
plt.imshow(np.dstack([image_arr1[band] for band in image_arr1.dtype.names]))

Here is a notebook that describes the differences: https://developers.google.com/earth-engine/tutorials/community/data-converters
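The plotting snippet above relies on visualizationOptions to get 8-bit RGB values. If that parameter is left out, the bands would need to be scaled before plotting. Here is a rough sketch, assuming the structured array then uses the band IDs as field names and holds raw reflectance values; to_rgb is a hypothetical helper and the raw array is synthetic:

```python
import numpy as np


def to_rgb(structured, bands, vmin=0.0, vmax=3000.0):
    """Stack named bands into a float RGB array scaled to [0, 1]."""
    stacked = np.dstack([structured[b].astype('float64') for b in bands])
    # Linear stretch, then clip values outside the [vmin, vmax] range.
    return np.clip((stacked - vmin) / (vmax - vmin), 0.0, 1.0)


# Synthetic 1x2 raster with three raw bands (made-up values).
raw = np.array(
    [[(1500.0, 900.0, 600.0), (3600.0, 0.0, 300.0)]],
    dtype=[('B4', 'f8'), ('B3', 'f8'), ('B2', 'f8')],
)

rgb = to_rgb(raw, ['B4', 'B3', 'B2'])
print(rgb.shape)  # (1, 2, 3)
```

The result can be passed straight to plt.imshow, which accepts float RGB values in the 0 to 1 range.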

Answer 8 · 2023-12-19T02:13:18.000Z

Thank you for clarifying. It makes sense.