Update data conversion functions to use `ee.data` methods.
jdbcode opened this issue · 8 comments
Use `ee.data.computeFeatures` and `ee.data.computePixels` in data conversion functions. By specifying a compatible `fileFormat`, these methods can return data in Python-native formats like structured NumPy arrays for rasters and Pandas DataFrames or GeoPandas GeoDataFrames for vectors. In the case of vectors, `computeFeatures` will make several network requests to fetch all the pages of the table before returning the Python object.
`ee_to_df` could benefit. What others?
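As a rough illustration of the proposal, here is a minimal sketch of how a conversion helper might delegate to `ee.data.computeFeatures`. The names `build_features_request` and `fetch_table` are hypothetical and are not part of geemap or the Earth Engine API. The `fileFormat` values `PANDAS_DATAFRAME` and `GEOPANDAS_GEODATAFRAME` are the ones described in the Earth Engine data-converters documentation, and `fetch_table` assumes Earth Engine has already been authenticated and initialized.

```python
def build_features_request(expression, as_geodataframe=False):
    """Build a request dict for ee.data.computeFeatures (hypothetical helper)."""
    file_format = (
        'GEOPANDAS_GEODATAFRAME' if as_geodataframe else 'PANDAS_DATAFRAME'
    )
    return {'expression': expression, 'fileFormat': file_format}


def fetch_table(collection, as_geodataframe=False):
    """Fetch an ee.FeatureCollection as a (Geo)DataFrame (hypothetical helper).

    Assumes ee.Initialize() has already been called by the caller.
    """
    import ee  # Imported lazily so the request builder works without ee installed.

    return ee.data.computeFeatures(
        build_features_request(collection, as_geodataframe)
    )
```

Keeping the request construction separate from the network call makes it possible to check the request shape without contacting Earth Engine.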
Thanks, Qiusheng. I'm happy to edit too, or to look, answer questions, test, and review when ready. Let me know!
A bug was found in `ee.data.computeFeatures`: it currently won't return more than 1001 features. The PR should probably not be merged until the bug is fixed.
Sure. I will wait for the fix.
@jdbcode It seems `ee.data.computePixels` ignores the `bandIds` parameter. It always outputs a 2D array without bands.
https://developers.google.com/earth-engine/apidocs/ee-data-computepixels
# Region of interest.
coords = [
    -121.58626826832939,
    38.059141484827485,
]
region = ee.Geometry.Point(coords)
# Sentinel-2 median composite.
image = (ee.ImageCollection('COPERNICUS/S2')
         .filterBounds(region)
         .filterDate('2020-04-01', '2020-09-01')
         .median())
# Make a projection to discover the scale in degrees.
proj = ee.Projection('EPSG:4326').atScale(10).getInfo()
# Get scales out of the transform.
scale_x = proj['transform'][0]
scale_y = -proj['transform'][4]
# Make a request object.
request = {
    'expression': image,
    'fileFormat': 'NUMPY_NDARRAY',
    'bandIds': ['B4', 'B3', 'B2'],
    'grid': {
        'dimensions': {
            'width': 640,
            'height': 640
        },
        'affineTransform': {
            'scaleX': scale_x,
            'shearX': 0,
            'translateX': coords[0],
            'shearY': 0,
            'scaleY': scale_y,
            'translateY': coords[1]
        },
        'crsCode': proj['crs'],
    },
    'visualizationOptions': {'ranges': [{'min': 0, 'max': 3000}]},
}
image_arr = ee.data.computePixels(request)
print(image_arr.shape)
# (640, 640) instead of (640, 640, 3)
It returns a NumPy structured array, which is a little different from the 3D array you might expect.
The structured array holds the third dimension as tuples at the intersection of each row and column. The bands are accessible by name, e.g. `image_arr['vis-red']`. 'vis-red' is the band name because the request used the `visualizationOptions` parameter to scale to an 8-bit RGB image.
in: image_arr
out: array([[(50, 69, 87), (50, 69, 87), (51, 68, 88), ..., (71, 78, 86),
(65, 75, 86), (68, 81, 84)],
[(50, 70, 87), (50, 69, 88), (49, 69, 89), ..., (73, 77, 88),
(65, 76, 85), (64, 81, 83)],
[(49, 69, 88), (50, 70, 88), (50, 70, 89), ..., (64, 75, 87),
(63, 79, 84), (62, 80, 82)],
...,
[(73, 80, 89), (66, 72, 84), (61, 69, 82), ..., (56, 67, 80),
(55, 65, 79), (55, 65, 79)],
[(76, 81, 90), (62, 72, 84), (57, 70, 81), ..., (54, 66, 80),
(54, 66, 80), (54, 64, 80)],
[(77, 82, 92), (62, 73, 83), (56, 70, 81), ..., (54, 66, 79),
(54, 66, 79), (54, 64, 79)]],
dtype=[('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')])
in: image_arr.dtype
out: dtype([('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')])
in: image_arr['vis-red']
out: array([[50, 50, 51, ..., 71, 65, 68],
[50, 50, 49, ..., 73, 65, 64],
[49, 50, 50, ..., 64, 63, 62],
...,
[73, 66, 61, ..., 56, 55, 55],
[76, 62, 57, ..., 54, 54, 54],
[77, 62, 56, ..., 54, 54, 54]], dtype=uint8)
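If a plain 3D array is needed, one option is NumPy's `structured_to_unstructured`. The sketch below uses a small synthetic array as a stand-in for the `ee.data.computePixels` result, so the shape and pixel values are illustrative assumptions rather than real Earth Engine output.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Synthetic stand-in for the structured array returned by computePixels.
dtype = [('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')]
image_arr = np.zeros((4, 5), dtype=dtype)
image_arr['vis-red'] = 50
image_arr['vis-green'] = 69
image_arr['vis-blue'] = 87

# Each named field becomes one slice along a new trailing axis.
rgb = rfn.structured_to_unstructured(image_arr)
print(rgb.shape)  # (4, 5, 3)
```

The field order in the dtype determines the band order along the last axis, so `rgb[..., 0]` corresponds to `image_arr['vis-red']`.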
Also, the example for `ee.data.computePixels` is a bit strange: it uses the same params/args as the `ee.data.getPixels` example, which is funky in how it specifies the region to export. The advantage of `ee.data.computePixels` is that you can specify the area to download using an `ee.Geometry` object with `ee.Image.clipToBoundsAndScale`:
knoxville = ee.Geometry.BBox(-84.07, 35.87, -83.79, 36.06) # some ROI
image1 = (ee.ImageCollection('COPERNICUS/S2')
          .filterBounds(knoxville)
          .filterDate('2020-07-01', '2020-09-01')
          .median()
          .setDefaultProjection('EPSG:4326', None, 20)  # Or some other CRS/scale
          .clipToBoundsAndScale(geometry=knoxville, scale=20))  # Clip to ROI and scale
image_arr1 = ee.data.computePixels({
    'expression': image1,
    'fileFormat': 'NUMPY_NDARRAY',
    'bandIds': ['B4', 'B3', 'B2'],
    'visualizationOptions': {'ranges': [{'min': 0, 'max': 3000}]}  # Don't need to RGB
})
display(image_arr1)
import matplotlib.pyplot as plt
import numpy as np
# Stack the named bands into a (rows, cols, bands) array for plotting.
plt.imshow(np.dstack([image_arr1[band] for band in image_arr1.dtype.names]))
Here is a notebook that describes the differences: https://developers.google.com/earth-engine/tutorials/community/data-converters
Thank you for clarifying. It makes sense.