[HELP] NaN in satelie images
melgor opened this issue · 4 comments
I'm encountering problems with NaN pixels for many locations in the US. I designed a pipeline, that downloads many products per single tile, and then I check if this is a valid image
-> if not download the next product.
However, it is a waste of resources on both my and API's side.
Is there any way to filter images by the number of NaN (like it can be done for cloud coverage)?
Also, could you explain why some images have NaN?
My location with such a problem:
%matplotlib inline
import os
import datetime as dt
import matplotlib.pyplot as plt
import numpy as np
import pyogrio
from sentinelhub import CRS, BBox, DataCollection, SHConfig, TileSplitter
from sentinelhub import SentinelHubCatalog
from sentinelhub.geometry import Geometry
from sentinelhub import MimeType, SentinelHubDownloadClient, SentinelHubRequest, bbox_to_dimensions, filter_times
config = SHConfig()
catalog = SentinelHubCatalog(config=config)
caspian_sea_bbox = BBox((-93.00, 30.64,-91.84, 31.63), crs=CRS.WGS84)
time_interval = "2020-01-01", "2021-01-01"
search_iterator = catalog.search(
DataCollection.SENTINEL2_L1C,
bbox=my_bbox_crswgs84,
time=time_interval,
filter="eo:cloud_cover < 1",
fields={"include": ["id", "properties.datetime", "properties.eo:cloud_cover"], "exclude": []},
)
results = list(search_iterator)
print("Total number of results:", len(results))
time_difference = dt.timedelta(hours=1)
all_timestamps = search_iterator.get_timestamps()
unique_acquisitions = filter_times(all_timestamps, time_difference)
evalscript_true_color = """
//VERSION=3
function setup() {
return {
input: [{
bands: ["B02", "B03", "B04"]
}],
output: {
bands: 3
}
};
}
function evaluatePixel(sample) {
return [sample.B04, sample.B03, sample.B02];
}
"""
process_requests = []
for timestamp in unique_acquisitions:
request = SentinelHubRequest(
evalscript=evalscript_true_color,
input_data=[
SentinelHubRequest.input_data(
data_collection=DataCollection.SENTINEL2_L1C.define_from("s2l1c", service_url=config.sh_base_url),
time_interval=(timestamp - time_difference, timestamp + time_difference)
)
],
responses=[SentinelHubRequest.output_response("default", MimeType.PNG)],
bbox=my_bbox_crswgs84,
size=bbox_to_dimensions(my_bbox_crswgs84, 100),
config=config,
)
process_requests.append(request)
client = SentinelHubDownloadClient(config=config)
download_requests = [request.download_list[0] for request in process_requests]
data = client.download(download_requests)
ncols, nrows = 2, 25
fig, axis = plt.subplots(
ncols=ncols, nrows=nrows, figsize=(15, 60), subplot_kw={"xticks": [], "yticks": [], "frame_on": False}
)
for idx, (image, timestamp) in enumerate(zip(data, unique_acquisitions)):
ax = axis[idx // ncols][idx % ncols]
ax.imshow(np.clip(image * 2.5 / 255, 0, 1))
ax.set_title(timestamp.date().isoformat(), fontsize=10)
plt.tight_layout()
The black stripes are normal: you are looking at the border of the orbits (see https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-2/satellite-description/orbit). You can obtain dataMask
band (https://docs.sentinel-hub.com/api/latest/data/sentinel-2-l1c/#available-bands-and-data) with your request and then filter them.
Thanks for your help. So, to download satellite images with low cloud-coverage and low nb of NaN, the best would be run a request for only dataMask
from different acquisition dates and analyze the data to select the best acquisition date for me. Am I right?
You also get the geometry of the tile in the catalog api response, and you could simply calculate area to see how much of your tile is actually covered with data on that day.
The following code snippet might help:
from sentinelhub import (BBox, CRS, SentinelHubCatalog, DataCollection)
import geopandas as gpd
import matplotlib.pyplot as plt
catalog = SentinelHubCatalog()
caspian_sea_bbox = BBox((49.9604, 44.7176, 51.0481, 45.2324), crs=CRS.WGS84)
time_interval = "2020-12-10", "2020-12-15"
search_iterator = catalog.search(
DataCollection.SENTINEL2_L2A,
bbox=caspian_sea_bbox,
time=time_interval,
filter="eo:cloud_cover < 5"
)
results = gpd.GeoDataFrame.from_features(list(search_iterator))
fig, ax = plt.subplots()
results.plot(ax=ax)
gpd.GeoDataFrame(geometry=[caspian_sea_bbox.geometry], crs=results.crs).plot(ax=ax, facecolor='r', alpha=0.5)
And then you can intersect your AOI with results,
results['aoi_intersection'] = results.geometry.intersection(caspian_sea_bbox.geometry).area / caspian_sea_bbox.geometry.area
At this point you can filter out all that have AOI smaller than some ratio, and download the rest.
Thank you very much!