[HELP] NaN in satelie images

Question

[HELP] NaN in satelie images

melgor opened this issue a year ago · 4 comments

I'm encountering problems with NaN pixels for many locations in the US. I designed a pipeline, that downloads many products per single tile, and then I check if this is a valid image -> if not download the next product.
However, it is a waste of resources on both my and API's side.
Is there any way to filter images by the number of NaN (like it can be done for cloud coverage)?
Also, could you explain why some images have NaN?

My location with such a problem:

%matplotlib inline

import os
import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pyogrio

from sentinelhub import CRS, BBox, DataCollection, SHConfig, TileSplitter
from sentinelhub import SentinelHubCatalog
from sentinelhub.geometry import Geometry

from sentinelhub import MimeType, SentinelHubDownloadClient, SentinelHubRequest, bbox_to_dimensions, filter_times

config = SHConfig()
catalog = SentinelHubCatalog(config=config)

caspian_sea_bbox = BBox((-93.00, 30.64,-91.84, 31.63), crs=CRS.WGS84)
time_interval = "2020-01-01", "2021-01-01"
search_iterator = catalog.search(
    DataCollection.SENTINEL2_L1C,
    bbox=my_bbox_crswgs84,
    time=time_interval,
    filter="eo:cloud_cover < 1",
    fields={"include": ["id", "properties.datetime", "properties.eo:cloud_cover"], "exclude": []},
)

results = list(search_iterator)
print("Total number of results:", len(results))


time_difference = dt.timedelta(hours=1)

all_timestamps = search_iterator.get_timestamps()
unique_acquisitions = filter_times(all_timestamps, time_difference)

evalscript_true_color = """
    //VERSION=3

    function setup() {
        return {
            input: [{
                bands: ["B02", "B03", "B04"]
            }],
            output: {
                bands: 3
            }
        };
    }

    function evaluatePixel(sample) {
        return [sample.B04, sample.B03, sample.B02];
    }
"""


process_requests = []

for timestamp in unique_acquisitions:
    request = SentinelHubRequest(
        evalscript=evalscript_true_color,
        input_data=[
            SentinelHubRequest.input_data(
                data_collection=DataCollection.SENTINEL2_L1C.define_from("s2l1c", service_url=config.sh_base_url),
                time_interval=(timestamp - time_difference, timestamp + time_difference)
            )
        ],
        responses=[SentinelHubRequest.output_response("default", MimeType.PNG)],
        bbox=my_bbox_crswgs84,
        size=bbox_to_dimensions(my_bbox_crswgs84, 100),
        config=config,
    )
    process_requests.append(request)

client = SentinelHubDownloadClient(config=config)

download_requests = [request.download_list[0] for request in process_requests]

data = client.download(download_requests)

ncols, nrows = 2, 25

fig, axis = plt.subplots(
    ncols=ncols, nrows=nrows, figsize=(15, 60), subplot_kw={"xticks": [], "yticks": [], "frame_on": False}
)

for idx, (image, timestamp) in enumerate(zip(data, unique_acquisitions)):
    ax = axis[idx // ncols][idx % ncols]
    ax.imshow(np.clip(image * 2.5 / 255, 0, 1))
    ax.set_title(timestamp.date().isoformat(), fontsize=10)

plt.tight_layout()

Answer 1 · 2023-12-01T09:47:15.000Z

The black stripes are normal: you are looking at the border of the orbits (see https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-2/satellite-description/orbit). You can obtain dataMask band (https://docs.sentinel-hub.com/api/latest/data/sentinel-2-l1c/#available-bands-and-data) with your request and then filter them.

Answer 2 · 2023-12-01T10:29:19.000Z

Thanks for your help. So, to download satellite images with low cloud-coverage and low nb of NaN, the best would be run a request for only dataMask from different acquisition dates and analyze the data to select the best acquisition date for me. Am I right?

Answer 3 · 2023-12-01T11:31:26.000Z

You also get the geometry of the tile in the catalog api response, and you could simply calculate area to see how much of your tile is actually covered with data on that day.

The following code snippet might help:

from sentinelhub import (BBox, CRS, SentinelHubCatalog, DataCollection)
import geopandas as gpd
import matplotlib.pyplot as plt

catalog = SentinelHubCatalog()
caspian_sea_bbox = BBox((49.9604, 44.7176, 51.0481, 45.2324), crs=CRS.WGS84)
time_interval = "2020-12-10", "2020-12-15"

search_iterator = catalog.search(
    DataCollection.SENTINEL2_L2A,
    bbox=caspian_sea_bbox,
    time=time_interval,
    filter="eo:cloud_cover < 5"
)
results = gpd.GeoDataFrame.from_features(list(search_iterator))

fig, ax = plt.subplots()
results.plot(ax=ax)
gpd.GeoDataFrame(geometry=[caspian_sea_bbox.geometry], crs=results.crs).plot(ax=ax, facecolor='r', alpha=0.5)

And then you can intersect your AOI with results,

results['aoi_intersection'] = results.geometry.intersection(caspian_sea_bbox.geometry).area / caspian_sea_bbox.geometry.area

to get the following:

At this point you can filter out all that have AOI smaller than some ratio, and download the rest.

Answer 4 · 2023-12-01T12:08:45.000Z

Thank you very much!