acgeospatial/Satellite_Imagery_Python

Looping function help

CSteele97 opened this issue · 2 comments

Hi @acgeospatial

I submitted an issue a few months ago querying how to run the analysis on multiple images at a time, I think I have managed to write a function to do this however when saving the cluster rasters the code overwrites the previous image. Do you know how this could be resolved? I have attached the code for the function below.

Many thanks,
Chloe

def k_means_segmentation(folder):
    images = []
    for filename in os.listdir(folder):
        img_ds = gdal.Open(os.path.join(folder,filename), gdal.GA_ReadOnly)
        ds = gdal.Open(os.path.join(folder, filename))
        band = ds.GetRasterBand(2)
        arr = band.ReadAsArray()
        [cols, rows] = arr.shape

        img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
               gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))
        for b in range(img.shape[2]):
            img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
        images.append(img)
        
    for i in images:
        new_shape = (i.shape[0] * i.shape[1], i.shape[2])
        X = i[:, :, :8].reshape(new_shape)
        k_means = cluster.KMeans(n_clusters=5)
        k_means.fit(X)

        X_cluster = k_means.labels_
        X_cluster = X_cluster.reshape(i[:, :, 0].shape)
        
        format = "GTiff"
        driver = gdal.GetDriverByName(format)
        
        filename = "/testing_results/k_means.tif"
        outDataRaster = driver.Create(filename, rows, cols, 1, gdal.GDT_Byte)
        outDataRaster.SetGeoTransform(ds.GetGeoTransform())##sets same geotransform as input
        outDataRaster.SetProjection(ds.GetProjection())##sets same projection as input
        
        outDataRaster.GetRasterBand(1).WriteArray(X_cluster)
        
        outDataRaster.FlushCache() ## remove from memory
        del outDataRaster

Hi @acgeospatial

Since my last post I believe I have sorted the naming issue, with each cluster map being created as a separate file. However, I have since come across an issue whereby the files all occupy the same space i.e. when opened in QGIS, the tiles are in one location. I believe this is due to the following lines of code:

        ds = gdal.Open(os.path.join(folder, filename))
        band = ds.GetRasterBand(2)
        arr = band.ReadAsArray()
        [cols, rows] = arr.shape

        outDataRaster.SetGeoTransform(ds.GetGeoTransform())##sets same geotransform as input
        outDataRaster.SetProjection(ds.GetProjection())##sets same projection as input

I am not sure how to obtain the geotransform and projection information from each raster, as it appears the information is taken from the last file in the loop and ds is overwritten each time.

I will attach my full code below.

Any advice or help in how to solve this would be greatly appreciated.

Many thanks,
Chloe

def k_means_segmentation(folder):
    images = []
    for filename in os.listdir(folder):
        img_ds = gdal.Open(os.path.join(folder,filename), gdal.GA_ReadOnly)
        
        ds = gdal.Open(os.path.join(folder, filename))
        band = ds.GetRasterBand(2)
        arr = band.ReadAsArray()
        [cols, rows] = arr.shape

        img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
               gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))
        for b in range(img.shape[2]):
            img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
        images.append(img)
    
        d = 0
        
        for i in images:
            new_shape = (i.shape[0] * i.shape[1], i.shape[2])
            X = i[:, :, :8].reshape(new_shape)
            k_means = cluster.KMeans(n_clusters=5)
            k_means.fit(X)
    
            X_cluster = k_means.labels_
            X_cluster = X_cluster.reshape(i[:, :, 0].shape)
        
            format = "GTiff"
            driver = gdal.GetDriverByName(format)
        
            file = "/testing_results/k_means_%d.tif"%d
            outDataRaster = driver.Create(file, rows, cols, 1, gdal.GDT_Byte)
            outDataRaster.SetGeoTransform(ds.GetGeoTransform())##sets same geotransform as input
            outDataRaster.SetProjection(ds.GetProjection())##sets same projection as input
        
            outDataRaster.GetRasterBand(1).WriteArray(X_cluster)
            
            outDataRaster.FlushCache() ## remove from memory
            del outDataRaster
        
            d+= 1

I haven't run this code but...

    img_ds = gdal.Open(os.path.join(folder,filename), gdal.GA_ReadOnly)
    
    ds = gdal.Open(os.path.join(folder, filename))

Are essentially opening the same file twice.

If it was me then I would built a write raster function and call it after each time you run k-means. parse in the image and the output,