Looping function help
CSteele97 opened this issue · 2 comments
I submitted an issue a few months ago querying how to run the analysis on multiple images at a time, I think I have managed to write a function to do this however when saving the cluster rasters the code overwrites the previous image. Do you know how this could be resolved? I have attached the code for the function below.
Many thanks,
Chloe
def k_means_segmentation(folder):
images = []
for filename in os.listdir(folder):
img_ds = gdal.Open(os.path.join(folder,filename), gdal.GA_ReadOnly)
ds = gdal.Open(os.path.join(folder, filename))
band = ds.GetRasterBand(2)
arr = band.ReadAsArray()
[cols, rows] = arr.shape
img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))
for b in range(img.shape[2]):
img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
images.append(img)
for i in images:
new_shape = (i.shape[0] * i.shape[1], i.shape[2])
X = i[:, :, :8].reshape(new_shape)
k_means = cluster.KMeans(n_clusters=5)
k_means.fit(X)
X_cluster = k_means.labels_
X_cluster = X_cluster.reshape(i[:, :, 0].shape)
format = "GTiff"
driver = gdal.GetDriverByName(format)
filename = "/testing_results/k_means.tif"
outDataRaster = driver.Create(filename, rows, cols, 1, gdal.GDT_Byte)
outDataRaster.SetGeoTransform(ds.GetGeoTransform())##sets same geotransform as input
outDataRaster.SetProjection(ds.GetProjection())##sets same projection as input
outDataRaster.GetRasterBand(1).WriteArray(X_cluster)
outDataRaster.FlushCache() ## remove from memory
del outDataRaster
Since my last post I believe I have sorted the naming issue, with each cluster map being created as a separate file. However, I have since come across an issue whereby the files all occupy the same space i.e. when opened in QGIS, the tiles are in one location. I believe this is due to the following lines of code:
ds = gdal.Open(os.path.join(folder, filename))
band = ds.GetRasterBand(2)
arr = band.ReadAsArray()
[cols, rows] = arr.shape
outDataRaster.SetGeoTransform(ds.GetGeoTransform())##sets same geotransform as input
outDataRaster.SetProjection(ds.GetProjection())##sets same projection as input
I am not sure how to obtain the geotransform and projection information from each raster, as it appears the information is taken from the last file in the loop and ds is overwritten each time.
I will attach my full code below.
Any advice or help in how to solve this would be greatly appreciated.
Many thanks,
Chloe
def k_means_segmentation(folder):
images = []
for filename in os.listdir(folder):
img_ds = gdal.Open(os.path.join(folder,filename), gdal.GA_ReadOnly)
ds = gdal.Open(os.path.join(folder, filename))
band = ds.GetRasterBand(2)
arr = band.ReadAsArray()
[cols, rows] = arr.shape
img = np.zeros((img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType))
for b in range(img.shape[2]):
img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
images.append(img)
d = 0
for i in images:
new_shape = (i.shape[0] * i.shape[1], i.shape[2])
X = i[:, :, :8].reshape(new_shape)
k_means = cluster.KMeans(n_clusters=5)
k_means.fit(X)
X_cluster = k_means.labels_
X_cluster = X_cluster.reshape(i[:, :, 0].shape)
format = "GTiff"
driver = gdal.GetDriverByName(format)
file = "/testing_results/k_means_%d.tif"%d
outDataRaster = driver.Create(file, rows, cols, 1, gdal.GDT_Byte)
outDataRaster.SetGeoTransform(ds.GetGeoTransform())##sets same geotransform as input
outDataRaster.SetProjection(ds.GetProjection())##sets same projection as input
outDataRaster.GetRasterBand(1).WriteArray(X_cluster)
outDataRaster.FlushCache() ## remove from memory
del outDataRaster
d+= 1
I haven't run this code but...
img_ds = gdal.Open(os.path.join(folder,filename), gdal.GA_ReadOnly)
ds = gdal.Open(os.path.join(folder, filename))
Are essentially opening the same file twice.
If it was me then I would built a write raster function and call it after each time you run k-means. parse in the image and the output,