iceutils
is a collection of Python classes and functions for interfacing with geospatial rasters. Support includes common operations on rasters, such as warping, cropping, masking, gradient computaton, basic arithmetic, etc. The motivation for iceutils
is the simplification of I/O of rasters and associated metadata while streamlining the interface between raster data and various NumPy, SciPy, and Matplotlib functions. In short, by treating rasters as metadata-decorated NumPy arrays, we can easily assimilate rasters of different projections into our analysis and visualization code using familiar interfaces.
Time-dependent rasters can be stored in a stack object in order to perform time series analysis and visualization of raster changes. Time series methods are based on the regularized least squares approaches outlined in "Riel, B., Minchew, B., & Joughin, I. (2021). Observing traveling waves in glaciers with remote sensing: new flexible time series methods and application to Sermeq Kujalleq (Jakobshavn Isbræ), Greenland. The Cryosphere."
Prior to installing iceutils
, we'll need to install a number of Python dependencies.
numpy
scipy
matplotlib
gdal
h5py
pyproj
scikit-image
tqdm
opencv (optional)
scikit-learn (optional)
pint (optional)
cvxopt (optional)
All of the packages can be installed via Anaconda using the conda-forge
channel. Also note that optional packages scikit-learn
, pint
, and cvxopt
are only necessary for using the iceutils.tseries
module. Additionally, opencv
is only needed for the iceutils.correlate
module and some isolated inpainting routines. The installation process can be streamlined using the files requirements_base.txt
and requirements_extra.txt
:
# Install only base requirements
conda install -c conda-forge --file=requirements_base.txt
# Or install all requirements
conda install -c conda-forge --file=requirements_base.txt --file=requirements_extra.txt
If you are using the main Anaconda channel, you'll likely have to install pint
with pip:
pip install pint
To install iceutils
, you may clone a read-only version of the repository:
git clone https://github.com/bryanvriel/iceutils.git
Or, if you are developer, you may clone with SSH:
git clone git@github.com:bryanvriel/iceutils.git
Then, simply run pip install .
in the main repository directory to install.
In the cloned directory, you'll find several Python source files, each containing various functions and classes. While the naming of the source files gives a hint about what they contain, all functions are classes are imported into a common namespace. For example, the file timeutils.py
contains a function generateRegularTimeArray()
. This function would be called as follows:
import iceutils as ice
t = ice.generateRegularTimeArray(tmin, tmax)
In the file raster.py
are two classes, Raster
and RasterInfo
. The former encapsulates basic raster-type data (i.e., 2D imagery) and provides some convenience functions to interface with the GDAL Python API. Therefore, any raster format that is compatible with GDAL can be read in with the Raster
class, e.g.:
raster = ice.Raster(rasterfile='velocity.tif')
Any instance of the Raster
class will have as an attribute an instance of the RasterInfo
class. This class is a separate class that encapsulates all relevant metadata associated with the raster: 1) upper left pixel coordinates; 2) pixel spacing; and 3) coordinate projection system. The RasterInfo
instance and its data can be accessed via the hdr
class variable, e.g.:
raster = ice.Raster(rasterfile='velocity.tif')
print(raster.hdr.xstart) # Upper left X-coordinate
print(raster.hdr.ystart) # Upper left Y-coordinate
print(raster.hdr.dx) # X-spacing
print(raster.hdr.dy) # Y-spacing
As shown before, one can load raster files by providing the path to the GDAL-compatible file via the keyword argument rasterfile=
(later, we'll see how to interface with certain HDF5 datasets using a different keyword argument):
raster = ice.Raster(rasterfile='velocity.tif')
If you would like to load only a subset of the raster, you may provide a GDAL-compatible projection window (e.g., projWin
) defined by the standard [ulx, uly, lrx, lry]
:
# Projection window (same coordinate system/SRS as raster)
projWin = [-120, 30, -118, 28]
# Load subset raster
raster = ice.Raster('large_mosaic_raster.vrt', projWin=projWin)
Alternatively, you may provide Python slice
objects to the Raster
constructor. These slices correspond to min-max indices of the image in row and column coordinates:
# Row bounds
islice = slice(100, 300)
# Column bounds
jslice = slice(400, 800)
# Read a raster subset
raster = ice.Raster(rasterfile='velocity.tif',
islice=islice,
jslice=jslice)
Of course, for both the projWin
and islice/jslice
interfaces, the RasterInfo
instance contained in the raster will have its data automatically adjusted for the subset bounds.
To dump raster data to a file, we use the Raster.write_gdal
function:
raster.write_gdal('output.dat', driver='ENVI', epsg=3413)
Any GDAL-compatible driver can be passed to the driver
keyword argument (e.g., 'GTiff'
, 'ISCE'
, 'GMT'
, etc.). Additionally, one can pass in an EPSG code to specify the coordinate system of the data in order for GDAL to write relevant projection data (as shown in the example above; if epsg=None
, the EPSG code is retrieved from the hdr
attribute of the raster).
Alternatively, we can write a NumPy array directly to a raster if we have an associated RasterInfo
object.
ice.write_array_as_raster(array, hdr, 'output.dat', driver='ENVI')
Let's say we have a raster covering a geographic area, and we wish to resample the data to the geographic area of another raster with the same projection. This can be done using the Raster.resample
function:
# First raster
raster = ice.Raster(rasterfile='velocity.tif')
# Another raster with a different geographic area
ref_raster = ice.Raster(rasterfile='velocity2.tif')
# Resample first raster to area of second
raster.resample(ref_raster.hdr)
In some cases, we would like to resample a raster to a geographic area with a different projection (warping). In this case, the target geographic region can be specified by an EPSG code or by a separate RasterInfo
object, either from an existing raster or created on-the-fly. For example, to warp a latitude-longitude raster to Polar Stereographic defined by a separate raster:
# Source raster in latitude-longitude (EPSG: 4326)
src_raster = ice.Raster(rasterfile='velocity_latlon.tif')
# Another raster in polar stereographic north (EPSG: 3413)
trg_raster = ice.Raster(rasterfile='velocity_polar.tif')
# Warp source raster
ice.warp(src_raster, target_hdr=trg_raster.hdr)
We can also pre-construct the target coordinates in memory and create a RasterInfo
object on the fly:
# Meshgrid of target coordinates in polar stereographic
x = np.linspace(1000, 3000, 100)
y = np.linspace(-2200, -2400, 100)
X, Y = np.meshgrid(x, y)
# Create RasterInfo
trg_hdr = ice.RasterInfo(X=X, Y=Y, epsg=3413)
# Warp source raster
ice.warp(src_raster, target_hdr=trg_hdr)
To convert between physical XY-coordinates of the raster (in its projection system) to image coordinates:
# XY -> image
row, col = raster.hdr.xy_to_imagecoord(x, y)
# image -> XY
x, y = raster.hdr.imagecoord_to_xy(row, col)
Note that row and column indices for a given XY-coordinate are rounded to the nearest integer.
We often extract 1D transects from rasters in our analysis. At the moment, this functionality exists by providing the XY-coordinates of the endpoints of the transect of interest:
# The starting point coordinate (X, Y) tuple
start = (1000.0, 1000.0)
# The ending point coordinate
end = (1200.0, 800.0)
# Extract transect
transect = raster.transect(start, end, n=200)
Note that the keyword argument n
specifies the number of equally-spaced points you would like the transect to have.
iceutils
also provides an implementation for raster time series, which we call a Stack
. The key attribute of a stack is that all rasters will have the same projection system, geographic area, and image size, which will allow us to store stacks as 3D numpy arrays in memory. To store stacks on disk, the current implementation uses HDF5. The dataset layout of the HDF5 file is:
x Dataset {N}
y Dataset {M}
tdec Dataset {K}
igram Dataset {K, M, N}
weights Dataset {K, M, N}
The first two datasets are 1D datasets corresponding to the coordinates of the stack. The third dataset, tdec
is a 1D dataset corresponding to the decimal years for each raster in the stack. The 3D dataset igram
contains the actual stack (note: the name igram
is InSAR based, so I'll be changing this to something else very soon). Finally, the 3D dataset weights
correspond to the weights associated with each raster. Generally, I set these to be the inverse of error/uncertainty maps associated with my data (e.g., *ex.tif
and *ey.tif
files for the Measures GIMP data), but ostensibly these can be set to anything sensible.
The Stack
class implemented in iceutils
is for the most part a convenience interface to the underlying HDF5 file via the h5py
Python package. For example, let's read in a stack and get a numpy array for the nearest raster image for a certain time:
import numpy as np
import iceutils as ice
# Load the stack
stack = ice.Stack('velocity_stack.h5')
# Find the nearest time index for a decimal year of interest
t = 2014.43
index = np.argmin(np.abs(stack['tdec'] - t))
# Get numpy array for that index
raster_array = stack['igram'][index, :, :]
In the above example, we access the HDF5 datasets in the Stack
via a key (similar to a Python dictionary). Also, when we "load" a stack, we don't actually load it into memory. Similarly, accessing an HDF5 dataset does not automatically load that dataset into memory (h5py
behavior). However, indexing an HDF5 dataset will load the data into memory (e.g., raster_array
is in memory).
A common action performed on stacks is extracting the 1D time series at a given geographic coordinate:
# The XY-coordinate of interest
xy = (1000.0, 1000.0)
# Extract time series
data = stack.timeseries(xy=xy)
# Alternatively, use row/column coordinates
coord = (300, 400)
data = stack.timeseries(coord=coord)
# Extract a 3x3 window centered around coordinate
# of interest, and get mean time series
data = stack.timeseries(xy=xy, win_size=3)
More often than not, we would like to compute quantities of multiple stacks, e.g. velocity magnitude given stacks of x- and y- velocity components. Instead of creating new stack objects that compute those quantities and then write the data to disk, we can use a MultiStack
class that acts as a "virtual" Stack object with a limited set of arithmetic operations evaluated on multiple Stacks when queried. Different child classes inherit from MultiStack
to implement the different arithmetic operations. Currently, we only have MagStack
for vector magnitudes and SumStack
for summing multiple stacks. As an example, let's create a velocity magnitude Stack from two independent Stacks vx.h5
and vy.h5
which represent velocities in the X- and Y-directions, respectively:
stack = ice.MagStack(files=['vx.h5', 'vy.h5'])
This object can then be queried in the same way a Stack
can be queried:
# Get time series
d = stack.timeseries(xy=(1000.0, 1000.0))
# Get velocity slice at a given time index
index = 100
v = stack.slice(index)
See the IPython notebook doc/time_series_inversion.ipynb
for a full example.
iceutils is MIT licensed, as found in the LICENSE file