imod.prepare - Prepare model input¶
Prepare model input.
A various selection of functions to mangle your data from one form into another,
such that it will fit into your model. This includes
imod.prepare.reproject()
for reprojecting grids, and
imod.prepare.rasterize()
to create grids from vector files.
Naturally you are free to write your own functions or one of the many existing
ones from other packages. As long as you keep your data in the expected
xarray.DataArray
and pandas.DataFrame
formats, this will work. In some
cases, such as imod.prepare.Regridder
, these methods are optimized for
speed by making use of the Numba compiler, to be able to regrid large datasets.
-
class
imod.prepare.
LayerRegridder
(method)[source]¶ Bases:
object
Object to repeatedly regrid layers of similar objects. Compiles once on first call, can then be repeatedly called without JIT compilation overhead.
-
method
¶ The method to use for regridding. Default available methods are:
{"mean", "harmonic_mean", "geometric_mean", "sum", "minimum", "maximum", "mode", "median", "max_overlap"}
- Type
str, function
-
regrid
(source, source_top, source_bottom, destination_top, destination_bottom)[source]¶ - Parameters
source (xr.DataArray) – The values of the layered model.
source_top (xr.DataArray) – The vertical location of the layer tops.
destination_top (xr.DataArray) – The vertical location of the layer tops.
source_bottom (xr.DataArray) – The vertical location of the layer bottoms.
destination_bottom (xr.DataArray) – The vertical location of the layer bottoms.
- Returns
regridded
- Return type
xr.DataArray
-
-
class
imod.prepare.
Regridder
(method, ndim_regrid=None, use_relative_weights=False, extra_overlap=0)[source]¶ Bases:
object
Object to repeatedly regrid similar objects. Compiles once on first call, can then be repeatedly called without JIT compilation overhead.
-
method
¶ The method to use for regridding. Default available methods are:
{"nearest", "multilinear", mean", "harmonic_mean", "geometric_mean", "sum", "minimum", "maximum", "mode", "median", "conductance"}
- Type
str, function
-
ndim_regrid
¶ The number of dimensions over which to regrid. If not provided,
ndim_regrid
will be inferred. It serves to prevent regridding over an unexpected number of dimensions; say you want to regrid over only two dimensions. Due to an input error in the coordinates oflike
, three dimensions may be inferred in the first.regrid
call. An error will be raised if ndim_regrid not match the number of inferred dimensions. Default value is None.- Type
int, optional
-
use_relative_weights
¶ Whether to use relative weights in the regridding method or not. Relative weights are defined as: cell_overlap / source_cellsize, for every axis.
This argument should only be used if you are providing your own
method
as a function, where the function requires relative, rather than absolute weights (the providedconductance
method requires relative weights, for example). Default value is False.- Type
bool, optional
-
extra_overlap
¶ In case of chunked regridding, how many cells of additional overlap is necessary. Linear interpolation requires this for example, as it reaches beyond cell boundaries to compute values. Default value is 0.
- Type
integer, optional
Examples
Initialize the Regridder object:
>>> mean_regridder = imod.prepare.Regridder(method="mean")
Then call the
regrid
method to regrid.>>> result = mean_regridder.regrid(source, like)
The regridder can be re-used if the number of regridding dimensions match, saving some time by not (re)compiling the regridding method.
>>> second_result = mean_regridder.regrid(second_source, like)
A one-liner is possible for single use:
>>> result = imod.prepare.Regridder(method="mean").regrid(source, like)
It’s possible to provide your own methods to the
Regridder
, provided that numba can compile them. They need to take the argumentsvalues
andweights
. Make sure they deal withnan
values gracefully!>>> def p30(values, weights): >>> return np.nanpercentile(values, 30)
>>> p30_regridder = imod.prepare.Regridder(method=p30) >>> p30_result = p30_regridder.regrid(source, like)
The Numba developers maintain a list of support Numpy features here: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
In general, however, the provided methods should be adequate for your regridding needs.
-
regrid
(source, like, fill_value=nan)[source]¶ Regrid
source
along dimensions thatsource
andlike
share. These dimensions will be inferred the first time.regrid
is called for the Regridder object.Following xarray conventions, nodata is assumed to
np.nan
.- Parameters
source (xr.DataArray of floats) –
like (xr.DataArray of floats) – The like array present what the coordinates should look like.
fill_value (float) – The fill_value. Defaults to np.nan
- Returns
result – Regridded result.
- Return type
xr.DataArray
-
-
class
imod.prepare.
Voxelizer
(method, use_relative_weights=False)[source]¶ Bases:
object
Object to repeatedly voxelize similar objects. Compiles once on first call, can then be repeatedly called without JIT compilation overhead.
-
method
¶ The method to use for regridding. Default available methods are:
{"mean", "harmonic_mean", "geometric_mean", "sum", "minimum", "maximum", "mode", "median", "max_overlap"}
- Type
str, function
Examples
Usage is similar to the regridding. Initialize the Voxelizer object:
>>> mean_voxelizer = imod.prepare.Voxelizer(method="mean")
Then call the
voxelize
method to transform a layered dataset into a voxel based one. The vertical coordinates of the layers must be provided bytop
andbottom
.>>> mean_voxelizer.voxelize(source, top, bottom, like)
If your data is already voxel based, i.e. the layers have tops and bottoms that do not differ with x or y, you should use a
Regridder
instead.It’s possible to provide your own methods to the
Regridder
, provided that numba can compile them. They need to take the argumentsvalues
andweights
. Make sure they deal withnan
values gracefully!>>> def p30(values, weights): >>> return np.nanpercentile(values, 30)
>>> p30_voxelizer = imod.prepare.Voxelizer(method=p30) >>> p30_result = p30_voxelizer.regrid(source, top, bottom, like)
The Numba developers maintain a list of support Numpy features here: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
In general, however, the provided methods should be adequate for your voxelizing needs.
-
voxelize
(source, top, bottom, like)[source]¶ - Parameters
source (xr.DataArray) – The values of the layered model.
top (xr.DataArray) – The vertical location of the layer tops.
bottom (xr.DataArray) – The vertical location of the layer bottoms.
like (xr.DataArray) – An example DataArray providing the coordinates of the voxelized results; what it should look like in terms of dimensions, data type, and coordinates.
- Returns
voxelized
- Return type
xr.DataArray
-
-
imod.prepare.
celltable
(path, column, resolution, like, chunksize=10000.0)[source]¶ Process area of features by rasterizing in a chunkwise manner to limit memory usage.
Returns a table of cell indices (row, column) with for example feature ID, and feature area within cell. Essentially returns a COO sparse matrix, but with duplicate values per cell, since more than one geometry may be present.
The feature area within the cell is approximated by first rasterizing the feature, and then counting the number of occuring cells. This means the accuracy of the area depends on the cellsize of the rasterization step.
A celltable is returned, as a
pandas.DataFrame
. It has the following columns:"row_index"
"col_index"
the value of the
column
argument"area"
"row_index"
and"col_index"
are the indices of the like array in which the polygon is located. Thecolumn
value holds the rasterized value of the specified column."area"
contains the area of the polygon within the cell.The most convenient way of using this celltable is by specifying a feature ID as
column
. After creating a celltable,pandas.DataFrame.merge()
can be used to join additional data on this ID. Refer to the examples.- Parameters
path (str or pathlib.Path) – path to OGR supported vector file (e.g. a shapefile)
column (str) – column name of column to burn into raster
resolution (float) – cellsize at which the rasterization, and determination of area within cellsize occurs. Very small values are recommended (e.g. <= 0.5 m).
like (xarray.DataArray) – Example DataArray of where the cells will be located. Used only for the coordinates.
chunksize (int, optional) – The size of the chunksize. Used for both x and y dimension.
- Returns
celltable
- Return type
pandas.DataFrame
Examples
Assume we have a shapefile called
waterways.shp
and information on the model discretization is described by alike
DataArray. The feature ID is provided by a column in the shapefile called “ID-code”. Additionally, this shapefile also specifies bed hydraulic resistance (c0). For this specific discretization, we wish to calculate a conductance (area divided by hydraulic resistance). To do so, we:create a
celltable
join the additional attributes (such as c0)
compute the conductance per feature
sum conductances per cell
Import the required packages.
>>> import imod >>> import geopandas as gpd
Generate the celltable.
>>> celltable = imod.prepare.celltable( path="waterways.shp", column="ID-code", resolution=0.5, like=like, )
Load the shapefile with geopandas into a
GeoDataFrame
.>>> gdf = gpd.read_file("waterways.shp)
Select the relevant columns into a
pandas.DataFrame
and merge with the celltable.>>> df = gdf[["ID-code", "c0"]] >>> joined = celltable.merge(gdf, on="ID-code")
We compute the conductance, and sum it per cell using
pandas
methods:>>> joined["conductance"] = joined["area"] / joined["c0"] >>> summed_conductance = joined.groupby(["row_index", "col_index"], as_index=False)[ "conductance" ].sum()
Finally, turn the result into a DataArray so it can be used as model input:
>>> conductance = imod.prepare.rasterize_celltable( table=summed_conductance, column="conductance", like=like, )
-
imod.prepare.
fill
(da, invalid=None, by=None)[source]¶ Replace the value of invalid
da
cells (indicated byinvalid
) using basic nearest neighbour interpolation.- Parameters
da (xr.DataArray with gaps) – array containing missing value
by (str, optional) – dimension over which the array will be filled, one by one. See the examples.
invalid (xr.DataArray) – a binary array of same shape as
da
. data value are replaced where invalid is True If None (default), uses: invalid = np.isnan(data)
- Returns
with the same coordinates as the input.
- Return type
xarray.DataArray
Examples
A common use case is filling holes in a DataArray, filling it with the value of its nearest (valid) neighbor:
>>> filled = imod.prepare.fill(da)
In case of a tie (e.g. neighbors in x and y are both one cell removed), the neighbor in the last dimension is chosen (for rasters, that’s generally x).
A typical use case is filling a 3D array (layer, y, x), but only in the horizontal dimensions. The
by
keyword can be used to do this:>>> filled = imod.prepare.fill(da, by="layer")
In this case, the array is filled by one layer at a time.
-
imod.prepare.
gdal_rasterize
(path, column, like=None, nodata=None, dtype=None, spatial_reference=None)[source]¶ Use GDAL to rasterize a vector file into an xarray.DataArray.
Can be significantly more efficient than rasterize. This doesn’t load the vector data into a GeoDataFrame and loops over the individual shapely geometries like rasterio.rasterize does, but loops over the features within GDAL instead.
- Parameters
path (str or pathlib.Path) – path to OGR supported vector file (e.g. a shapefile)
column (str) – column name of column to burn into raster
like (xr.DataArray, optional) – example of raster
nodata (int, float; optional) –
dtype (numpy.dtype, optional) –
spatial_reference (dict, optional) –
Optional dict to avoid allocating the like DataArray. Used if template is None. Dict has keys “bounds” and “cellsizes”, with:
bounds = (xmin, xmax, ymin, ymax)
cellsizes = (dx, dy)
- Returns
rasterized
- Return type
np.array
-
imod.prepare.
laplace_interpolate
(source, ibound=None, close=0.01, mxiter=5, iter1=50, relax=0.98)[source]¶ Fills gaps in source by interpolating from existing values using Laplace interpolation.
- Parameters
source (xr.DataArray with dims (y, x)) – Data values to interpolate.
ibound (xr.DataArray with dims (y, x)) – Precomputed array which marks where to interpolate
close (float) – Closure criteration of iterative solver. Should be one to two orders of magnitude smaller than desired accuracy.
mxiter (int) – Outer iterations of iterative solver.
iter1 (int) – Inner iterations of iterative solver. Should not exceed 50.
relax (float) – Iterative solver relaxation parameter. Should be between 0 and 1.
- Returns
interpolated – source, with interpolated values where ibound equals 1
- Return type
xr.DataArray with dims (y, x)
-
imod.prepare.
polygonize
(da)[source]¶ Polygonize a 2D-DataArray into a GeoDataFrame of polygons.
- Parameters
da (xr.DataArray) –
- Returns
polygonized
- Return type
geopandas.GeoDataFrame
-
imod.prepare.
rasterize
(geodataframe, like, column=None, fill=nan, **kwargs)[source]¶ Rasterize a geopandas GeoDataFrame onto the given xarray coordinates.
- Parameters
geodataframe (geopandas.GeoDataFrame) –
column (str, int, float) – column name of geodataframe to burn into raster
like (xarray.DataArray) – Example DataArray. The rasterized result will match the shape and coordinates of this DataArray.
fill (float, int) – Fill value for nodata areas. Optional, default value is np.nan.
kwargs (additional keyword arguments for rasterio.features.rasterize.) – See: https://rasterio.readthedocs.io/en/stable/api/rasterio.features.html#rasterio.features.rasterize
- Returns
rasterized – Vector data rasterized. Matches shape and coordinates of
like
.- Return type
xarray.DataArray
-
imod.prepare.
rasterize_celltable
(table, column, like)[source]¶ Rasterizes a table, such as produced by
imod.prepare.spatial.celltable
. Before rasterization, multiple values should be grouped and aggregated per cell. Values will be overwritten otherwise.- Parameters
like (xr.DataArray) –
table (pandas.DataFrame) – with columns: “row_index”, “col_index”
column (str, int, float) – column name of values to rasterize
- Returns
rasterized
- Return type
xr.DataArray
-
imod.prepare.
reproject
(source, like=None, src_crs=None, dst_crs=None, method='nearest', use_src_attrs=False, src_nodata=nan, **reproject_kwargs)[source]¶ Reprojects and/or resamples a 2D xarray DataArray to a different cellsize or coordinate system.
To resample to a new cellsize in the same projection: provide only
like
.To only reproject: provide only
src_crs
andsrc_crs
.To reproject and resample to a specific domain: provide
src_crs
,src_crs
, andlike
.
Note: when only
like
is provided, Cartesian (projected) coordinates are a ssumed for resampling. In case of non-Cartesian coordinates, specifysrc_crs
anddst_crs
for correct resampling.- Parameters
source (xarray DataArray) – The DataArray to be resampled and/or reprojected. Must contain dimensions
y
andx
.like (xarray DataArray) – Example DataArray that shows what the resampled result should look like in terms of coordinates. Must contain dimensions
y
andx
.src_crs (string, dict, rasterio.crs.CRS) –
Coordinate system of
source
. Options:string: e.g.
"EPSG:4326"
rasterio.crs.CRS
dst_crs (string, dict, rasterio.crs.CRS) –
Coordinate system of result. Options:
string: e.g.
"EPSG:4326"
rasterio.crs.CRS
use_src_attrs (boolean) – If True: Use metadata in
source.attrs
, as generated byxarray.open_rasterio()
, to do reprojection.method (string) –
The method to use for resampling/reprojection. Defaults to “nearest”. GDAL methods are available:
nearest
bilinear
cubic
cubic_spline
lanczos
average
mode
gauss
max
min
med (50th percentile)
q1 (25th percentile)
q3 (75th percentile)
reproject_kwargs (dict, optional) – keyword arguments for
rasterio.warp.reproject()
.
- Returns
Resampled/reprojected DataArray.
- Return type
xarray.DataArray
Examples
Resample a DataArray
a
to a new cellsize, using an existing DataArrayb
:>>> c = imod.rasterio.reproject(source=a, like=b)
Resample a DataArray to a new cellsize of 100.0, by creating a
like
DataArray first: (Note that dy must be negative, as is usual for geospatial grids.)>>> dims = ("y", "x") >>> coords = {"y": np.arange(200_000.0, 100_000.0, -100.0), "x": np.arange(0.0, 100_000.0, 100.0)} >>> b = xr.DataArray(data=np.empty((200, 100)), coords=coords, dims=dims) >>> c = imod.rasterio.reproject(source=a, like=b)
Reproject a DataArray from one coordinate system (WGS84, EPSG:4326) to another (UTM30N, EPSG:32630):
>>> c = imod.rasterio.reproject(source=a, src_crs="EPSG:4326", dst_crs="EPSG:32630")
Get the reprojected DataArray in the desired shape and coordinates by providing
like
:>>> c = imod.rasterio.reproject(source=a, like=b, src_crs="EPSG:4326", dst_crs="EPSG:32630")
Open a single band raster, and reproject to RD new coordinate system (EPSG:28992), without explicitly specifying
src_crs
.src_crs
is taken froma.attrs
, so the raster file has to include coordinate system metadata for this to work.>>> a = xr.open_rasterio("example.tif").squeeze("band") >>> c = imod.rasterio.reproject(source=a, use_src_attrs=True, dst_crs="EPSG:28992")
In case of a rotated
source
, providesrc_transform
directly oruse_src_attrs=True
to rely on generated attributes:>>> rotated = xr.open_rasterio("rotated_example.tif").squeeze("band") >>> c = imod.rasterio.reproject(source=rotated, dst_crs="EPSG:28992", reproject_kwargs={"src_transform":affine.Affine(...)}) >>> c = imod.rasterio.reproject(source=rotated, dst_crs="EPSG:28992", use_src_attrs=True)