imod.prepare - Prepare model input

Prepare model input.

A various selection of functions to mangle your data from one form into another, such that it will fit into your model. This includes imod.prepare.reproject() for reprojecting grids, and imod.prepare.rasterize() to create grids from vector files.

Naturally you are free to write your own functions or one of the many existing ones from other packages. As long as you keep your data in the expected xarray.DataArray and pandas.DataFrame formats, this will work. In some cases, such as imod.prepare.Regridder, these methods are optimized for speed by making use of the Numba compiler, to be able to regrid large datasets.

class imod.prepare.Regridder(method, ndim_regrid=None, use_relative_weights=False)[source]

Bases: object

Object to repeatedly regrid similar objects. Compiles once on first call, can then be repeatedly called without JIT compilation overhead.

method

The method to use for regridding. Default available methods are: {"nearest", "multilinear", mean", "harmonic_mean", "geometric_mean", "sum", "minimum", "maximum", "mode", "median", "conductance"}

Type

str, function

ndim_regrid

The number of dimensions over which to regrid. If not provided, ndim_regrid will be inferred. It serves to prevent regridding over an unexpected number of dimensions; say you want to regrid over only two dimensions. Due to an input error in the coordinates of like, three dimensions may be inferred in the first .regrid call. An error will be raised if ndim_regrid not match the number of inferred dimensions. Default value is None.

Type

int, optional

use_relative_weights

Whether to use relative weights in the regridding method or not. Relative weights are defined as: cell_overlap / source_cellsize, for every axis.

This argument should only be used if you are providing your own method as a function, where the function requires relative, rather than absolute weights (the provided conductance method requires relative weights, for example). Default value is False.

Type

bool, optional

Examples

Initialize the Regridder object:

>>> mean_regridder = imod.prepare.Regridder(method="mean")

Then call the regrid method to regrid.

>>> result = mean_regridder.regrid(source, like)

The regridder can be re-used if the number of regridding dimensions match, saving some time by not (re)compiling the regridding method.

>>> second_result = mean_regridder.regrid(second_source, like)

A one-liner is possible for single use:

>>> result = imod.prepare.Regridder(method="mean").regrid(source, like)

It’s possible to provide your own methods to the Regridder, provided that numba can compile them. They need to take the arguments values and weights. Make sure they deal with nan values gracefully!

>>> def p30(values, weights):
>>>     return np.nanpercentile(values, 30)
>>> p30_regridder = imod.prepare.Regridder(method=p30)
>>> p30_result = p30_regridder.regrid(source, like)

The Numba developers maintain a list of support Numpy features here: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

In general, however, the provided methods should be adequate for your regridding needs.

regrid(source, like, fill_value=nan)[source]

Regrid source along dimensions that source and like share. These dimensions will be inferred the first time .regrid is called for the Regridder object.

Following xarray conventions, nodata is assumed to np.nan.

Parameters
  • source (xr.DataArray of floats) –

  • like (xr.DataArray of floats) – The like array present what the coordinates should look like.

  • fill_value (float) – The fill_value. Defaults to np.nan

Returns

result – Regridded result.

Return type

xr.DataArray

class imod.prepare.Voxelizer(method, use_relative_weights=False)[source]

Bases: object

Object to repeatedly voxelize similar objects. Compiles once on first call, can then be repeatedly called without JIT compilation overhead.

method

The method to use for regridding. Default available methods are: {"mean", "harmonic_mean", "geometric_mean", "sum", "minimum", "maximum", "mode", "median", "max_overlap"}

Type

str, function

Examples

Usage is similar to the regridding. Initialize the Voxelizer object:

>>> mean_voxelizer = imod.prepare.Voxelizer(method="mean")

Then call the voxelize method to transform a layered dataset into a voxel based one. The vertical coordinates of the layers must be provided by top and bottom.

>>> mean_voxelizer.voxelize(source, top, bottom, like)

If your data is already voxel based, i.e. the layers have tops and bottoms that do not differ with x or y, you should use a Regridder instead.

It’s possible to provide your own methods to the Regridder, provided that numba can compile them. They need to take the arguments values and weights. Make sure they deal with nan values gracefully!

>>> def p30(values, weights):
>>>     return np.nanpercentile(values, 30)
>>> p30_voxelizer = imod.prepare.Voxelizer(method=p30)
>>> p30_result = p30_voxelizer.regrid(source, top, bottom, like)

The Numba developers maintain a list of support Numpy features here: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

In general, however, the provided methods should be adequate for your voxelizing needs.

voxelize(source, top, bottom, like)[source]
Parameters
  • source (xr.DataArray) – The values of the layered model.

  • top (xr.DataArray) – The vertical location of the layer tops.

  • bottom (xr.DataArray) – The vertical location of the layer bottoms.

  • like (xr.DataArray) – An example DataArray providing the coordinates of the voxelized results; what it should look like in terms of dimensions, data type, and coordinates.

Returns

voxelized

Return type

xr.DataArray

imod.prepare.celltable(path, column, resolution, like, chunksize=10000.0)[source]

Process area of features by rasterizing in a chunkwise to limit memory usage.

Returns a table of cell indices (row, column) with for example feature ID, and feature area within cell. Essentially returns a COO sparse matrix, but with duplicate values per cell, since more than one geometry may be present.

The feature area within the cell is approximated by first rasterizing the feature, and then counting the number of occuring cells. This means the accuracy of the area depends on the cellsize of the rasterization step.

A celltable is returned, as a pandas.DataFrame. It has the following columns:

  1. "row_index"

  2. "col_index"

  3. the value of the column argument

  4. "area"

"row_index" and "col_index" are the indices of the like array in which the polygon is located. The column value holds the rasterized value of the specified column. "area" contains the area of the polygon within the cell.

The most convenient way of using this celltable is by specifying a feature ID as column. After creating a celltable, pandas.DataFrame.merge() can be used to join additional data on this ID. Refer to the examples.

Parameters
  • path (str or pathlib.Path) – path to OGR supported vector file (e.g. a shapefile)

  • column (str) – column name of column to burn into raster

  • resolution (float) – cellsize at which the rasterization, and determination of area within cellsize occurs. Very small values are recommended (e.g. <= 0.5 m).

  • like (xarray.DataArray) – Example DataArray of where the cells will be located. Used only for the coordinates.

  • chunksize (int, optional) – The size of the chunksize. Used for both x and y dimension.

Returns

celltable

Return type

pandas.DataFrame

Examples

Assume we have a shapefile called waterways.shp and information on the model discretization is described by a like DataArray. The feature ID is provided by a column in the shapefile called “ID-code”. Additionally, this shapefile also specifies bed hydraulic resistance (c0). For this specific discretization, we wish to calculate a conductance (area divided by hydraulic resistance). To do so, we:

  1. create a celltable

  2. join the additional attributes (such as c0)

  3. compute the conductance per feature

  4. sum conductances per cell

Import the required packages.

>>> import imod
>>> import geopandas as gpd

Generate the celltable.

>>> celltable = imod.prepare.celltable(
        path="waterways.shp",
        column="ID-code",
        resolution=0.5,
        like=like,
    )

Load the shapefile with geopandas into a GeoDataFrame.

>>> gdf = gpd.read_file("waterways.shp)

Select the relevant columns into a pandas.DataFrame and merge with the celltable.

>>> df = gdf[["ID-code", "c0"]]
>>> joined = celltable.merge(gdf, on="ID-code")

We compute the conductance, and sum it per cell using pandas methods:

>>> joined["conductance"] = joined["area"] / joined["c0"]
>>> summed_conductance = joined.groupby(["row_index", "col_index"], as_index=False)[
        "conductance"
    ].sum()

Finally, turn the result into a DataArray so it can be used as model input:

>>> conductance = imod.prepare.rasterize_celltable(
        table=summed_conductance,
        column="conductance",
        like=like,
    )
imod.prepare.fill(da, invalid=None, by=None)[source]

Replace the value of invalid da cells (indicated by invalid) using basic nearest neighbour interpolation.

Parameters
  • da (xr.DataArray with gaps) – array containing missing value if one of the dimensions is layer, it will interpolate one layer at a a time (2D interpolation over x and y in case of dims == (by, “y”, “x”)).

  • invalid (xr.DataArray) – a binary array of same shape as da. data value are replaced where invalid is True If None (default), uses: invalid = np.isnan(data)

Returns

with the same coordinates as the input.

Return type

xarray.DataArray

imod.prepare.gdal_rasterize(path, column, like=None, nodata=None, dtype=None, spatial_reference=None)[source]

Use GDAL to rasterize a vector file into an xarray.DataArray.

Can be significantly more efficient than rasterize. This doesn’t load the vector data into a GeoDataFrame and loops over the individual shapely geometries like rasterio.rasterize does, but loops over the features within GDAL instead.

Parameters
  • path (str or pathlib.Path) – path to OGR supported vector file (e.g. a shapefile)

  • column (str) – column name of column to burn into raster

  • like (xr.DataArray, optional) – example of raster

  • nodata (int, float; optional) –

  • dtype (numpy.dtype, optional) –

  • spatial_reference (dict, optional) –

    Optional dict to avoid allocating the like DataArray. Used if template is None. Dict has keys “bounds” and “cellsizes”, with:

    • bounds = (xmin, xmax, ymin, ymax)

    • cellsizes = (dx, dy)

Returns

rasterized

Return type

np.array

imod.prepare.laplace_interpolate(source, ibound=None, close=0.01, mxiter=5, iter1=50, relax=0.98)[source]

Fills gaps in source by interpolating from existing values using Laplace interpolation.

Parameters
  • source (xr.DataArray with dims (y, x)) – Data values to interpolate.

  • ibound (xr.DataArray with dims (y, x)) – Precomputed array which marks where to interpolate

  • close (float) – Closure criteration of iterative solver. Should be one to two orders of magnitude smaller than desired accuracy.

  • mxiter (int) – Outer iterations of iterative solver.

  • iter1 (int) – Inner iterations of iterative solver. Should not exceed 50.

  • relax (float) – Iterative solver relaxation parameter. Should be between 0 and 1.

Returns

interpolated – source, with interpolated values where ibound equals 1

Return type

xr.DataArray with dims (y, x)

imod.prepare.rasterize(geodataframe, like, column=None, fill=nan, **kwargs)[source]

Rasterize a geopandas GeoDataFrame onto the given xarray coordinates.

Parameters
  • geodataframe (geopandas.GeoDataFrame) –

  • column (str, int, float) – column name of geodataframe to burn into raster

  • like (xarray.DataArray) – Example DataArray. The rasterized result will match the shape and coordinates of this DataArray.

  • fill (float, int) – Fill value for nodata areas. Optional, default value is np.nan.

  • kwargs (additional keyword arguments for rasterio.features.rasterize.) – See: https://rasterio.readthedocs.io/en/stable/api/rasterio.features.html#rasterio.features.rasterize

Returns

rasterized – Vector data rasterized. Matches shape and coordinates of like.

Return type

xarray.DataArray

imod.prepare.rasterize_celltable(table, column, like)[source]

Rasterizes a table, such as produced by imod.prepare.spatial.celltable. Before rasterization, multiple values should be grouped and aggregated per cell. Values will be overwritten otherwise.

Parameters
  • like (xr.DataArray) –

  • table (pandas.DataFrame) – with columns: “row_index”, “col_index”

  • column (str, int, float) – column name of values to rasterize

Returns

rasterized

Return type

xr.DataArray

imod.prepare.reproject(source, like=None, src_crs=None, dst_crs=None, method='nearest', use_src_attrs=False, src_nodata=nan, **reproject_kwargs)[source]

Reprojects and/or resamples a 2D xarray DataArray to a different cellsize or coordinate system.

  • To resample to a new cellsize in the same projection: provide only like.

  • To only reproject: provide only src_crs and src_crs.

  • To reproject and resample to a specific domain: provide src_crs, src_crs, and like.

Note: when only like is provided, Cartesian (projected) coordinates are a ssumed for resampling. In case of non-Cartesian coordinates, specify src_crs and dst_crs for correct resampling.

Parameters
  • source (xarray DataArray) – The DataArray to be resampled and/or reprojected. Must contain dimensions y and x.

  • like (xarray DataArray) – Example DataArray that shows what the resampled result should look like in terms of coordinates. Must contain dimensions y and x.

  • src_crs (string, dict, rasterio.crs.CRS) –

    Coordinate system of source. Options:

    • string: e.g. "EPSG:4326"

    • rasterio.crs.CRS

  • dst_crs (string, dict, rasterio.crs.CRS) –

    Coordinate system of result. Options:

    • string: e.g. "EPSG:4326"

    • rasterio.crs.CRS

  • use_src_attrs (boolean) – If True: Use metadata in source.attrs, as generated by xarray.open_rasterio(), to do reprojection.

  • method (string) –

    The method to use for resampling/reprojection. Defaults to “nearest”. GDAL methods are available:

    • nearest

    • bilinear

    • cubic

    • cubic_spline

    • lanczos

    • average

    • mode

    • gauss

    • max

    • min

    • med (50th percentile)

    • q1 (25th percentile)

    • q3 (75th percentile)

  • reproject_kwargs (dict, optional) – keyword arguments for rasterio.warp.reproject().

Returns

Resampled/reprojected DataArray.

Return type

xarray.DataArray

Examples

Resample a DataArray a to a new cellsize, using an existing DataArray b:

>>> c = imod.rasterio.reproject(source=a, like=b)

Resample a DataArray to a new cellsize of 100.0, by creating a like DataArray first: (Note that dy must be negative, as is usual for geospatial grids.)

>>> dims = ("y", "x")
>>> coords = {"y": np.arange(200_000.0, 100_000.0, -100.0), "x": np.arange(0.0, 100_000.0, 100.0)}
>>> b = xr.DataArray(data=np.empty((200, 100)), coords=coords, dims=dims)
>>> c = imod.rasterio.reproject(source=a, like=b)

Reproject a DataArray from one coordinate system (WGS84, EPSG:4326) to another (UTM30N, EPSG:32630):

>>> c = imod.rasterio.reproject(source=a, src_crs="EPSG:4326", dst_crs="EPSG:32630")

Get the reprojected DataArray in the desired shape and coordinates by providing like:

>>> c = imod.rasterio.reproject(source=a, like=b, src_crs="EPSG:4326", dst_crs="EPSG:32630")

Open a single band raster, and reproject to RD new coordinate system (EPSG:28992), without explicitly specifying src_crs. src_crs is taken from a.attrs, so the raster file has to include coordinate system metadata for this to work.

>>> a = xr.open_rasterio("example.tif").squeeze("band")
>>> c = imod.rasterio.reproject(source=a, use_src_attrs=True, dst_crs="EPSG:28992")

In case of a rotated source, provide src_transform directly or use_src_attrs=True to rely on generated attributes:

>>> rotated = xr.open_rasterio("rotated_example.tif").squeeze("band")
>>> c = imod.rasterio.reproject(source=rotated, dst_crs="EPSG:28992", reproject_kwargs={"src_transform":affine.Affine(...)})
>>> c = imod.rasterio.reproject(source=rotated, dst_crs="EPSG:28992", use_src_attrs=True)