API Reference

This page contains auto-generated API reference documentation [1].

subsettools

Subpackages

subsettools.template_runscripts

Submodules

Functions

`define_huc_domain`(hucs, grid[, huc_version])	Define a domain by a collection of HUCs.
`define_latlon_domain`(latlon_bounds, grid)	Define a domain by latitude/longitude bounds.
`define_upstream_domain`(outlets, grid)	Define a domain that is the upstream area of the points in outlets.
`write_mask_solid`(mask, grid, write_dir[, mode, ij_bounds])	Create ParFlow mask and solid files from a mask array.
`huc_to_ij`(huc_list, grid[, huc_version])	This function is deprecated.
`latlon_to_ij`(latlon_bounds, grid)	This function is deprecated.
`create_mask_solid`(huc_list, grid, write_dir[, huc_version])	This function is deprecated.
`subset_static`(ij_bounds, dataset, write_dir[, var_list])	Subset static input files from national datasets in HydroData.
`subset_press_init`(ij_bounds, dataset, date, write_dir)	Subset a pressure file from a national dataset in HydroData.
`subset_forcing`(ij_bounds, grid, start, end, dataset, ...)	Subset forcing files from national datasets in HydroData.
`config_clm`(ij_bounds, start, end, dataset, write_dir)	Modify template CLM driver files for a desired subdomain and run duration.
`vegm_to_land_cover`(vegm_path[, write_pfb_path])	Convert a vegm.dat file in CLM format into a land cover array.
`get_template_runscript`(grid, mode, input_file_type, ...)	Get a ParFlow template runscript.
`edit_runscript_for_subset`(ij_bounds, runscript_path[, ...])	Modify a ParFlow run script for a new subdomain run.
`copy_files`(read_dir, write_dir)	Copy all files from read_dir to write_dir.
`change_filename_values`(runscript_path[, write_dir, ...])	Change the filenames of input files in the ParFlow runscript.
`dist_run`(topo_p, topo_q, runscript_path[, ...])	Distribute ParFlow input files for parallel computing.

subsettools.define_huc_domain(hucs, grid, huc_version=None)[source]

Define a domain by a collection of HUCs.

The domain is defined by the grid ij bounds of a bounding box that encompasses the HUCs in the list and a mask for that bounding box indicating which cells in the bounding box are part of these HUCs.

All HUC IDs in hucs must be the same length (HUCs of the same level). All HUCs should be adjacent. If a HUC is only partially covered by the provided grid, the grid bounds for the covered area will be returned.

Parameters:

hucs (list[str]) – a list of USGS HUC IDs
grid (str) – The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: “conus1” or “conus2”
huc_version (str) – The dataset_version of the huc_mapping dataset to use to lookup HUC ids.

Returns:

A tuple (bounds, mask).

Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the bounds in the conus grid of the area defined by the HUC IDs in hucs. imin, jmin, imax, jmax are the west, south, east and north sides of the box respectively and all i,j indices are calculated relative to the lower southwest corner of the domain.

Mask is a 2D numpy.ndarray that indicates which cells inside the bounding box are part of the selected HUC(s).

Raises:

ValueError – If the area defined by the provided HUCs is not part of the given grid.

Example:

grid_bounds, mask = define_huc_domain(
    hucs=["14080201", "14080202", "14080203"], grid="conus1"
)
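
The relationship between the returned bounds and mask can be illustrated on a toy array. This is a minimal sketch, not the library's implementation: the helper name `bounding_box`, the `[j, i]` array indexing, and the choice to treat imax and jmax as exclusive are all assumptions made for illustration.

```python
import numpy as np


def bounding_box(full_mask):
    """Return ((imin, jmin, imax, jmax), cropped_mask) for a 2D 0/1 array.

    Assumptions of this sketch: the array is indexed [j, i] and the
    upper bounds imax and jmax are exclusive.
    """
    jj, ii = np.nonzero(full_mask)
    if jj.size == 0:
        raise ValueError("mask selects no cells")
    imin, imax = int(ii.min()), int(ii.max()) + 1
    jmin, jmax = int(jj.min()), int(jj.max()) + 1
    return (imin, jmin, imax, jmax), full_mask[jmin:jmax, imin:imax]


# Toy "grid" with an irregular selected region.
toy = np.zeros((5, 6), dtype=int)
toy[1:3, 2:5] = 1
toy[3, 3] = 1

bounds, mask = bounding_box(toy)
print(bounds)      # (2, 1, 5, 4)
print(mask.shape)  # (3, 3)
```

Cells inside the box that are not part of the selected region remain 0 in the cropped mask, which is the role the mask plays alongside the bounds.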

subsettools.define_latlon_domain(latlon_bounds, grid)[source]

Define a domain by latitude/longitude bounds.

The domain is defined by the grid ij bounds of a bounding box formed by the latitude/longitude bounds (latlon_bounds) relative to the selected conus grid and a mask for that bounding box indicating which cells are active CONUS points.

Parameters:

latlon_bounds (List[List[float]]) – list of the form [[lat1, lon1], [lat2, lon2]]. [lat1, lon1] and [lat2, lon2] define the northwest and southeast corners of the desired box respectively.
grid (str) – The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: “conus1” or “conus2”.

Returns:

A tuple (bounds, mask).

Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the bounds in the conus grid of the area defined by the latlon_bounds. imin, jmin, imax, jmax are the west, south, east and north sides of the box respectively and all i,j indices are calculated relative to the lower southwest corner of the domain.

Mask is a 2D numpy.ndarray that indicates which cells inside the bounding box are active CONUS points (for example, if ocean is part of the bounding box the corresponding cells will not be part of the mask).

Example:

grid_bounds, mask = define_latlon_domain(
    latlon_bounds=[[37.91, -91.43], [37.34, -90.63]], grid="conus2"
)
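
Because latlon_bounds must list the northwest corner first and the southeast corner second, a small helper can normalize any two opposite corners into that order. This is a sketch under stated assumptions: `nw_se_corners` is a hypothetical helper (not part of subsettools) and it assumes the box does not cross the antimeridian, which holds for CONUS.

```python
def nw_se_corners(point_a, point_b):
    """Order two opposite [lat, lon] corners as [[north, west], [south, east]]."""
    lat_a, lon_a = point_a
    lat_b, lon_b = point_b
    north, south = max(lat_a, lat_b), min(lat_a, lat_b)
    west, east = min(lon_a, lon_b), max(lon_a, lon_b)
    return [[north, west], [south, east]]


# Corners given in southwest/northeast order are rearranged.
latlon_bounds = nw_se_corners([37.34, -91.43], [37.91, -90.63])
print(latlon_bounds)  # [[37.91, -91.43], [37.34, -90.63]]
```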

subsettools.define_upstream_domain(outlets, grid)[source]

Define a domain that is the upstream area of the points in outlets.

The domain is defined by the grid ij bounds of the bounding box that encompasses the upstream area of all the points in outlets and a mask for that bounding box indicating which cells are part of the selected area.

The flow_direction files that are used to define the upstream area follow the convention: down: 1, left: 2, up: 3, right: 4.

Parameters:

outlets (List[List[float]]) – list of lat-lon points of the form [[lat1, lon1], [lat2, lon2], …]
grid (str) – The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: “conus1” or “conus2”

Returns:

A tuple (bounds, mask).

Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the bounds in the conus grid of the upstream area of the outlets. imin, jmin, imax, jmax are the west, south, east and north sides of the box respectively and all i,j indices are calculated relative to the lower southwest corner of the domain.

Mask is a 2D numpy.ndarray that indicates which cells inside the bounding box are part of the computed upstream area of the outlets.

Raises:

ValueError – If the computed upstream area of the outlets is empty.

Example:

bounds, mask = define_upstream_domain(
    outlets=[[44.1348, -95.5084], [44.1352, -95.4949]],
    grid="conus2"
)
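
The flow direction convention (down: 1, left: 2, up: 3, right: 4) can be illustrated with a small upstream trace on a toy grid. This is a sketch, not the library's algorithm: the helper `upstream_mask` is hypothetical, and it assumes the array is indexed [j, i] with j increasing upward, so that "down" means j - 1 and "right" means i + 1.

```python
from collections import deque

import numpy as np

# Direction code -> (dj, di) step taken by water leaving a cell.
# Assumption of this sketch: j increases upward, i increases to the right.
STEP = {1: (-1, 0), 2: (0, -1), 3: (1, 0), 4: (0, 1)}


def upstream_mask(flow_dir, outlet):
    """Mark every cell whose flow path reaches the outlet cell (j, i)."""
    ny, nx = flow_dir.shape
    mask = np.zeros((ny, nx), dtype=int)
    mask[outlet] = 1
    queue = deque([outlet])
    while queue:
        j, i = queue.popleft()
        for dj, di in STEP.values():
            # The neighbor at (j - dj, i - di) drains into (j, i)
            # exactly when its own step is (dj, di).
            nj, ni = j - dj, i - di
            if 0 <= nj < ny and 0 <= ni < nx and not mask[nj, ni]:
                if STEP.get(int(flow_dir[nj, ni])) == (dj, di):
                    mask[nj, ni] = 1
                    queue.append((nj, ni))
    return mask


flow_dir = np.array([
    [4, 1, 2],  # row j = 0
    [1, 1, 4],  # row j = 1
    [4, 1, 3],  # row j = 2
])
mask = upstream_mask(flow_dir, outlet=(0, 1))
print(mask)
```

In this toy grid the two cells in the right-hand column of rows 1 and 2 drain out of the domain, so they are excluded from the mask.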

subsettools.write_mask_solid(mask, grid, write_dir, mode='single-mask', ij_bounds=None)[source]

Create ParFlow mask and solid files from a mask array.

Given an integer mask array consisting of 0s and 1s, this function will create three files in write_dir.

a 2D mask file that indicates which cells inside the box domain are part of the selected HUCS.

a solid file that defines a 3D domain extending to the depth of whichever grid has been selected and tracing the boundaries of the selected HUCS.

a vtk file, which can be used to visualize the solid file in ParaView.

If the mode is ‘multi-mask’, another six masks will be written for the top, bottom, left, right, front and back masks for each cell in the domain.

Parameters:

mask (numpy.ndarray) – an integer array such that mask[i, j] == 1 if the cell (i, j) is part of the domain, and mask[i, j] == 0 otherwise.
grid (str) – The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: “conus1” or “conus2”
write_dir (str) – directory path where the mask and solid files will be written
mode (str) – This is the mode that the pfmask-to-pfsol script will be run. It can be either ‘single-mask’ or ‘multi-mask’. Currently, ‘multi-mask’ mode is only supported for the CONUS2 grid.
ij_bounds (tuple[int]) – bounding box for subset. This should be given as i,j index values where 0,0 is the lower left hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset. This is only necessary if mode is ‘multi-mask’.

Returns:

A dictionary mapping the keys (“mask”, “mask_vtk”, “solid”) to the corresponding filepaths of the created files. If the mode is ‘multi-mask’ the dictionary will contain additional keys for the side masks for each cell in the domain.

Return type:

dict

Example:

import numpy as np

filepaths = write_mask_solid(
    mask=np.array([[0, 1], [1, 1]]),
    grid="conus2",
    write_dir="/path/to/your/chosen/directory"
)

subsettools.huc_to_ij(huc_list, grid, huc_version=None)[source]

This function is deprecated.

Use define_huc_domain() instead.

subsettools.latlon_to_ij(latlon_bounds, grid)[source]

This function is deprecated.

Use define_latlon_domain() instead.

subsettools.create_mask_solid(huc_list, grid, write_dir, huc_version=None)[source]

This function is deprecated.

Use write_mask_solid() instead.

subsettools.subset_static(ij_bounds, dataset, write_dir, var_list=('slope_x', 'slope_y', 'pf_indicator', 'mannings', 'pf_flowbarrier', 'pme', 'ss_pressure_head'))[source]

Subset static input files from national datasets in HydroData.

The subset values will be written as ParFlow binary files (pfbs) in write_dir. By default the following variables will be subset.

Slope in the east/west direction (slope_x)

Slope in the north/south direction (slope_y)

Subsurface units indicator file (pf_indicator)

Mannings roughness coefficients (mannings)

Depth to bedrock (pf_flowbarrier)

Long term average precipitation minus evaporation (i.e. recharge) (pme)

Steady state pressure head used to initialize transient simulations (ss_pressure_head)

Note that some datasets might not contain all 7 static input variables. In that case, subset_static raises a ValueError for any variables that do not exist in the dataset. The default variable list contains the necessary static variables for the CONUS2 grid. For CONUS1-based datasets, “mannings” and “pf_flowbarrier” should be removed from the list.

Parameters:

ij_bounds (tuple[int]) – bounding box for subset. This should be given as i,j index values where 0,0 is the lower left hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
dataset (str) – static inputs dataset name from the HydroData catalog e.g. “conus1_domain”
write_dir (str) – directory where the subset files will be written
var_list (tuple[str]) – tuple of variables to subset from the dataset. By default all 7 variables above will be subset. The user can specify a subset of these variables or list additional variables that are available in their dataset of choice.

Returns:

A dictionary mapping the static variable names to the corresponding file paths where the subset data were written.

Example:

# Subsetting static variables for a CONUS1 workflow
# We need to remove "pf_flowbarrier" and "mannings" from the list
filepaths = subset_static(
    ij_bounds=(375, 239, 487, 329),
    dataset="conus1_domain",
    write_dir="/path/to/your/chosen/directory",
    var_list=("slope_x", "slope_y", "pf_indicator", "pme",
              "ss_pressure_head")
)

# Subsetting static variables for a CONUS2 workflow
# Note that we can use the default var_list here
filepaths = subset_static(
    ij_bounds=(3701, 1544, 3792, 1633),
    dataset="conus2_domain",
    write_dir="/path/to/your/chosen/directory",
)
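
The grid-dependent choice of var_list can also be derived from the default tuple instead of being typed by hand. This is a minimal sketch: `static_vars_for` is a hypothetical helper, and the only facts it encodes are the default list from the signature above and the note that “mannings” and “pf_flowbarrier” should be dropped for CONUS1-based datasets.

```python
DEFAULT_STATIC_VARS = (
    "slope_x", "slope_y", "pf_indicator", "mannings",
    "pf_flowbarrier", "pme", "ss_pressure_head",
)

# Variables the note above says to remove for CONUS1-based datasets.
NOT_IN_CONUS1 = {"mannings", "pf_flowbarrier"}


def static_vars_for(grid):
    """Return a var_list tuple suited to the given grid name."""
    if grid == "conus2":
        return DEFAULT_STATIC_VARS
    if grid == "conus1":
        return tuple(v for v in DEFAULT_STATIC_VARS if v not in NOT_IN_CONUS1)
    raise ValueError(f"unknown grid: {grid!r}")


conus1_vars = static_vars_for("conus1")
print(conus1_vars)
# ('slope_x', 'slope_y', 'pf_indicator', 'pme', 'ss_pressure_head')
```

The resulting tuple could then be passed as var_list to subset_static.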

subsettools.subset_press_init(ij_bounds, dataset, date, write_dir, time_zone='UTC')[source]

Subset a pressure file from a national dataset in HydroData.

This function will select the pressure file for midnight on the date provided and subset the selected pressure file to the ij_bounds provided. The subset data will be written out as a ParFlow binary file (pfb) to be used as an initial pressure file for a ParFlow simulation.

Parameters:

ij_bounds (tuple[int]) – bounding box for subset. This should be given as i,j index values where 0,0 is the lower left hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
dataset (str) – dataset name from the HydroData catalog that the pressure file will be subset from e.g. “conus1_baseline_mod”
date (str) – The date of the pressure file that you would like to subset, in the form ‘yyyy-mm-dd’
write_dir (str) – directory where the subset file will be written
time_zone (str) – timezone information for subset date. Data will be subset at midnight in the specified timezone. This should be a zoneinfo-supported time zone. Defaults to “UTC”.

Returns:

The filepath of the subset file, which includes datetime information, so that it can be used by later functions (e.g. edit_runscript_for_subset).

Example:

filepath = subset_press_init(
    ij_bounds=(375, 239, 487, 329),
    dataset="conus1_baseline_mod",
    date="2005-12-15",
    write_dir="/path/to/your/chosen/directory",
    time_zone="EST"
)
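
To make "midnight on the date provided" concrete, the sketch below computes the UTC instant that corresponds to local midnight. It only illustrates the time arithmetic and does not describe how the library resolves the timestamp internally. The helper `local_midnight_utc` is hypothetical, and a fixed UTC-5 offset stands in for the “EST” zone so that the example does not depend on a system time zone database.

```python
from datetime import datetime, timedelta, timezone


def local_midnight_utc(date, tz):
    """Return the UTC datetime for 00:00 on `date` ('yyyy-mm-dd') in `tz`."""
    local = datetime.strptime(date, "%Y-%m-%d").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


# Fixed-offset stand-in for the "EST" zone (UTC-5), an assumption of this sketch.
est = timezone(timedelta(hours=-5), "EST")

stamp = local_midnight_utc("2005-12-15", est)
print(stamp.isoformat())  # 2005-12-15T05:00:00+00:00
```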

subsettools.subset_forcing(ij_bounds, grid, start, end, dataset, write_dir, time_zone='UTC', forcing_vars=('precipitation', 'downward_shortwave', 'downward_longwave', 'specific_humidity', 'air_temp', 'atmospheric_pressure', 'east_windspeed', 'north_windspeed'), dataset_version=None)[source]

Subset forcing files from national datasets in HydroData.

Subset forcing data will be written out as pfb files formatted for a ParFlow run with 24 hours per forcing file. Per ParFlow-CLM convention, separate files will be written for each variable following the standard CLM variable naming convention.

Forcing file outputs will be numbered starting with 0000 and data will start at midnight local time for the timezone that has been provided. If no timezone is provided it will default to midnight UTC.

Parameters:

ij_bounds (tuple[int]) – bounding box for subset. This should be given as i,j index values where 0,0 is the lower left hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
grid (str) – The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: “conus1” or “conus2”
start (str) – start date (inclusive), in the form ‘yyyy-mm-dd’
end (str) – end date (exclusive), in the form ‘yyyy-mm-dd’
dataset (str) – forcing dataset name from the HydroData catalog that the forcing files will be subset from e.g. “NLDAS2”.
write_dir (str) – directory where the subset files will be written
time_zone (str) – timezone information for start and end dates. Data will be subset starting at midnight in the specified timezone. This should be a zoneinfo-supported time zone. Defaults to “UTC”.
forcing_vars (tuple[str]) – tuple of forcing variables to subset. By default all 8 variables needed to run ParFlow-CLM will be subset.
dataset_version (str) – version of the forcing dataset. By default the latest version of a dataset will be returned.

Returns:

A dictionary mapping the forcing variable names to the corresponding file paths where the subset data were written.

Example:

filepaths = subset_forcing(
    ij_bounds=(1225, 1738, 1347, 1811),
    grid="conus2",
    start="2005-11-01",
    end="2005-12-01",
    dataset="CW3E",
    write_dir="/path/to/your/chosen/directory",
    forcing_vars=("precipitation", "air_temp"),
    dataset_version="0.9",
)
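
Since start is inclusive, end is exclusive, and each forcing file holds 24 hours, the expected number of files per variable follows from simple date arithmetic. This is a sketch with a hypothetical helper, `forcing_layout`, and it assumes every day in the range is exactly 24 hours long (true for UTC or any fixed-offset zone).

```python
from datetime import date


def forcing_layout(start, end, hours_per_file=24):
    """Return (days, hours, files per variable) for [start, end)."""
    n_days = (date.fromisoformat(end) - date.fromisoformat(start)).days
    if n_days <= 0:
        raise ValueError("end must be later than start")
    n_hours = n_days * 24
    return n_days, n_hours, n_hours // hours_per_file


layout = forcing_layout("2005-11-01", "2005-12-01")
print(layout)  # (30, 720, 30)
```

For the example call above, this suggests 30 files for each of the two requested variables, covering 1 November through 30 November.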

subsettools.config_clm(ij_bounds, start, end, dataset, write_dir, time_zone='UTC')[source]

Modify template CLM driver files for a desired subdomain and run duration.

This function will obtain template CLM driver files (specifically vegm, vegp and drv_clmin) from the existing national simulations on HydroData and modify them to reflect the desired subdomain (indicated by the ij_bounds) and run duration (indicated by the start and end dates). The modified files will be written out to a user-specified directory. These files are required if you are going to run a ParFlow-CLM simulation.

Parameters:

ij_bounds (tuple[int]) – bounding box for subset. This should be given as i,j index values where 0,0 is the lower left hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
start (str) – start date (inclusive), in the form ‘yyyy-mm-dd’
end (str) – end date (exclusive), in the form ‘yyyy-mm-dd’
dataset (str) – name of the dataset that the files should be obtained from, e.g. “conus1_baseline_mod”
write_dir (str) – directory where the subset files will be written
time_zone (str) – timezone information for start and end dates. This should be a zoneinfo-supported time zone. Defaults to “UTC”.

Returns:

A dictionary mapping the CLM file types (“vegp”, “vegm”, “drv_clm”) to the corresponding filepaths where the CLM files were written.

Example:

filepaths = config_clm(
    ij_bounds=(375, 239, 487, 329),
    start="2005-10-01",
    end="2006-10-01",
    dataset="conus1_baseline_mod",
    write_dir="/path/to/your/chosen/directory"
)

subsettools.vegm_to_land_cover(vegm_path, write_pfb_path=None)[source]

Convert a vegm.dat file in CLM format into a land cover array.

This function assumes the vegm.dat file is in the standard format. That is, the file has 1 row per grid cell and each row contains 25 columns. The columns are ordered as: x, y, lat, lon, sand, clay, color, then 18 columns representing the fractional coverage of the grid cell by vegetation class (these final 18 columns add to 1.0 for each row). The rows are in ascending order by grid cell index with y as the outer loop and x as the inner loop.

In cases in which the fractional vegetation coverage results in a tie between multiple vegetation classes, the final land cover array will use the first (lowest) land cover designation to break the tie (i.e. the land cover array will contain designation 1 for a grid cell in which the vegetation distribution is 0.5 class 1 and 0.5 class 5).

Parameters:

vegm_path (str) – path to vegm file
write_pfb_path (str; optional) – path to write output .pfb file to disk

Returns:

NumPy array containing the calculated land cover type for each domain grid cell.

If write_pfb_path is provided, a .pfb file is written to disk at the specified path.

Example:

land_cover_array = vegm_to_land_cover("/path/to/vegm/vegm.dat")
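
The dominant-class rule and its tie-breaking behavior can be reproduced on in-memory rows with NumPy. This is a sketch rather than the library's code: `land_cover_from_rows` and `make_row` are hypothetical helpers, the x and y columns are assumed to be 1-based cell indices, and the output is assumed to be shaped (ny, nx).

```python
import numpy as np


def land_cover_from_rows(rows):
    """Return the dominant vegetation class (1-18) per cell, shaped (ny, nx)."""
    rows = np.asarray(rows, dtype=float)
    if rows.ndim != 2 or rows.shape[1] != 25:
        raise ValueError("expected 25 columns per row")
    nx, ny = int(rows[:, 0].max()), int(rows[:, 1].max())
    fractions = rows[:, 7:]  # the 18 vegetation fraction columns
    # np.argmax returns the first maximum, so ties go to the lowest class.
    classes = np.argmax(fractions, axis=1) + 1
    # Rows are ordered with y as the outer loop and x as the inner loop.
    return classes.reshape(ny, nx)


def make_row(x, y, shares):
    """Build one 25-column row; lat, lon, sand, clay, color are dummy values."""
    fractions = [0.0] * 18
    for cls, share in shares.items():
        fractions[cls - 1] = share
    return [x, y, 0.0, 0.0, 0.5, 0.3, 2.0] + fractions


rows = [
    make_row(1, 1, {1: 0.5, 5: 0.5}),  # tie between classes 1 and 5
    make_row(2, 1, {7: 1.0}),
    make_row(1, 2, {10: 0.6, 12: 0.4}),
    make_row(2, 2, {18: 1.0}),
]
land_cover = land_cover_from_rows(rows)
print(land_cover)
```

The first cell resolves to class 1, matching the tie-breaking rule described above.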

subsettools.get_template_runscript(grid, mode, input_file_type, write_dir)[source]

Get a ParFlow template runscript.

The runscript is selected based on the grid, mode and input file type and is copied to write_dir.

Parameters:

grid (str) – The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: “conus1” or “conus2”
mode (str) – The type of simulation you would like to do. Possible values: “spinup” (run ParFlow with a constant recharge forcing at the upper boundary) and “transient” (coupled ParFlow-CLM run)
input_file_type (str) – The type of domain you will run. Possible values: “box” or “solid”
write_dir (str) – directory where the template runscript file will be copied

Returns:

A path to the template runscript.

Example:

runscript_path = get_template_runscript(
    grid="conus1",
    mode="spinup",
    input_file_type="solid",
    write_dir="/path/to/your/chosen/directory"
)

subsettools.edit_runscript_for_subset(ij_bounds, runscript_path, write_dir=None, runname=None, forcing_dir=None)[source]

Modify a ParFlow run script for a new subdomain run.

This function is designed to start from a national ParFlow runscript template and perform the following three modifications.

Modify the geometry to reflect the bounds of the desired ij_bounds (i.e. the number of grid cells in the x and y direction and the upper bounds of the geometry)

Update the runname for the desired new run.

Update the location of the climate forcings for the new run.

If the runname is None and write_dir is the directory containing the runscript file, the runscript file will be overwritten.

Parameters:

ij_bounds (tuple[int]) – bounding box for subset. This should be given as i,j index values where 0,0 is the lower left hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
runscript_path (str) – absolute path to the template parflow runscript file
write_dir (str) – directory where the new template file will be written. If it is None, defaults to the directory containing the runscript.
runname (str) – name for the new parflow run. If it is None, defaults to the runscript’s previous runname.
forcing_dir (str) – path to the directory containing the subset forcing files. If it is None, defaults to the runscript’s previous forcing directory path.

Returns:

A path to the new runscript file that will be created.

Example:

runscript_path = edit_runscript_for_subset(
    ij_bounds=(375, 239, 487, 329),
    runscript_path="/path/to/your/original/runscript",
    runname="my_conus1_run",
    forcing_dir="/path/to/your/forcing/directory"
)
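
The geometry change described in the first modification can be sketched as plain arithmetic on ij_bounds. The helper `subset_geometry` is hypothetical, and two assumptions are made for illustration: imax and jmax are treated as exclusive, and the horizontal cell size is taken to be 1000 m.

```python
def subset_geometry(ij_bounds, cell_size=1000.0):
    """Return cell counts and upper x/y extents for a subdomain.

    Assumes exclusive upper bounds and a uniform cell size in metres.
    """
    imin, jmin, imax, jmax = ij_bounds
    if imax <= imin or jmax <= jmin:
        raise ValueError("ij_bounds must satisfy imin < imax and jmin < jmax")
    nx, ny = imax - imin, jmax - jmin
    return {
        "nx": nx,
        "ny": ny,
        "upper_x": nx * cell_size,
        "upper_y": ny * cell_size,
    }


geometry = subset_geometry((375, 239, 487, 329))
print(geometry)
# {'nx': 112, 'ny': 90, 'upper_x': 112000.0, 'upper_y': 90000.0}
```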

subsettools.copy_files(read_dir, write_dir)[source]

Copy all files from read_dir to write_dir.

Parameters:

read_dir (str) – read-from directory path
write_dir (str) – write-to directory path

Example:

copy_files(
    read_dir="/path/to/read-from/directory",
    write_dir="/path/to/write-to/directory"
)
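
For readers who want to see the operation spelled out, a standard-library equivalent might look like the sketch below. It is not the library's implementation: `copy_regular_files` is a hypothetical helper, and skipping subdirectories is an assumption of this sketch.

```python
import shutil
import tempfile
from pathlib import Path


def copy_regular_files(read_dir, write_dir):
    """Copy the regular files directly inside read_dir into write_dir."""
    write_path = Path(write_dir)
    write_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(Path(read_dir).iterdir()):
        if entry.is_file():  # subdirectories are skipped in this sketch
            shutil.copy2(entry, write_path / entry.name)
            copied.append(entry.name)
    return copied


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "slope_x.pfb").write_bytes(b"demo")
    (src / "nested").mkdir()
    copied = copy_regular_files(src, Path(tmp) / "dst")
    copied_exists = (Path(tmp) / "dst" / "slope_x.pfb").exists()

print(copied)  # ['slope_x.pfb']
```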

subsettools.change_filename_values(runscript_path, write_dir=None, runname=None, slopex=None, slopey=None, solidfile=None, init_press=None, indicator=None, depth_to_bedrock=None, mannings=None, evap_trans=None)[source]

Change the filenames of input files in the ParFlow runscript.

This function will update the paths to input files in a ParFlow runscript. The provided arguments will reset the corresponding parflow keys to match the user specified paths to input files. File names can be specified with or without relative or absolute file paths. If no path is provided ParFlow will expect the input files to be present in the run directory at the time of simulation.

Note that this will only change paths for keys that already exist in the template ParFlow run script you are starting from and will not reconfigure a run to use new keys (for example if you are not starting from a run script that uses a solid file, adding a new solid file path will not configure the run to use a solid file).

Refer to the ParFlow manual for additional information on any of the keys listed above.

If the runname is None and write_dir is the directory containing the runscript file, the runscript file will be overwritten.

Parameters:

runscript_path (str) – path to the runscript file (yaml or pfidb)
write_dir (str) – directory where the new template file will be written. If it is None, defaults to the directory containing the runscript file.
runname (str) – name of the new parflow run
slopex (str) – new slopex filename (and path)
slopey (str) – new slopey filename (and path)
solidfile (str) – new solidfile filename (and path)
init_press (str) – new initial pressure filename (and path)
indicator (str) – new indicator input filename (and path)
depth_to_bedrock (str) – new depth to bedrock filename (and path)
mannings (str) – new mannings filename (and path)
evap_trans (str) – new evapotranspiration filename (and path)

Returns:

A path to the new runscript file that will be created.

Example:

runscript_path = change_filename_values(
    runscript_path="/path/to/your/original/runscript",
    runname="my_conus1_run",
    init_press="/filename/of/initial/pressure/pfb/file"
)
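
The rule that only keys already present in the runscript are changed can be illustrated with a plain dictionary. This is a sketch, not the library's code: `update_existing` is a hypothetical helper, and the key names are illustrative ParFlow-style keys rather than the exact set the function edits.

```python
def update_existing(settings, new_values):
    """Apply new_values to settings, touching only keys that already exist.

    Returns (updated_settings, skipped_keys). Values of None are ignored,
    mirroring optional arguments that were not supplied.
    """
    updated = dict(settings)
    skipped = []
    for key, value in new_values.items():
        if value is None:
            continue
        if key in updated:
            updated[key] = value
        else:
            skipped.append(key)
    return updated, skipped


# Toy runscript that has no solid-file key.
settings = {
    "TopoSlopesX.FileName": "slope_x.pfb",
    "Geom.domain.ICPressure.FileName": "press.pfb",
}
updated, skipped = update_existing(settings, {
    "TopoSlopesX.FileName": None,
    "Geom.domain.ICPressure.FileName": "my_press.pfb",
    "GeomInput.domaininput.FileName": "solid.pfsol",
})
print(updated["Geom.domain.ICPressure.FileName"])  # my_press.pfb
print(skipped)  # ['GeomInput.domaininput.FileName']
```

The solid-file entry is skipped because the toy runscript has no such key, which parallels the solid file caveat in the note above.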

subsettools.dist_run(topo_p, topo_q, runscript_path, working_dir=None, dist_clim_forcing=True)[source]

Distribute ParFlow input files for parallel computing.

This function will distribute input files to topo_p grids in the x direction and topo_q grids in the y direction. If dist_clim_forcing is true, forcing files will be distributed as well according to the same topology. If working_dir is different from the directory containing the runscript file, the edited runscript file will be written to working_dir.

Parameters:

topo_p (int) – number of grids (processes) to create in the x direction
topo_q (int) – number of grids (processes) to create in the y direction
runscript_path (str) – path to the runscript file (yaml or pfidb)
working_dir (str) – directory containing the files to be distributed. If it is None, it defaults to the directory containing the runscript file.
dist_clim_forcing (bool) – if true, distribute forcing files

Returns:

Path to the edited runscript file that will be created.

Return type:

str

Example:

runscript_path = dist_run(
    topo_p=2,
    topo_q=2,
    runscript_path="/path/to/your/original/runscript",
    dist_clim_forcing=False
)
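
To get a feel for what a given topology implies, the sketch below estimates the process count and the tile sizes along each axis. The helpers `tile_sizes` and `describe_topology` are hypothetical. The sketch assumes a single process in the z direction, and the way remainder cells are assigned here (to the first tiles) is an illustrative choice, not a statement about how ParFlow partitions the grid.

```python
def tile_sizes(n_cells, n_tiles):
    """Split n_cells into n_tiles near-equal parts (extra cells go first)."""
    if n_tiles <= 0:
        raise ValueError("n_tiles must be positive")
    base, extra = divmod(n_cells, n_tiles)
    return [base + 1 if k < extra else base for k in range(n_tiles)]


def describe_topology(nx, ny, topo_p, topo_q):
    """Summarize a topo_p x topo_q split of an nx x ny domain."""
    return {
        "n_processes": topo_p * topo_q,
        "x_tiles": tile_sizes(nx, topo_p),
        "y_tiles": tile_sizes(ny, topo_q),
    }


layout = describe_topology(nx=112, ny=90, topo_p=2, topo_q=2)
print(layout)
# {'n_processes': 4, 'x_tiles': [56, 56], 'y_tiles': [45, 45]}
```

A topology whose tiles are close to equal in size generally balances the work across processes more evenly.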