subsettools
---------------------

.. py:module:: subsettools


Subpackages
~~~~~~~~~~~

.. toctree::
   :titlesonly:
   :maxdepth: 1

   autoapi/subsettools/template_runscripts/index.rst


Submodules
~~~~~~~~~~

.. toctree::
   :titlesonly:
   :maxdepth: 1

   autoapi/subsettools/clm/index.rst
   autoapi/subsettools/domain/index.rst
   autoapi/subsettools/parflow_run/index.rst
   autoapi/subsettools/subsetting/index.rst


Functions
~~~~~~~~~

.. autoapisummary::

   subsettools.define_huc_domain
   subsettools.define_latlon_domain
   subsettools.define_upstream_domain
   subsettools.write_mask_solid
   subsettools.huc_to_ij
   subsettools.latlon_to_ij
   subsettools.create_mask_solid
   subsettools.subset_static
   subsettools.subset_press_init
   subsettools.subset_forcing
   subsettools.config_clm
   subsettools.vegm_to_land_cover
   subsettools.get_template_runscript
   subsettools.edit_runscript_for_subset
   subsettools.copy_files
   subsettools.change_filename_values
   subsettools.dist_run


.. py:function:: define_huc_domain(hucs, grid)

   Define a domain by a collection of HUCs.

   The domain is defined by the grid ij bounds of a bounding box that encompasses the HUCs in the list and a mask for that bounding box indicating which cells in the bounding box are part of these HUCs.

   All HUC IDs in hucs must have the same length (i.e., be HUCs of the same level). All HUCs should be adjacent. If a HUC is only partially covered by the provided grid, the grid bounds for the covered area will be returned.

   :param hucs: a list of USGS HUC IDs
   :type hucs: list[str]
   :param grid: The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: "conus1" or "conus2"
   :type grid: str
   :returns: A tuple (bounds, mask). Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the bounds in the CONUS grid of the area defined by the HUC IDs in hucs.
      imin, jmin, imax, jmax are the west, south, east and north sides of the box, respectively, and all i,j indices are calculated relative to the southwest (lower left) corner of the domain. Mask is a 2D numpy.ndarray that indicates which cells inside the bounding box are part of the selected HUC(s).
   :raises ValueError: If the area defined by the provided HUCs is not part of the given grid.

   Example:

   .. code-block:: python

      grid_bounds, mask = define_huc_domain(
          hucs=["14080201", "14080202", "14080203"],
          grid="conus1"
      )


.. py:function:: define_latlon_domain(latlon_bounds, grid)

   Define a domain by latitude/longitude bounds.

   The domain is defined by the grid ij bounds of a bounding box formed by the latitude/longitude bounds (latlon_bounds) relative to the selected CONUS grid and a mask for that bounding box indicating which cells are active CONUS points.

   :param latlon_bounds: list of the form [[lat1, lon1], [lat2, lon2]]. [lat1, lon1] and [lat2, lon2] define the northwest and southeast corners of the desired box, respectively.
   :type latlon_bounds: List[List[float]]
   :param grid: The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: "conus1" or "conus2".
   :type grid: str
   :returns: A tuple (bounds, mask). Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the bounds in the CONUS grid of the area defined by the latlon_bounds. imin, jmin, imax, jmax are the west, south, east and north sides of the box, respectively, and all i,j indices are calculated relative to the southwest (lower left) corner of the domain. Mask is a 2D numpy.ndarray that indicates which cells inside the bounding box are active CONUS points (for example, if ocean is part of the bounding box, the corresponding cells will not be part of the mask).

   Example:

   .. code-block:: python

      grid_bounds, mask = define_latlon_domain(
          latlon_bounds=[[37.91, -91.43], [37.34, -90.63]],
          grid="conus2"
      )
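
   The (bounds, mask) pair returned by define_huc_domain and define_latlon_domain can be related to a national 2D array with plain NumPy slicing. The sketch below is a minimal illustration and not part of the subsettools API: the helper name ``apply_domain`` is hypothetical, and it assumes that imax and jmax are exclusive upper bounds and that the national array is indexed as ``[j, i]`` with row 0 at the southern edge of the grid.

   .. code-block:: python

      import numpy as np

      def apply_domain(national, bounds, mask):
          """Clip a national 2D array to bounds and blank out cells outside mask.

          Hypothetical helper; assumes exclusive imax/jmax and [j, i] indexing.
          """
          imin, jmin, imax, jmax = bounds
          # Rows are j (south to north), columns are i (west to east).
          subset = national[jmin:jmax, imin:imax].astype(float)
          if subset.shape != mask.shape:
              raise ValueError("mask shape does not match the bounding box")
          # Cells that are not part of the domain become NaN.
          return np.where(mask == 1, subset, np.nan)

      national = np.arange(20).reshape(4, 5)  # toy 4 x 5 "national" grid
      bounds = (1, 1, 4, 3)                   # imin, jmin, imax, jmax
      mask = np.array([[1, 1, 0],
                       [0, 1, 1]])
      clipped = apply_domain(national, bounds, mask)
      # clipped holds [[6, 7, nan], [nan, 12, 13]] as floats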

.. py:function:: define_upstream_domain(outlets, grid)

   Define a domain that is the upstream area of the points in outlets.

   The domain is defined by the grid ij bounds of the bounding box that encompasses the upstream area of all the points in outlets and a mask for that bounding box indicating which cells are part of the selected area.

   The flow_direction files that are used to define the upstream area follow the convention: down: 1, left: 2, up: 3, right: 4.

   :param outlets: list of lat-lon points of the form [[lat1, lon1], [lat2, lon2], ...]
   :type outlets: List[List[float]]
   :param grid: The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: "conus1" or "conus2"
   :type grid: str
   :returns: A tuple (bounds, mask). Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the bounds in the CONUS grid of the upstream area of the outlets. imin, jmin, imax, jmax are the west, south, east and north sides of the box, respectively, and all i,j indices are calculated relative to the southwest (lower left) corner of the domain. Mask is a 2D numpy.ndarray that indicates which cells inside the bounding box are part of the computed upstream area of the outlets.
   :raises ValueError: If the computed upstream area of the outlets is empty.

   Example:

   .. code-block:: python

      bounds, mask = define_upstream_domain(
          outlets=[[44.1348, -95.5084], [44.1352, -95.4949]],
          grid="conus2"
      )


.. py:function:: write_mask_solid(mask, grid, write_dir, mode='single-mask', ij_bounds=None)

   Create ParFlow mask and solid files from a mask array.

   Given an integer mask array consisting of 0s and 1s, this function will create three files in write_dir:

   - a 2D mask file that indicates which cells inside the box domain are part of the domain defined by the mask.
   - a solid file that defines a 3D domain extending to the depth of whichever grid has been selected and tracing the boundaries of the domain defined by the mask.
   - a vtk file, which can be used to visualize the solid file in ParaView.

   If the mode is 'multi-mask', six additional masks will be written: the top, bottom, left, right, front and back masks for each cell in the domain.

   :param mask: an integer array such that mask[i, j] == 1 if the cell (i, j) is part of the domain, and mask[i, j] == 0 otherwise.
   :type mask: numpy.ndarray
   :param grid: The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: "conus1" or "conus2"
   :type grid: str
   :param write_dir: directory path where the mask and solid files will be written
   :type write_dir: str
   :param mode: The mode in which the pfmask-to-pfsol script will be run. It can be either 'single-mask' or 'multi-mask'. Currently, 'multi-mask' mode is only supported for the CONUS2 grid.
   :type mode: str
   :param ij_bounds: bounding box for subset. This should be given as i,j index values where 0,0 is the lower left-hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset. This is only necessary if mode is 'multi-mask'.
   :type ij_bounds: tuple[int]
   :returns: A dictionary mapping the keys ("mask", "mask_vtk", "solid") to the corresponding filepaths of the created files. If the mode is 'multi-mask', the dictionary will contain additional keys for the side masks.
   :rtype: dict

   Example:

   .. code-block:: python

      import numpy as np

      filepaths = write_mask_solid(
          mask=np.array([[0, 1], [1, 1]]),
          grid="conus2",
          write_dir="/path/to/your/chosen/directory"
      )


.. py:function:: huc_to_ij(huc_list, grid)

   This function is deprecated. Use define_huc_domain() instead.


.. py:function:: latlon_to_ij(latlon_bounds, grid)

   This function is deprecated. Use define_latlon_domain() instead.


.. py:function:: create_mask_solid(huc_list, grid, write_dir)

   This function is deprecated. Use write_mask_solid() instead.
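
   When migrating from create_mask_solid to write_mask_solid, note that write_mask_solid takes an integer array of 0s and 1s rather than a list of HUCs. A mask built by other means (for example, a boolean comparison or a float array containing NaN) can be coerced first. The helper ``to_int_mask`` below is a hypothetical sketch, not part of subsettools, and its choice to treat NaN as "outside the domain" is an assumption.

   .. code-block:: python

      import numpy as np

      def to_int_mask(arr):
          """Hypothetical helper: return an integer 0/1 copy of arr.

          NaN and zero become 0; every other value becomes 1.
          """
          values = np.nan_to_num(np.asarray(arr, dtype=float), nan=0.0)
          return (values != 0).astype(np.int64)

      mask = to_int_mask([[True, False], [np.nan, 2.5]])
      # mask is now array([[1, 0], [0, 1]])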

.. py:function:: subset_static(ij_bounds, dataset, write_dir, var_list=('slope_x', 'slope_y', 'pf_indicator', 'mannings', 'pf_flowbarrier', 'pme', 'ss_pressure_head'))

   Subset static input files from national datasets in HydroData.

   The subset values will be written as ParFlow binary files (pfbs) in write_dir. By default the following variables will be subset:

   - Slope in the east/west direction (slope_x)
   - Slope in the north/south direction (slope_y)
   - Subsurface units indicator file (pf_indicator)
   - Manning's roughness coefficients (mannings)
   - Depth to bedrock (pf_flowbarrier)
   - Long-term average precipitation minus evaporation, i.e., recharge (pme)
   - Steady-state pressure head used to initialize transient simulations (ss_pressure_head)

   Note that some datasets might not contain all 7 static input variables. In that case, subset_static will raise a ValueError if var_list includes a variable that does not exist in the dataset. The default variable list contains the static variables needed for the CONUS2 grid. For CONUS1-based datasets, "mannings" and "pf_flowbarrier" should be removed from the list.

   :param ij_bounds: bounding box for subset. This should be given as i,j index values where 0,0 is the lower left-hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
   :type ij_bounds: tuple[int]
   :param dataset: static inputs dataset name from the HydroData catalog, e.g. "conus1_domain"
   :type dataset: str
   :param write_dir: directory where the subset files will be written
   :type write_dir: str
   :param var_list: tuple of variables to subset from the dataset. By default all 7 variables above will be subset. The user can specify a subset of these variables or list additional variables that are available in their dataset of choice.
   :type var_list: tuple[str]
   :returns: A dictionary mapping the static variable names to the corresponding file paths where the subset data were written.

   Example:

   .. code-block:: python

      # Subsetting static variables for a CONUS1 workflow.
      # "pf_flowbarrier" and "mannings" must be removed from the list.
      filepaths = subset_static(
          ij_bounds=(375, 239, 487, 329),
          dataset="conus1_domain",
          write_dir="/path/to/your/chosen/directory",
          var_list=("slope_x", "slope_y", "pf_indicator", "pme", "ss_pressure_head")
      )

      # Subsetting static variables for a CONUS2 workflow.
      # The default var_list can be used here.
      filepaths = subset_static(
          ij_bounds=(3701, 1544, 3792, 1633),
          dataset="conus2_domain",
          write_dir="/path/to/your/chosen/directory",
      )


.. py:function:: subset_press_init(ij_bounds, dataset, date, write_dir, time_zone='UTC')

   Subset a pressure file from a national dataset in HydroData.

   This function will select the pressure file for midnight on the date provided and subset it to the ij_bounds provided. The subset data will be written out as a ParFlow binary file (pfb) to be used as an initial pressure file for a ParFlow simulation.

   :param ij_bounds: bounding box for subset. This should be given as i,j index values where 0,0 is the lower left-hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
   :type ij_bounds: tuple[int]
   :param dataset: dataset name from the HydroData catalog that the pressure file will be subset from, e.g. "conus1_baseline_mod"
   :type dataset: str
   :param date: The date of the pressure file that you would like to subset, in the form 'yyyy-mm-dd'
   :type date: str
   :param write_dir: directory where the subset file will be written
   :type write_dir: str
   :param time_zone: time zone information for the subset date. Data will be subset at midnight in the specified time zone. This should be a zoneinfo-supported time zone. Defaults to "UTC".
   :type time_zone: str
   :returns: The filepath of the subset file, which includes datetime information so that it can be used by later functions (e.g. edit_runscript_for_subset).

   Example:

   .. code-block:: python

      filepath = subset_press_init(
          ij_bounds=(375, 239, 487, 329),
          dataset="conus1_baseline_mod",
          date="2005-12-15",
          write_dir="/path/to/your/chosen/directory",
          time_zone="EST"
      )


.. py:function:: subset_forcing(ij_bounds, grid, start, end, dataset, write_dir, time_zone='UTC', forcing_vars=('precipitation', 'downward_shortwave', 'downward_longwave', 'specific_humidity', 'air_temp', 'atmospheric_pressure', 'east_windspeed', 'north_windspeed'), dataset_version=None)

   Subset forcing files from national datasets in HydroData.

   Subset forcing data will be written out as pfb files formatted for a ParFlow run with 24 hours per forcing file. Per ParFlow-CLM convention, separate files will be written for each variable, following the standard CLM variable naming convention. Forcing file outputs will be numbered starting with 0000, and data will start at midnight local time in the time zone provided. If no time zone is provided, data will start at midnight UTC.

   :param ij_bounds: bounding box for subset. This should be given as i,j index values where 0,0 is the lower left-hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
   :type ij_bounds: tuple[int]
   :param grid: The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: "conus1" or "conus2"
   :type grid: str
   :param start: start date (inclusive), in the form 'yyyy-mm-dd'
   :type start: str
   :param end: end date (exclusive), in the form 'yyyy-mm-dd'
   :type end: str
   :param dataset: forcing dataset name from the HydroData catalog that the forcing files will be subset from, e.g. "NLDAS2".
   :type dataset: str
   :param write_dir: directory where the subset files will be written
   :type write_dir: str
   :param time_zone: time zone information for start and end dates. Data will be subset starting at midnight in the specified time zone. This should be a zoneinfo-supported time zone. Defaults to "UTC".
   :type time_zone: str
   :param forcing_vars: tuple of forcing variables to subset. By default all 8 variables needed to run ParFlow-CLM will be subset.
   :type forcing_vars: tuple[str]
   :param dataset_version: version of the forcing dataset. By default the latest version of the dataset will be used.
   :type dataset_version: str
   :returns: A dictionary mapping the forcing variable names to the corresponding file paths where the subset data were written.

   Example:

   .. code-block:: python

      filepaths = subset_forcing(
          ij_bounds=(1225, 1738, 1347, 1811),
          grid="conus2",
          start="2005-11-01",
          end="2005-12-01",
          dataset="CW3E",
          write_dir="/path/to/your/chosen/directory",
          forcing_vars=("precipitation", "air_temp"),
          dataset_version="0.9",
      )


.. py:function:: config_clm(ij_bounds, start, end, dataset, write_dir, time_zone='UTC')

   Modify template CLM driver files for a desired subdomain and run duration.

   This function will obtain template CLM driver files (specifically vegm, vegp and drv_clmin) from the existing national simulations on HydroData and modify them to reflect the desired subdomain (indicated by ij_bounds) and run duration (indicated by the start and end dates). The modified files will be written out to a user-specified directory. These files are required to run a ParFlow-CLM simulation.

   :param ij_bounds: bounding box for subset. This should be given as i,j index values where 0,0 is the lower left-hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
   :type ij_bounds: tuple[int]
   :param start: start date (inclusive), in the form 'yyyy-mm-dd'
   :type start: str
   :param end: end date (exclusive), in the form 'yyyy-mm-dd'
   :type end: str
   :param dataset: name of the dataset that the files should be obtained from, e.g. "conus1_baseline_mod"
   :type dataset: str
   :param write_dir: directory where the subset files will be written
   :type write_dir: str
   :param time_zone: time zone information for start and end dates.
      This should be a zoneinfo-supported time zone. Defaults to "UTC".
   :type time_zone: str
   :returns: A dictionary mapping the CLM file types ("vegp", "vegm", "drv_clm") to the corresponding filepaths where the CLM files were written.

   Example:

   .. code-block:: python

      filepaths = config_clm(
          ij_bounds=(375, 239, 487, 329),
          start="2005-10-01",
          end="2006-10-01",
          dataset="conus1_baseline_mod",
          write_dir="/path/to/your/chosen/directory"
      )


.. py:function:: vegm_to_land_cover(vegm_path, write_pfb_path=None)

   Convert a vegm.dat file in CLM format into a land cover array.

   This function assumes the vegm.dat file is in the standard format. That is, the file has 1 row per grid cell and each row contains 25 columns. The columns are ordered as: x, y, lat, lon, sand, clay, color, then 18 columns representing the fractional coverage of the grid cell by each vegetation class (these final 18 columns add up to 1.0 for each row). The rows are in ascending order by grid cell index, with y as the outer loop and x as the inner loop.

   In cases in which the fractional vegetation coverage results in a tie between multiple vegetation classes, the final land cover array will use the first (lowest) land cover designation to break the tie (i.e., the land cover array will contain designation 1 for a grid cell in which the vegetation distribution is 0.5 class 1 and 0.5 class 5).

   :param vegm_path: path to vegm file
   :type vegm_path: str
   :param write_pfb_path: path at which to write the output .pfb file to disk
   :type write_pfb_path: str, optional
   :returns: NumPy array containing the calculated land cover type for each domain grid cell. If ``write_pfb_path`` is provided, a .pfb file is written to disk at the specified path.

   Example:

   .. code-block:: python

      land_cover_array = vegm_to_land_cover("/path/to/vegm/vegm.dat")


.. py:function:: get_template_runscript(grid, mode, input_file_type, write_dir)

   Get a ParFlow template runscript.

   The runscript is selected based on the grid, mode and input file type and is copied to write_dir.
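
   The template is chosen from three selector arguments, each of which takes one of the possible values listed in its parameter description. A minimal, hypothetical sketch of checking them before calling get_template_runscript follows; the helper name ``check_template_args`` is illustrative and not part of subsettools.

   .. code-block:: python

      # Documented possible values for each selector argument.
      ALLOWED = {
          "grid": ("conus1", "conus2"),
          "mode": ("spinup", "transient"),
          "input_file_type": ("box", "solid"),
      }

      def check_template_args(grid, mode, input_file_type):
          """Raise ValueError if any argument is not one of the documented values."""
          given = {"grid": grid, "mode": mode, "input_file_type": input_file_type}
          for name, value in given.items():
              if value not in ALLOWED[name]:
                  raise ValueError(f"{name} must be one of {ALLOWED[name]}, got {value!r}")
          return grid, mode, input_file_type

      # Returns the arguments unchanged when all of them are valid.
      check_template_args("conus1", "spinup", "solid")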

   :param grid: The spatial grid that the ij indices are calculated relative to and that the subset data will be returned on. Possible values: "conus1" or "conus2"
   :type grid: str
   :param mode: The type of simulation you would like to run. Possible values: "spinup" (run ParFlow with a constant recharge forcing at the upper boundary) and "transient" (coupled ParFlow-CLM run)
   :type mode: str
   :param input_file_type: The type of domain you will run. Possible values: "box" or "solid"
   :type input_file_type: str
   :param write_dir: directory where the template runscript file will be copied
   :type write_dir: str
   :returns: A path to the template runscript.

   Example:

   .. code-block:: python

      runscript_path = get_template_runscript(
          grid="conus1",
          mode="spinup",
          input_file_type="solid",
          write_dir="/path/to/your/chosen/directory"
      )


.. py:function:: edit_runscript_for_subset(ij_bounds, runscript_path, write_dir=None, runname=None, forcing_dir=None)

   Modify a ParFlow runscript for a new subdomain run.

   This function is designed to start from a national ParFlow runscript template and perform the following three modifications:

   1. Modify the geometry to reflect the desired ij_bounds (i.e. the number of grid cells in the x and y directions and the upper bounds of the geometry).
   2. Update the runname for the desired new run.
   3. Update the location of the climate forcings for the new run.

   If runname is None and write_dir is the directory containing the runscript file, the runscript file will be overwritten.

   :param ij_bounds: bounding box for subset. This should be given as i,j index values where 0,0 is the lower left-hand corner of a domain. ij_bounds are given relative to whatever grid is being used for the subset.
   :type ij_bounds: tuple[int]
   :param runscript_path: absolute path to the template ParFlow runscript file
   :type runscript_path: str
   :param write_dir: directory where the new runscript file will be written.
      If it is None, defaults to the directory containing the runscript.
   :type write_dir: str
   :param runname: name for the new ParFlow run. If it is None, defaults to the runscript's previous runname.
   :type runname: str
   :param forcing_dir: path to the directory containing the subset forcing files. If it is None, defaults to the runscript's previous forcing directory path.
   :type forcing_dir: str
   :returns: A path to the new runscript file that will be created.

   Example:

   .. code-block:: python

      runscript_path = edit_runscript_for_subset(
          ij_bounds=(375, 239, 487, 329),
          runscript_path="/path/to/your/original/runscript",
          runname="my_conus1_run",
          forcing_dir="/path/to/your/forcing/directory"
      )


.. py:function:: copy_files(read_dir, write_dir)

   Copy all files from read_dir to write_dir.

   :param read_dir: read-from directory path
   :type read_dir: str
   :param write_dir: write-to directory path
   :type write_dir: str

   Example:

   .. code-block:: python

      copy_files(
          read_dir="/path/to/read-from/directory",
          write_dir="/path/to/write-to/directory"
      )


.. py:function:: change_filename_values(runscript_path, write_dir=None, runname=None, slopex=None, slopey=None, solidfile=None, init_press=None, indicator=None, depth_to_bedrock=None, mannings=None, evap_trans=None)

   Change the filenames of input files in the ParFlow runscript.

   This function will update the paths to input files in a ParFlow runscript. The provided arguments will reset the corresponding ParFlow keys to match the user-specified paths to input files. File names can be specified with or without relative or absolute file paths. If no path is provided, ParFlow will expect the input files to be present in the run directory at the time of simulation.
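
   The file-name arguments in the signature act as optional overrides: only arguments that are provided (not None) reset a key. The sketch below illustrates that pattern on a plain dictionary of runscript keys. It is hypothetical and not the subsettools implementation; the helper name ``apply_overrides`` and the argument-to-key mapping (for example, slopex to ``TopoSlopesX.FileName``) are assumptions for illustration only.

   .. code-block:: python

      # Assumed mapping from function arguments to ParFlow keys (illustrative).
      ARG_TO_KEY = {
          "slopex": "TopoSlopesX.FileName",
          "slopey": "TopoSlopesY.FileName",
          "mannings": "Mannings.FileName",
      }

      def apply_overrides(keys, **overrides):
          """Return a copy of keys with the non-None overrides applied.

          Keys that are not already present in the runscript are not added.
          """
          updated = dict(keys)
          for arg, value in overrides.items():
              key = ARG_TO_KEY[arg]
              if value is not None and key in updated:
                  updated[key] = value
          return updated

      template = {"TopoSlopesX.FileName": "slopex.pfb", "TopoSlopesY.FileName": "slopey.pfb"}
      new_keys = apply_overrides(template, slopex="my_slopex.pfb", slopey=None, mannings="n.pfb")
      # Only TopoSlopesX.FileName changes; Mannings.FileName is not added.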

   Note that this will only change paths for keys that already exist in the template ParFlow runscript you are starting from; it will not reconfigure a run to use new keys (for example, if you are not starting from a runscript that uses a solid file, adding a new solid file path will not configure the run to use a solid file). Refer to the ParFlow manual for additional information on the corresponding keys.

   If runname is None and write_dir is the directory containing the runscript file, the runscript file will be overwritten.

   :param runscript_path: path to the runscript file (yaml or pfidb)
   :type runscript_path: str
   :param write_dir: directory where the new runscript file will be written. If it is None, defaults to the directory containing the runscript file.
   :type write_dir: str
   :param runname: name of the new ParFlow run
   :type runname: str
   :param slopex: new slopex filename (and path)
   :type slopex: str
   :param slopey: new slopey filename (and path)
   :type slopey: str
   :param solidfile: new solidfile filename (and path)
   :type solidfile: str
   :param init_press: new initial pressure filename (and path)
   :type init_press: str
   :param indicator: new indicator input filename (and path)
   :type indicator: str
   :param depth_to_bedrock: new depth to bedrock filename (and path)
   :type depth_to_bedrock: str
   :param mannings: new mannings filename (and path)
   :type mannings: str
   :param evap_trans: new evapotranspiration filename (and path)
   :type evap_trans: str
   :returns: A path to the new runscript file that will be created.

   Example:

   .. code-block:: python

      runscript_path = change_filename_values(
          runscript_path="/path/to/your/original/runscript",
          runname="my_conus1_run",
          init_press="/filename/of/initial/pressure/pfb/file"
      )


.. py:function:: dist_run(topo_p, topo_q, runscript_path, working_dir=None, dist_clim_forcing=True)

   Distribute ParFlow input files for parallel computing.

   This function will distribute input files to topo_p grids in the x direction and topo_q grids in the y direction. If dist_clim_forcing is True, forcing files will be distributed as well, according to the same topology. If working_dir is different from the directory containing the runscript file, the edited runscript file will be written to working_dir.

   :param topo_p: number of grids (processes) to create in the x direction
   :type topo_p: int
   :param topo_q: number of grids (processes) to create in the y direction
   :type topo_q: int
   :param runscript_path: path to the runscript file (yaml or pfidb)
   :type runscript_path: str
   :param working_dir: directory containing the files to be distributed. If it is None, it defaults to the directory containing the runscript file.
   :type working_dir: str
   :param dist_clim_forcing: if True, distribute forcing files
   :type dist_clim_forcing: bool
   :returns: Path to the edited runscript file that will be created.
   :rtype: str

   Example:

   .. code-block:: python

      runscript_path = dist_run(
          topo_p=2,
          topo_q=2,
          runscript_path="/path/to/your/original/runscript",
          dist_clim_forcing=False
      )
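
   A topology of topo_p by topo_q corresponds to topo_p * topo_q processes, each owning one rectangular block of the subdomain. The sketch below shows one way to split the nx by ny cells of a subset into such blocks. It is an illustrative assumption only: the helper names are hypothetical, ParFlow's own partitioning may assign leftover cells differently, and the 112 x 90 size assumes ij_bounds=(375, 239, 487, 329) with exclusive imax and jmax.

   .. code-block:: python

      def split_extent(n, parts):
          """Split range(n) into `parts` contiguous (start, stop) blocks.

          The first n % parts blocks each get one extra cell.
          """
          if parts < 1 or parts > n:
              raise ValueError("parts must be between 1 and n")
          base, extra = divmod(n, parts)
          blocks, start = [], 0
          for k in range(parts):
              stop = start + base + (1 if k < extra else 0)
              blocks.append((start, stop))
              start = stop
          return blocks

      def process_blocks(nx, ny, topo_p, topo_q):
          """Return {(p, q): ((i0, i1), (j0, j1))} for a topo_p x topo_q layout."""
          x_blocks = split_extent(nx, topo_p)
          y_blocks = split_extent(ny, topo_q)
          return {(p, q): (x_blocks[p], y_blocks[q])
                  for p in range(topo_p) for q in range(topo_q)}

      # A 112 x 90 subset on a 2 x 2 topology gives 4 blocks.
      blocks = process_blocks(112, 90, 2, 2)
      # blocks[(1, 0)] == ((56, 112), (0, 45))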