subsettools
---------------------

.. py:module:: subsettools


Subpackages
~~~~~~~~~~~
.. toctree::
   :titlesonly:
   :maxdepth: 1
	      
   autoapi/subsettools/template_runscripts/index.rst


Submodules
~~~~~~~~~~
.. toctree::
   :titlesonly:
   :maxdepth: 1

   autoapi/subsettools/clm/index.rst
   autoapi/subsettools/domain/index.rst
   autoapi/subsettools/parflow_run/index.rst
   autoapi/subsettools/subsetting/index.rst


Functions
~~~~~~~~~

.. autoapisummary::

   subsettools.define_huc_domain
   subsettools.define_latlon_domain
   subsettools.define_upstream_domain
   subsettools.write_mask_solid
   subsettools.huc_to_ij
   subsettools.latlon_to_ij
   subsettools.create_mask_solid
   subsettools.subset_static
   subsettools.subset_press_init
   subsettools.subset_forcing
   subsettools.config_clm
   subsettools.vegm_to_land_cover
   subsettools.get_template_runscript
   subsettools.edit_runscript_for_subset
   subsettools.copy_files
   subsettools.change_filename_values
   subsettools.dist_run


.. py:function:: define_huc_domain(hucs, grid)

   Define a domain by a collection of HUCs.

   The domain is defined by the grid ij bounds of a bounding box that
   encompasses the HUCs in the list and a mask for that bounding box indicating
   which cells in the bounding box are part of these HUCs.

   All HUC IDs in hucs must be the same length (HUCs of the same level).
   All HUCs should be adjacent. If a HUC is only partially covered by the
   provided grid, the grid bounds for the covered area will be returned.

   :param hucs: a list of USGS HUC IDs
   :type hucs: list[str]
   :param grid: The spatial grid that the ij indices are calculated relative
                to and that the subset data will be returned on. Possible values:
                "conus1" or "conus2"
   :type grid: str

   :returns: A tuple (bounds, mask).

             Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the
             bounds in the conus grid of the area defined by the HUC IDs in hucs.
             imin, jmin, imax, jmax are the west, south, east and north sides of the
             box respectively and all i,j indices are calculated relative to the
             lower southwest corner of the domain.

             Mask is a 2D numpy.ndarray that indicates which cells inside the bounding
             box are part of the selected HUC(s).

   :raises ValueError: If the area defined by the provided HUCs is not part of the
       given grid.

   Example:

   .. code-block:: python

       grid_bounds, mask = define_huc_domain(
           hucs=["14080201", "14080202", "14080203"], grid="conus1"
       )
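
   As an illustration of how bounds and mask relate to each other, the
   following sketch derives both from a toy boolean grid. It is not the
   library's implementation: bounding_box_and_mask is a hypothetical helper,
   arrays are assumed to be indexed [j, i], and the upper bounds are assumed
   to be exclusive.

   .. code-block:: python

       import numpy as np

       def bounding_box_and_mask(selected):
           """Return ((imin, jmin, imax, jmax), mask) for a boolean grid.

           Hypothetical helper for illustration only. `selected` is indexed
           [j, i] and the upper bounds are assumed to be exclusive.
           """
           jj, ii = np.nonzero(selected)
           if jj.size == 0:
               raise ValueError("The selected area is empty.")
           imin, imax = int(ii.min()), int(ii.max()) + 1
           jmin, jmax = int(jj.min()), int(jj.max()) + 1
           # Crop the grid to the bounding box and convert it to 0s and 1s.
           mask = selected[jmin:jmax, imin:imax].astype(int)
           return (imin, jmin, imax, jmax), mask

       # Toy 4x5 grid in which three cells belong to the selected area.
       selected = np.zeros((4, 5), dtype=bool)
       selected[1, 2] = selected[2, 2] = selected[2, 3] = True

       toy_bounds, toy_mask = bounding_box_and_mask(selected)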


.. py:function:: define_latlon_domain(latlon_bounds, grid)

   Define a domain by latitude/longitude bounds.

   The domain is defined by the grid ij bounds of a bounding box formed by the
   latitude/longitude bounds (latlon_bounds) relative to the selected conus grid
   and a mask for that bounding box indicating which cells are active CONUS
   points.

   :param latlon_bounds: list of the form [[lat1, lon1],
                         [lat2, lon2]]. [lat1, lon1] and [lat2, lon2] define the northwest
                         and southeast corners of the desired box respectively.
   :type latlon_bounds: List[List[float]]
   :param grid: The spatial grid that the ij indices are calculated relative
                to and that the subset data will be returned on. Possible values:
                "conus1" or "conus2".
   :type grid: str

   :returns: A tuple (bounds, mask).

             Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the
             bounds in the conus grid of the area defined by the latlon_bounds. imin,
             jmin, imax, jmax are the west, south, east and north sides of the box
             respectively and all i,j indices are calculated relative to the lower
             southwest corner of the domain.

             Mask is a 2D numpy.ndarray that indicates which cells inside the bounding
             box are active CONUS points (for example, if ocean is part of the bounding
             box the corresponding cells will not be part of the mask).

   Example:

   .. code-block:: python

       grid_bounds, mask = define_latlon_domain(
           latlon_bounds=[[37.91, -91.43], [37.34, -90.63]], grid="conus2"
       )
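
   If the two corners are not already in northwest/southeast order, they can
   be rearranged before the call. The sketch below uses a hypothetical helper,
   to_nw_se, and assumes the box does not cross the antimeridian.

   .. code-block:: python

       def to_nw_se(corner_a, corner_b):
           """Order two [lat, lon] corners as [[north, west], [south, east]].

           Hypothetical helper; assumes the box does not cross the antimeridian.
           """
           (lat_a, lon_a), (lat_b, lon_b) = corner_a, corner_b
           north, south = max(lat_a, lat_b), min(lat_a, lat_b)
           west, east = min(lon_a, lon_b), max(lon_a, lon_b)
           return [[north, west], [south, east]]

       # Corners given as southwest and northeast are reordered.
       latlon_bounds = to_nw_se([37.34, -91.43], [37.91, -90.63])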


.. py:function:: define_upstream_domain(outlets, grid)

   Define a domain that is the upstream area of the points in outlets.

   The domain is defined by the grid ij bounds of the bounding box that
   encompasses the upstream area of all the points in outlets and a mask for
   that bounding box indicating which cells are part of the selected area.

   The flow_direction files that are used to define the upstream area follow the
   convention: down: 1, left: 2, up: 3, right: 4.

   :param outlets: list of lat-lon points of the form
                   [[lat1, lon1], [lat2, lon2], ...]
   :type outlets: List[List[float]]
   :param grid: The spatial grid that the ij indices are calculated relative
                to and that the subset data will be returned on. Possible values:
                "conus1" or "conus2"
   :type grid: str

   :returns: A tuple (bounds, mask).

             Bounds is a tuple of the form (imin, jmin, imax, jmax) representing the
             bounds in the conus grid of the upstream area of the outlets. imin, jmin,
             imax, jmax are the west, south, east and north sides of the box
             respectively and all i,j indices are calculated relative to the lower
             southwest corner of the domain.

             Mask is a 2D numpy.ndarray that indicates which cells inside the bounding
             box are part of the computed upstream area of the outlets.

   :raises ValueError: If the computed upstream area of the outlets is empty.

   Example:

   .. code-block:: python

       bounds, mask = define_upstream_domain(
           outlets=[[44.1348, -95.5084], [44.1352, -95.4949]],
           grid="conus2"
       )
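
   The flow direction convention above can be illustrated with a small
   breadth-first search on a toy grid. This is only a sketch and not the
   library's implementation: upstream_mask is a hypothetical helper, and the
   array is assumed to be indexed [j, i] with j increasing in the "up"
   direction.

   .. code-block:: python

       from collections import deque

       import numpy as np

       # (dj, di) offset for each code: down: 1, left: 2, up: 3, right: 4.
       # Assumes arrays are indexed [j, i] with j increasing "up".
       OFFSETS = {1: (-1, 0), 2: (0, -1), 3: (1, 0), 4: (0, 1)}

       def upstream_mask(flow_direction, outlet):
           """Mark every cell whose flow path reaches `outlet` (hypothetical helper)."""
           ny, nx = flow_direction.shape
           mask = np.zeros((ny, nx), dtype=int)
           mask[outlet] = 1
           queue = deque([outlet])
           while queue:
               j, i = queue.popleft()
               for dj, di in OFFSETS.values():
                   # The neighbor at (j - dj, i - di) drains into (j, i)
                   # if its flow direction has offset (dj, di).
                   nj, ni = j - dj, i - di
                   if 0 <= nj < ny and 0 <= ni < nx and mask[nj, ni] == 0:
                       if OFFSETS.get(int(flow_direction[nj, ni])) == (dj, di):
                           mask[nj, ni] = 1
                           queue.append((nj, ni))
           return mask

       toy_flow = np.array([
           [4, 4, 1],  # j = 0: flows right, right, then out of the grid
           [1, 2, 1],  # j = 1: drains into the j = 0 row
           [3, 3, 2],  # j = 2: drains out of the top of the grid
       ])
       toy_upstream = upstream_mask(toy_flow, outlet=(0, 2))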


.. py:function:: write_mask_solid(mask, grid, write_dir, mode='single-mask', ij_bounds=None)

   Create ParFlow mask and solid files from a mask array.

   Given an integer mask array consisting of 0s and 1s, this function will
   create three files in write_dir:

   - a 2D mask file that indicates which cells inside the box domain are
     part of the selected HUCs.
   - a solid file that defines a 3D domain extending to the depth of
     whichever grid has been selected and tracing the boundaries of the
     selected HUCs.
   - a vtk file, which can be used to visualize the solid file in ParaView.

   If the mode is 'multi-mask', another six mask files will be written: the
   top, bottom, left, right, front and back masks for each cell in the domain.

   :param mask: an integer array such that mask[i, j] == 1 if the
                cell (i, j) is part of the domain, and mask[i, j] == 0 otherwise.
   :type mask: numpy.ndarray
   :param grid: The spatial grid that the ij indices are calculated relative
                to and that the subset data will be returned on. Possible values:
                "conus1" or "conus2"
   :type grid: str
   :param write_dir: directory path where the mask and solid files will be
                     written
   :type write_dir: str
   :param mode: The mode in which the pfmask-to-pfsol script will be run.
                It can be either 'single-mask' or 'multi-mask'. Currently, 'multi-mask'
                mode is only supported for the CONUS2 grid.
   :type mode: str
   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset. This is only necessary if mode is 'multi-mask'.
   :type ij_bounds: tuple[int]

   :returns: A dictionary mapping the keys ("mask", "mask_vtk", "solid") to the
             corresponding filepaths of the created files. If the mode is 'multi-mask'
             the dictionary will contain additional keys for the side masks for each
             cell in the domain.
   :rtype: dict

   Example:

   .. code-block:: python

       filepaths = write_mask_solid(
           mask=np.array([[0, 1], [1, 1]]),
           grid="conus2",
           write_dir="/path/to/your/chosen/directory"
       )
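
   Because the mask must be an integer array of 0s and 1s, it can be worth
   checking it before writing any files. The helper below, check_binary_mask,
   is hypothetical and not part of subsettools; whether the library performs
   similar checks itself is not stated here.

   .. code-block:: python

       import numpy as np

       def check_binary_mask(mask):
           """Return `mask` as an array if it is a 2D integer array of 0s and 1s.

           Hypothetical pre-check for illustration only.
           """
           mask = np.asarray(mask)
           if mask.ndim != 2:
               raise ValueError("mask must be a 2D array")
           if not np.issubdtype(mask.dtype, np.integer):
               raise ValueError("mask must have an integer dtype")
           if not np.isin(mask, (0, 1)).all():
               raise ValueError("mask may only contain 0s and 1s")
           return mask

       checked = check_binary_mask(np.array([[0, 1], [1, 1]]))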


.. py:function:: huc_to_ij(huc_list, grid)

   This function is deprecated.

   Use define_huc_domain() instead.


.. py:function:: latlon_to_ij(latlon_bounds, grid)

   This function is deprecated.

   Use define_latlon_domain() instead.


.. py:function:: create_mask_solid(huc_list, grid, write_dir)

   This function is deprecated.

   Use write_mask_solid() instead.


.. py:function:: subset_static(ij_bounds, dataset, write_dir, var_list=('slope_x', 'slope_y', 'pf_indicator', 'mannings', 'pf_flowbarrier', 'pme', 'ss_pressure_head'))

   Subset static input files from national datasets in HydroData.

   The subset values will be written as ParFlow binary files (pfbs) in
   write_dir. By default the following variables will be subset:

   - Slope in the east/west direction (slope_x)
   - Slope in the north/south direction (slope_y)
   - Subsurface units indicator file (pf_indicator)
   - Manning's roughness coefficients (mannings)
   - Depth to bedrock (pf_flowbarrier)
   - Long term average precipitation minus evaporation (i.e. recharge) (pme)
   - Steady state pressure head used to initialize transient simulations
     (ss_pressure_head)

   Note that some datasets might not contain all 7 static input variables. In
   that case, subset_static raises a ValueError for any variables that do not
   exist in the dataset. The default variable list contains the necessary
   static variables for the CONUS2 grid. For CONUS1-based datasets, "mannings"
   and "pf_flowbarrier" should be removed from the list.

   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset.
   :type ij_bounds: tuple[int]
   :param dataset: static inputs dataset name from the HydroData catalog e.g.
                   "conus1_domain"
   :type dataset: str
   :param write_dir: directory where the subset files will be written
   :type write_dir: str
   :param var_list: tuple of variables to subset from the dataset.
                    By default all 7 variables above will be subset. The user can specify
                    a subset of these variables or list additional variables that are
                    available in their dataset of choice.
   :type var_list: tuple[str]

   :returns: A dictionary mapping the static variable names to the corresponding file
             paths where the subset data were written.

   Example:

   .. code-block:: python

       # Subsetting static variables for a CONUS1 workflow
       # We need to remove "pf_flowbarrier" and "mannings" from the list
       filepaths = subset_static(
           ij_bounds=(375, 239, 487, 329),
           dataset="conus1_domain",
           write_dir="/path/to/your/chosen/directory",
           var_list=("slope_x", "slope_y", "pf_indicator", "pme",
                     "ss_pressure_head")
       )

       # Subsetting static variables for a CONUS2 workflow
       # Note that we can use the default var_list here
       filepaths = subset_static(
           ij_bounds=(3701, 1544, 3792, 1633),
           dataset="conus2_domain",
           write_dir="/path/to/your/chosen/directory",
       )
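
   Rather than retyping the variable names, the CONUS1 list can be derived from
   the default list. In this sketch, DEFAULT_VARS and CONUS2_ONLY are
   illustrative names and are not constants exported by subsettools.

   .. code-block:: python

       # Default var_list, copied from the function signature.
       DEFAULT_VARS = (
           "slope_x", "slope_y", "pf_indicator", "mannings",
           "pf_flowbarrier", "pme", "ss_pressure_head",
       )
       # Variables to drop for CONUS1-based datasets.
       CONUS2_ONLY = {"mannings", "pf_flowbarrier"}

       conus1_vars = tuple(v for v in DEFAULT_VARS if v not in CONUS2_ONLY)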


.. py:function:: subset_press_init(ij_bounds, dataset, date, write_dir, time_zone='UTC')

   Subset a pressure file from a national dataset in HydroData.

   This function will select the pressure file for midnight on the date provided
   and subset the selected pressure file to the ij_bounds provided. The subset
   data will be written out as a ParFlow binary file (pfb) to be used as an
   initial pressure file for a ParFlow simulation.

   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset.
   :type ij_bounds: tuple[int]
   :param dataset: dataset name from the HydroData catalog that the pressure
                   file will be subset from e.g. "conus1_baseline_mod"
   :type dataset: str
   :param date: The date of the pressure file that you would like to subset,
                in the form 'yyyy-mm-dd'
   :type date: str
   :param write_dir: directory where the subset file will be written
   :type write_dir: str
   :param time_zone: timezone information for subset date. Data will be
                     subset at midnight in the specified timezone. This should be a
                     zoneinfo-supported time zone. Defaults to "UTC".
   :type time_zone: str

   :returns: The filepath of the subset file, which includes datetime information, so
             that it can be used by later functions (e.g. edit_runscript_for_subset).

   Example:

   .. code-block:: python

       filepath = subset_press_init(
           ij_bounds=(375, 239, 487, 329),
           dataset="conus1_baseline_mod",
           date="2005-12-15",
           write_dir="/path/to/your/chosen/directory",
           time_zone="EST"
       )
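
   To see which instant "midnight in the specified timezone" refers to, the
   local midnight can be converted to UTC. The sketch below uses only the
   standard library; a fixed UTC-5 offset stands in for a zone such as "EST"
   (an assumption of this sketch; zoneinfo.ZoneInfo could be used instead).

   .. code-block:: python

       from datetime import datetime, timedelta, timezone

       # Fixed UTC-5 offset standing in for "EST" (assumption of this sketch).
       est = timezone(timedelta(hours=-5), "EST")

       local_midnight = datetime.strptime("2005-12-15", "%Y-%m-%d").replace(tzinfo=est)
       utc_time = local_midnight.astimezone(timezone.utc)

       print(utc_time.isoformat())  # 2005-12-15T05:00:00+00:00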


.. py:function:: subset_forcing(ij_bounds, grid, start, end, dataset, write_dir, time_zone='UTC', forcing_vars=('precipitation', 'downward_shortwave', 'downward_longwave', 'specific_humidity', 'air_temp', 'atmospheric_pressure', 'east_windspeed', 'north_windspeed'), dataset_version=None)

   Subset forcing files from national datasets in HydroData.

   Subset forcing data will be written out as pfb files formatted for a ParFlow
   run with 24 hours per forcing file. Per ParFlow-CLM convention, separate
   files will be written for each variable following the standard CLM variable
   naming convention.

   Forcing file outputs will be numbered starting with 0000 and data will start
   at midnight local time for the timezone that has been provided. If no
   timezone is provided it will default to midnight UTC.

   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset.
   :type ij_bounds: tuple[int]
   :param grid: The spatial grid that the ij indices are calculated relative
                to and that the subset data will be returned on. Possible values:
                "conus1" or "conus2"
   :type grid: str
   :param start: start date (inclusive), in the form 'yyyy-mm-dd'
   :type start: str
   :param end: end date (exclusive), in the form 'yyyy-mm-dd'
   :type end: str
   :param dataset: forcing dataset name from the HydroData catalog that the
                   forcing files will be subset from e.g. "NLDAS2".
   :type dataset: str
   :param write_dir: directory where the subset files will be written
   :type write_dir: str
   :param time_zone: timezone information for start and end dates. Data will
                     be subset starting at midnight in the specified timezone. This
                     should be a zoneinfo-supported time zone. Defaults to "UTC".
   :type time_zone: str
   :param forcing_vars: tuple of forcing variables to subset. By
                        default all 8 variables needed to run ParFlow-CLM will be subset.
   :type forcing_vars: tuple[str]
   :param dataset_version: version of the forcing dataset. By default the
                           latest version of a dataset will be returned.
   :type dataset_version: str

   :returns: A dictionary mapping the forcing variable names to the corresponding file
             paths where the subset data were written.

   Example:

   .. code-block:: python

       filepaths = subset_forcing(
           ij_bounds=(1225, 1738, 1347, 1811),
           grid="conus2",
           start="2005-11-01",
           end="2005-12-01",
           dataset="CW3E",
           write_dir="/path/to/your/chosen/directory",
           forcing_vars=("precipitation", "air_temp"),
           dataset_version="0.9",
       )
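
   Since start is inclusive, end is exclusive, and each file holds 24 hours,
   the number of daily files per variable follows directly from the two dates.
   The sketch below only does the date arithmetic; the hour ranges are
   expressed as half-open offsets from the start time and are not the
   library's file naming scheme.

   .. code-block:: python

       from datetime import date

       start = date.fromisoformat("2005-11-01")
       end = date.fromisoformat("2005-12-01")

       # The end date is exclusive, so November contributes 30 daily files.
       n_days = (end - start).days

       # Half-open ranges of hours since the start, one range per daily file.
       hour_ranges = [(24 * d, 24 * (d + 1)) for d in range(n_days)]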


.. py:function:: config_clm(ij_bounds, start, end, dataset, write_dir, time_zone='UTC')

   Modify template CLM driver files for a desired subdomain and run duration.

   This function will obtain template CLM driver files (specifically vegm, vegp
   and drv_clmin) from the existing national simulations on HydroData and modify
   them to reflect the desired subdomain (indicated by the ij_bounds) and run
   duration (indicated by the start and end dates). The modified files will be
   written out to a user specified directory. These files are required if you
   are going to run a ParFlow-CLM simulation.

   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset.
   :type ij_bounds: tuple[int]
   :param start: start date (inclusive), in the form 'yyyy-mm-dd'
   :type start: str
   :param end: end date (exclusive), in the form 'yyyy-mm-dd'
   :type end: str
   :param dataset: name of the dataset that the files should be obtained from
                   e.g. "conus1_baseline_mod"
   :type dataset: str
   :param write_dir: directory where the subset files will be written
   :type write_dir: str
   :param time_zone: timezone information for start and end dates. This
                     should be a zoneinfo-supported time zone. Defaults to "UTC".
   :type time_zone: str

   :returns: A dictionary mapping the CLM file types ("vegp", "vegm", "drv_clm") to
             the corresponding filepaths where the CLM files were written.

   Example:

   .. code-block:: python

       filepaths = config_clm(
           ij_bounds=(375, 239, 487, 329),
           start="2005-10-01",
           end="2006-10-01",
           dataset="conus1_baseline_mod",
           write_dir="/path/to/your/chosen/directory"
       )


.. py:function:: vegm_to_land_cover(vegm_path, write_pfb_path=None)

   Convert a vegm.dat file in CLM format into a land cover array.

   This function assumes the vegm.dat file is in the standard format.
   That is, the file has 1 row per grid cell and each row contains 25 columns.
   The columns are ordered as: x, y, lat, lon, sand, clay, color, then 18
   columns representing the fractional coverage of the grid cell by vegetation
   class (these final 18 columns add to 1.0 for each row). The rows are in
   ascending order by grid cell index with y as the outer loop and x as the
   inner loop.

   In cases in which the fractional vegetation coverage results in a tie
   between multiple vegetation classes, the final land cover array will
   use the first (lowest) land cover designation to break the tie
   (i.e. the land cover array will contain designation 1 for a grid cell
   in which the vegetation distribution is 0.5 class 1 and 0.5 class 5).

   :param vegm_path: path to vegm file
   :type vegm_path: str
   :param write_pfb_path: path to write output .pfb file to disk
   :type write_pfb_path: str; optional

   :returns: NumPy array containing the calculated land cover type for
             each domain grid cell.

             If `write_pfb_path` is provided, a .pfb file is also written to
             disk at the specified path.

   Example:

   .. code-block:: python

       land_cover_array = vegm_to_land_cover("/path/to/vegm/vegm.dat")
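
   The column layout and the tie-breaking rule can be reproduced on a toy
   table with NumPy, since numpy.argmax returns the first index of the
   maximum. This is a sketch and not the library's code: the 2x2 table is
   made up, lat/lon are placeholders, and the orientation of the reshaped
   array (first row = first y value) is an assumption.

   .. code-block:: python

       import numpy as np

       nx, ny = 2, 2
       rows = []
       for y in range(1, ny + 1):        # y is the outer loop
           for x in range(1, nx + 1):    # x is the inner loop
               # x, y, lat, lon, sand, clay, color, then 18 class fractions
               rows.append([x, y, 0.0, 0.0, 0.5, 0.5, 2] + [0.0] * 18)
       vegm = np.array(rows, dtype=float)

       # Columns 7..24 hold the fractions for classes 1..18.
       vegm[0, 7 + 0] = 0.5   # class 1 ...
       vegm[0, 7 + 4] = 0.5   # ... tied with class 5
       vegm[1, 7 + 9] = 1.0   # class 10
       vegm[2, 7 + 17] = 1.0  # class 18
       vegm[3, 7 + 2] = 0.4   # class 3
       vegm[3, 7 + 6] = 0.6   # class 7

       fractions = vegm[:, 7:25]
       # argmax picks the first maximum, so ties go to the lowest class.
       land_cover = (np.argmax(fractions, axis=1) + 1).reshape(ny, nx)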


.. py:function:: get_template_runscript(grid, mode, input_file_type, write_dir)

   Get a ParFlow template runscript.

   The runscript is selected based on the grid, mode and input file type and
   is copied to write_dir.

   :param grid: The spatial grid that the ij indices are calculated relative
                to and that the subset data will be returned on. Possible values:
                "conus1" or "conus2"
   :type grid: str
   :param mode: The type of simulation you would like to do. Possible values:
                "spinup" (run ParFlow with a constant recharge forcing at the upper
                boundary) and "transient" (coupled ParFlow-CLM run)
   :type mode: str
   :param input_file_type: The type of domain you will run. Possible values:
                           "box" or "solid"
   :type input_file_type: str
   :param write_dir: directory where the template runscript file will be
                     copied
   :type write_dir: str

   :returns: A path to the template runscript.

   Example:

   .. code-block:: python

       runscript_path = get_template_runscript(
           grid="conus1",
           mode="spinup",
           input_file_type="solid",
           write_dir="/path/to/your/chosen/directory"
       )


.. py:function:: edit_runscript_for_subset(ij_bounds, runscript_path, write_dir=None, runname=None, forcing_dir=None)

   Modify a ParFlow run script for a new subdomain run.

   This function is designed to start from a national ParFlow runscript template
   and perform the following three modifications:

   1. Modify the geometry to reflect the bounds of the desired ij_bounds
      (i.e. the number of grid cells in the x and y direction and the upper
      bounds of the geometry).
   2. Update the runname for the desired new run.
   3. Update the location of the climate forcings for the new run.

   If the runname is None and write_dir is the directory containing the
   runscript file, the runscript file will be overwritten.

   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset.
   :type ij_bounds: tuple[int]
   :param runscript_path: absolute path to the template parflow runscript
                          file
   :type runscript_path: str
   :param write_dir: directory where the new template file will be written.
                     If it is None, defaults to the directory containing the runscript.
   :type write_dir: str
   :param runname: name for the new parflow run. If it is None, defaults to
                   the runscript's previous runname.
   :type runname: str
   :param forcing_dir: path to the directory containing the subset forcing
                       files. If it is None, defaults to the runscript's previous forcing
                       directory path.
   :type forcing_dir: str

   :returns: A path to the new runscript file that will be created.

   Example:

   .. code-block:: python

       runscript_path = edit_runscript_for_subset(
           ij_bounds=(375, 239, 487, 329),
           runscript_path="/path/to/your/original/runscript",
           runname="my_conus1_run",
           forcing_dir="/path/to/your/forcing/directory"
       )
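
   The geometry change in step 1 amounts to simple arithmetic on ij_bounds.
   The sketch below assumes that the upper ij bounds are exclusive and that
   grid cells are 1000 m wide; both are assumptions of this illustration, and
   the variable names are not ParFlow keys.

   .. code-block:: python

       ij_bounds = (375, 239, 487, 329)
       imin, jmin, imax, jmax = ij_bounds

       # Number of grid cells in x and y, assuming exclusive upper bounds.
       nx = imax - imin
       ny = jmax - jmin

       # Assumed cell size in meters (hypothetical value for this sketch).
       dx = dy = 1000.0

       # Upper bounds of the domain geometry, measured from a (0, 0) origin.
       upper_x = nx * dx
       upper_y = ny * dy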


.. py:function:: copy_files(read_dir, write_dir)

   Copy all files from read_dir to write_dir.

   :param read_dir: read-from directory path
   :type read_dir: str
   :param write_dir: write-to directory path
   :type write_dir: str

   Example:

   .. code-block:: python

       copy_files(
           read_dir="/path/to/read-from/directory",
           write_dir="/path/to/write-to/directory"
       )


.. py:function:: change_filename_values(runscript_path, write_dir=None, runname=None, slopex=None, slopey=None, solidfile=None, init_press=None, indicator=None, depth_to_bedrock=None, mannings=None, evap_trans=None)

   Change the filenames of input files in the ParFlow runscript.

   This function will update the paths to input files in a ParFlow runscript.
   The provided arguments will reset the corresponding parflow keys to match the
   user specified paths to input files.  File names can be specified with or
   without relative or absolute file paths. If no path is provided ParFlow will
   expect the input files to be present in the run directory at the time of
   simulation.

   Note that this will only change paths for keys that already exist in the
   template ParFlow run script you are starting from and will not reconfigure a
   run to use new keys (for example if you are not starting from a run script
   that uses a solid file, adding a new solid file path will not configure the
   run to use a solid file).

   Refer to the ParFlow manual for additional information on any of the keys
   listed below.


   If the runname is None and write_dir is the directory containing the
   runscript file, the runscript file will be overwritten.

   :param runscript_path: path to the runscript file (yaml or pfidb)
   :type runscript_path: str
   :param write_dir: directory where the new template file will be written.
                     If it is None, defaults to the directory containing the runscript
                     file.
   :type write_dir: str
   :param runname: name of the new parflow run
   :type runname: str
   :param slopex: new slopex filename (and path)
   :type slopex: str
   :param slopey: new slopey filename (and path)
   :type slopey: str
   :param solidfile: new solidfile filename (and path)
   :type solidfile: str
   :param init_press: new initial pressure filename (and path)
   :type init_press: str
   :param indicator: new indicator input filename (and path)
   :type indicator: str
   :param depth_to_bedrock: new depth to bedrock filename (and path)
   :type depth_to_bedrock: str
   :param mannings: new mannings filename (and path)
   :type mannings: str
   :param evap_trans: new evapotranspiration filename (and path)
   :type evap_trans: str

   :returns: A path to the new runscript file that will be created.

   Example:

   .. code-block:: python

       runscript_path = change_filename_values(
           runscript_path="/path/to/your/original/runscript",
           runname="my_conus1_run",
           init_press="/filename/of/initial/pressure/pfb/file"
       )


.. py:function:: dist_run(topo_p, topo_q, runscript_path, working_dir=None, dist_clim_forcing=True)

   Distribute ParFlow input files for parallel computing.

   This function will distribute input files to topo_p grids in the
   x direction and topo_q grids in the y direction. If dist_clim_forcing
   is true, forcing files will be distributed as well according to the
   same topology. If working_dir is different from the directory containing
   the runscript file, the edited runscript file will be written to working_dir.

   :param topo_p: number of grids (processes) to create in the x direction
   :type topo_p: int
   :param topo_q: number of grids (processes) to create in the y direction
   :type topo_q: int
   :param runscript_path: path to the runscript file (yaml or pfidb)
   :type runscript_path: str
   :param working_dir: directory containing the files to be distributed.
                       If it is None, it defaults to the directory containing the runscript
                       file.
   :type working_dir: str
   :param dist_clim_forcing: if true, distribute forcing files
   :type dist_clim_forcing: bool

   :returns: Path to the edited runscript file that will be created.
   :rtype: str

   Example:

   .. code-block:: python

       runscript_path = dist_run(
           topo_p=2,
           topo_q=2,
           runscript_path="/path/to/your/original/runscript",
           dist_clim_forcing=False
       )
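
   To get a feel for what a topo_p by topo_q topology means, the sketch below
   splits the cells of a domain as evenly as possible in each direction.
   split_cells is a hypothetical helper; ParFlow's own decomposition may
   assign the remainder cells differently.

   .. code-block:: python

       def split_cells(n_cells, n_parts):
           """Split n_cells into n_parts nearly equal chunks (hypothetical helper)."""
           base, extra = divmod(n_cells, n_parts)
           # The first `extra` chunks receive one additional cell.
           return [base + (1 if k < extra else 0) for k in range(n_parts)]

       nx, ny = 112, 90          # toy domain size in grid cells
       topo_p, topo_q = 3, 2     # processes in the x and y directions

       x_sizes = split_cells(nx, topo_p)
       y_sizes = split_cells(ny, topo_q)
       n_processes = topo_p * topo_q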