subsettools.subsetting
--------------------------------

.. py:module:: subsettools.subsetting

.. autoapi-nested-parse::

   Functions to subset gridded files from national datasets in HydroData.

   The following functions can be used to subset gridded input files to set up a
   ParFlow simulation:

   - subset static model inputs
   - subset meteorological forcings
   - subset initial pressure data
   - subset gridded CLM inputs (vegm)


Functions
~~~~~~~~~

.. autoapisummary::

   subsettools.subsetting.subset_static
   subsettools.subsetting.subset_press_init
   subsettools.subsetting.subset_forcing


.. py:function:: subset_static(ij_bounds, dataset, write_dir, var_list=('slope_x', 'slope_y', 'pf_indicator', 'mannings', 'pf_flowbarrier', 'pme', 'ss_pressure_head'))

   Subset static input files from national datasets in HydroData.

   The subset values will be written as ParFlow binary files (pfbs) in
   write_dir. By default the following variables will be subset:

   - Slope in the east/west direction (slope_x)
   - Slope in the north/south direction (slope_y)
   - Subsurface units indicator file (pf_indicator)
   - Manning's roughness coefficients (mannings)
   - Depth to bedrock (pf_flowbarrier)
   - Long term average precipitation minus evaporation (i.e. recharge) (pme)
   - Steady state pressure head used to initialize transient simulations
     (ss_pressure_head)

   Note that some datasets might not contain all 7 static input variables. In
   that case, subset_static raises a ValueError for any requested variable
   that does not exist in the dataset. The default variable list
   contains the necessary static variables for the CONUS2 grid. For CONUS1-based
   datasets, "mannings" and "pf_flowbarrier" should be removed from the list.

   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset.
   :type ij_bounds: tuple[int]
   :param dataset: static inputs dataset name from the HydroData catalog e.g.
                   "conus1_domain"
   :type dataset: str
   :param write_dir: directory where the subset files will be written
   :type write_dir: str
   :param var_list: tuple of variables to subset from the dataset.
                    By default all 7 variables above will be subset. The user can specify
                    a subset of these variables or list additional variables that are
                    available in their dataset of choice.
   :type var_list: tuple[str]

   :returns: A dictionary mapping the static variable names to the corresponding file
             paths where the subset data were written.
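
   It can help to check the size of the bounding box before subsetting. The
   sketch below assumes, since this page does not state it, that ``ij_bounds``
   is ordered ``(imin, jmin, imax, jmax)`` with exclusive upper bounds.
   ``subset_shape`` is a hypothetical helper and is not part of subsettools.

   .. code-block:: python

       def subset_shape(ij_bounds):
           """Return (nj, ni): the number of rows and columns in the box.

           Assumption: ij_bounds = (imin, jmin, imax, jmax), with the upper
           bounds exclusive and (0, 0) at the lower left corner of the grid.
           """
           imin, jmin, imax, jmax = ij_bounds
           if imin < 0 or jmin < 0:
               raise ValueError("indices are relative to (0, 0) and cannot be negative")
           if imax <= imin or jmax <= jmin:
               raise ValueError("upper bounds must be larger than lower bounds")
           return (jmax - jmin, imax - imin)

       # Bounds taken from the CONUS1 example on this page
       nj, ni = subset_shape((375, 239, 487, 329))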

   Example:

   .. code-block:: python

       # Subsetting static variables for a CONUS1 workflow
       # We need to remove "pf_flowbarrier" and "mannings" from the list
       filepaths = subset_static(
           ij_bounds=(375, 239, 487, 329),
           dataset="conus1_domain",
           write_dir="/path/to/your/chosen/directory",
           var_list=("slope_x", "slope_y", "pf_indicator", "pme",
                     "ss_pressure_head")
       )

       # Subsetting static variables for a CONUS2 workflow
       # Note that we can use the default var_list here
       filepaths = subset_static(
           ij_bounds=(3701, 1544, 3792, 1633),
           dataset="conus2_domain",
           write_dir="/path/to/your/chosen/directory",
       )
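
   Because the return value maps variable names to file paths, it can be used
   to confirm that every requested file was written before moving on. The
   following is a minimal sketch: ``missing_files`` is a hypothetical helper,
   and the dictionary and file names are stand-ins for a real return value.

   .. code-block:: python

       import os
       import tempfile

       def missing_files(filepaths):
           """Return the variable names whose output file is not on disk."""
           return [var for var, path in filepaths.items()
                   if not os.path.isfile(path)]

       # Stand-in for the dictionary returned by subset_static; the file
       # names below are hypothetical.
       with tempfile.TemporaryDirectory() as write_dir:
           slope_x_path = os.path.join(write_dir, "slope_x.pfb")
           open(slope_x_path, "wb").close()  # pretend this file was written
           filepaths = {
               "slope_x": slope_x_path,
               "slope_y": os.path.join(write_dir, "slope_y.pfb"),
           }
           missing = missing_files(filepaths)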


.. py:function:: subset_press_init(ij_bounds, dataset, date, write_dir, time_zone='UTC')

   Subset a pressure file from a national dataset in HydroData.

   This function will select the pressure file for midnight on the date provided
   and subset the selected pressure file to the ij_bounds provided. The subset
   data will be written out as a ParFlow binary file (pfb) to be used as an
   initial pressure file for a ParFlow simulation.

   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset.
   :type ij_bounds: tuple[int]
   :param dataset: dataset name from the HydroData catalog that the pressure
                   file will be subset from e.g. "conus1_baseline_mod"
   :type dataset: str
   :param date: The date of the pressure file that you would like to subset,
                in the form 'yyyy-mm-dd'
   :type date: str
   :param write_dir: directory where the subset file will be written
   :type write_dir: str
   :param time_zone: timezone information for subset date. Data will be
                     subset at midnight in the specified timezone. Defaults to "UTC".
   :type time_zone: str

   :returns: The filepath of the subset file, which includes datetime information, so
             that it can be used by later functions (e.g. edit_runscript_for_subset).

   Example:

   .. code-block:: python

       filepath = subset_press_init(
           ij_bounds=(375, 239, 487, 329),
           dataset="conus1_baseline_mod",
           date="2005-12-15",
           write_dir="/path/to/your/chosen/directory",
           time_zone="EST"
       )
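
   The effect of ``time_zone`` can be illustrated with the standard library.
   This sketch only shows what "midnight in the specified timezone" means in
   UTC. It does not reproduce how subsettools resolves time zone names, and it
   assumes that "EST" is a fixed offset of UTC-5. ``midnight_utc`` is a
   hypothetical helper.

   .. code-block:: python

       from datetime import datetime, timedelta, timezone

       def midnight_utc(date, utc_offset_hours=0):
           """Return the UTC time of local midnight on a 'yyyy-mm-dd' date."""
           local_tz = timezone(timedelta(hours=utc_offset_hours))
           local_midnight = datetime.strptime(date, "%Y-%m-%d").replace(
               tzinfo=local_tz
           )
           return local_midnight.astimezone(timezone.utc)

       # Assumption: EST is UTC-5, so local midnight falls at 05:00 UTC
       utc_time = midnight_utc("2005-12-15", utc_offset_hours=-5)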


.. py:function:: subset_forcing(ij_bounds, grid, start, end, dataset, write_dir, time_zone='UTC', forcing_vars=('precipitation', 'downward_shortwave', 'downward_longwave', 'specific_humidity', 'air_temp', 'atmospheric_pressure', 'east_windspeed', 'north_windspeed'))

   Subset forcing files from national datasets in HydroData.

   Subset forcing data will be written out as pfb files formatted for a ParFlow
   run with 24 hours per forcing file. Per ParFlow-CLM convention, separate files
   will be written for each variable following the standard CLM variable naming
   convention.

   Forcing file outputs will be numbered starting with 0000 and data will start
   at midnight local time for the timezone that has been provided. If no
   timezone is provided it will default to midnight UTC.

   :param ij_bounds: bounding box for subset. This should be given as
                     i,j index values where 0,0 is the lower left hand corner of a domain.
                     ij_bounds are given relative to whatever grid is being used for the
                     subset.
   :type ij_bounds: tuple[int]
   :param grid: The spatial grid that the ij indices are calculated relative
                to and that the subset data will be returned on. Possible values:
                "conus1" or "conus2"
   :type grid: str
   :param start: start date (inclusive), in the form 'yyyy-mm-dd'
   :type start: str
   :param end: end date (exclusive), in the form 'yyyy-mm-dd'
   :type end: str
   :param dataset: forcing dataset name from the HydroData catalog that the
                   forcing files will be subset from e.g. "NLDAS2".
   :type dataset: str
   :param write_dir: directory where the subset files will be written
   :type write_dir: str
   :param time_zone: timezone information for start and end dates. Data will
                     be subset starting at midnight in the specified timezone. Defaults to
                     "UTC".
   :type time_zone: str
   :param forcing_vars: tuple of forcing variables to subset. By
                        default all 8 variables needed to run ParFlow-CLM will be subset.
   :type forcing_vars: tuple[str]

   :returns: A dictionary mapping the forcing variable names to the corresponding file
             paths where the subset data were written.

   Example:

   .. code-block:: python

       filepaths = subset_forcing(
           ij_bounds=(1225, 1738, 1347, 1811),
           grid="conus2",
           start="2005-11-01",
           end="2005-12-01",
           dataset="CW3E",
           write_dir="/path/to/your/chosen/directory",
           forcing_vars=("precipitation", "air_temp")
       )
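
   Since ``start`` is inclusive, ``end`` is exclusive, and each file holds 24
   hours, the number of files to expect per variable can be estimated ahead of
   time. The sketch below assumes one file per day per variable;
   ``forcing_file_count`` is a hypothetical helper and is not part of
   subsettools.

   .. code-block:: python

       from datetime import datetime

       def forcing_file_count(start, end, hours_per_file=24):
           """Estimate the number of forcing files written per variable.

           start is inclusive and end is exclusive, both as 'yyyy-mm-dd'.
           """
           fmt = "%Y-%m-%d"
           n_days = (datetime.strptime(end, fmt)
                     - datetime.strptime(start, fmt)).days
           if n_days <= 0:
               raise ValueError("end must be later than start")
           return (n_days * 24) // hours_per_file

       # Dates from the example above: all of November 2005
       n_files = forcing_file_count("2005-11-01", "2005-12-01")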