Build a ParFlow run from a template

To launch this notebook interactively in a Jupyter notebook-like browser interface, please click the “Launch Binder” button below. Note that Binder may take several minutes to launch.

The subsettools package comes with a collection of template ParFlow runscripts which you can use as a starting point in constructing a ParFlow model.

For a complete workflow subsetting and running ParFlow simulations refer to the example_notebooks section. Here we walk through more details on how to customize a run for your needs. Also note that if none of the template run scripts suite your needs there is a wide variety of test scripts and examples available through the ParFlow Repo and ParFlow short courses. For a complete list of ParFlow Resources checkout the HydroFrame website.

You will have the following choices in selecting a template run script:

Select the ParFlow configuration you would like to start from: We have templates for the ParFlow CONUS1 and CONUS2 domains. There are many differences between these domains which you can read about through the previous links. Generally speaking though the CONUS2 model is the most up to date domain and is an advancement from the CONUS1 domain. The CONUS2 model has a deeper vertical extent, more vertical layers, updated topographic processing and updated subsurface geology from the CONUS1 model
Determine whether you will run a box or solid file domain: A solid file can be used to define an irregular domain boundary (e.g. a watershed boundary). If you run with a solid file only the cells within the solid file will be designated as active. Alternatively you can create a box domain that encompasses your desired watershed boundary. There is no inherently better choice between these two. Solid file domains allow you to impose irregular boundaries and limit the active computational domain but box domains can be simpler to work with.
Determine what type of simulation you would like to do: There are two choices:
- Spinup: This template will run ParFlow with a constant recharge forcing at the upper boundary. This approach is used to initialize the groundwater configuration in a process called spinup (refer to the spinup workflow notebookfor a full example of this).
- Transient: This template if for a transient ParFlow-CLM run. If you choose this option you will need to subset forcing files for your simulation (i.e. Precipitation, temperature, etc.) and you will also need a clm driver file (refer to the transient workflow notebookfor a full example of this)

1. Setup

In all examples you will need to import the following packages and register your pin in order to have access to the HydroData datasets.

Refer to the getting started instructions for creating your pin if you have not done this already.

import subsettools as st
import hf_hydrodata as hf
from parflow import Run

hf.register_api_pin("your_email", "your_pin")

2. Get a template runscript

There are multiple ways to run ParFlow models. You can learn more about these approaches at the HydroFrame. Here will will be using a workflow based on .yaml files and python tools. You can read more about pftools and python functions for manipulating runscripts here.

As noted above we have provided template run scripts for the CONUS1 and CONUS2 model domains that you can use to start your simulation.

To get a template runscript provided with the package, you should use the get_template_runscript function (API reference here). The function will get the correct template runscript based on your choices, write it to your chosen directory, and return the path to the file written.

For example, to get the template for a coupled ParFlow-CLM run on the CONUS1 grid with a solid input file, you can do:

template_run = st.get_template_runscript(
    grid="conus1", 
    mode="transient", 
    input_file_type="solid",
    write_dir="/path/to/your/write/directory",
)

NOTE: At this point we have retrieved exactly the run script for the national simulations. We have not made any customizations to it and you would not expect it to run.

3. Explore the keys of the run script you obtained

You can inspect the yaml file that was returned to you manually the same way you would look at any other yaml file. The ParFlow manual provides a complete reference for all model keys here

In addition you create a ParFlow run object from a runscript file with the from_definition method of the ParFlow Run class and then explore any keys directly from this.

Here are a few examples (refer to the python run script documentation for more information):

run = Run.from_definition(template_run)

# For example we can look at the processor topology like this: 
print("Processor Topology:", run.Process.Topology.P, run.Process.Topology.Q, run.Process.Topology.R )

# and the domain configuraton like this
print("Dimensions nx, ny, nz:", run.ComputationalGrid.NX, run.ComputationalGrid.NY, run.ComputationalGrid.NZ)
print("Number of dz scalers:", run.dzScale.nzListNumber)
print(
    "dz scallers:",
    run.Cell._0.dzScale.Value,
    run.Cell._1.dzScale.Value,
    run.Cell._2.dzScale.Value,
    run.Cell._3.dzScale.Value,
    run.Cell._4.dzScale.Value,
)

# We can also print subsurface properties, like the hydraulic conductivity values: 
print("Indicator group names:", run.Geom.Perm.Names[1:])
# print the values for some of the indicator groups:
print("Hydraulic conductivity for the indicator group \'s1\':", run.Geom.s1.Perm.Value, "m/hour")
print("Hydraulic conductivity for the indicator group \'b1\':", run.Geom.b1.Perm.Value, "m/hour")

Processor Topology: 4 4 1
Dimensions nx, ny, nz: 3342 1888 5
Number of dz scalers: 5
dz scallers: 1.0 0.01 0.006 0.003 0.001
Indicator group names: ['s1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 'g1', 'g2', 'g3', 'g4', 'g5', 'g6', 'g7', 'g8', 'b1', 'b2']
Hydraulic conductivity for the indicator group 's1': 0.269022595 m/hour
Hydraulic conductivity for the indicator group 'b1': 0.005 m/hour

4. Modify a template runscript for a subset domain

The runscript template is configured for a national simulation. To run on a smaller domain you need to:

Adjust the size of the model to match your subset domain
Update the path to the model input files to match the clipped inputs for your subset domain (refer to the subset tutorials or sample workflow for examples of subsetting input files)

You can make any of these changes yourself manually by modifying the yaml file directly or following the example shown in Section 7 to modify the run object. Here we demonstrate two wrapper functions that will set multiple keys at once to help with this task.

4.1 Modify the size of the model domain to math your subset

First we adjust the size of the model run (nx, ny) to match the dimensions of our subset area which we obtain using the i,j bounds function and provide the path to the subset forcing files.

The edit_runscript_for_subset function (API reference here) will accomplish these two jobs.

NOTE: If you don’t provide a write_dir argument, the new runscript is going to be written in the directory or the original runscript. The filename depends on the runname, so if you also don’t change the runname the original runscript file will be overwritten.

#Set the ij bounds for the new model
ij_box_bounds, _ = st.define_latlon_domain(latlon_bounds=[[37.91, -91.43], [37.34, -90.63]], grid="conus1")
print(f"bounding box: {ij_box_bounds}") 

#Edit the runscript to match this subset
runscript_path = st.edit_runscript_for_subset(
    ij_bounds=ij_box_bounds,
    runscript_path=template_run,
    runname="my_new_conus1_run",
    forcing_dir="/path/to/your/write/directory"
)

#Print out the keys of the new run to confirm the changes
run = Run.from_definition(runscript_path)
print('-' * 100)
print("New runname:", run.get_name())
print("New grid nx, ny:", run.ComputationalGrid.NX, run.ComputationalGrid.NY)
print("Forcing directory:", run.Solver.CLM.MetFilePath)

bounding box: (2285, 436, 2359, 496)
New runname: my_new_conus1_run provided, a new yaml file will be created
Climate forcing directory has been changed to /home/ga6/test  in runscript.
ComputationalGrid.NY set to 60 and NX to 74
GeomInput.domaininput.InputType detected as SolidFile, no additional keys to change for subset
Updated runscript written to /home/ga6/test
----------------------------------------------------------------------------------------------------
New runname: my_new_conus1_run
New grid nx, ny: 74 60
Forcing directory: /home/ga6/test

4.2 Modify file paths for inputs in the runscript

Next we can use the subsettools change_filename_values function (API reference here) to modify the filenames for the input files of our runscript.

The filenames which can be updated from this function are:

Slope x (slopex)
Slopey y (slopey)
Soild File (solidfile)
Initial Pressure(init_press)
Subsurface Indicator (indicator)
Depth to bedrock (depth_to_bedrock)
Mannings Roughness (mannings)
Evapotranspiration top boundary forcing (evap_trans)

NOTE: Depending on your model configuration you may not have all of these input files and you may also have additional input files. This function just provides a way to reset some of the most common input files from the default if you choose. You can always set keys manually as noted above.

# change the file paths for the initial pressure, mannings and slope_x, slope_y data in the runscript:
runscript_path = st.change_filename_values(    
    runscript_path=runscript_path,
    init_press='init_press.pfb', 
    mannings='mannings.pfb',
    slopex='slope_x.pfb',
    slopey='slopy_y.pfb',
)

X Slopes filename changed to slope_x.pfb
Y Slopes filename changed to slopy_y.pfb
Initial pressure filename changed to init_press.pfb
Mannings filename changed to mannings.pfb
Updated runscript written to /home/ga6/test

5. Change the processor topology

ParFlow is designed to easily scale across multiple processors. To change the number of processors your model is running on you need to do two things:

Define your processor topology: We define the number of processors that the domain will be divided across and we also set the spatial configuration by specifying the number of processors in the P, Q, R directions which correspond to the x,y and z axes respectively. For example a P=4, Q=2, R=1 topology would divided the domain across 8 processors splitting the x axes by 4 and the y axes by 2 but not splitting in the z direction (NOTE: it its generally best practice not to split in the z direction)
Distribute input files to mach your processor topology

Here we will use the subsettools dist_run function (API reference here) to modify the processor topology keys and distribute our inputs to our chosen processor topology in one step. (Here too you could also accomplish this task by setting keys and distributing files in separate manual calls). The dist_run function will modify the runscript with the values of P and Q provided.

When you later launch the ParFlow simulation, you should make sure to request P * Q processors from your computing cluster. The dist_run function returns a path to the new runscript that will be created.

By default this function will distribute all .pfb files in working_dir. Because forcing files are so numerous it can take a long time to distribute them so this option can be turned on and off with the dist_clim_forcing flag. (Note that if you set the flag to True, you should have edited your runscript to include a valid directory to your forcing data - like we did above with the edit_runscript_for_subset function).

# distribute input files across 2 grids in the x direction and 2 grids in the y direction
runscript_path = st.dist_run(
    P=2,
    Q=2,
    runscript_path=runscript_path,
    working_dir='/path/to/the/files/to/be/distributed',
    dist_clim_forcing=True,
)

6. Change other keys in the runscript

The subset tools functions provide functionality for the most common tasks associated with modifying a template run script for a smaller domain. However, the user has full access to every key in the ParFlow yaml file and can adjust any key they would like.

To do this we follow these steps:

Read in a yaml file and create a run object for that runscript using the from_definition function (note this could be the template run script or one you have already modified for your subdomain it doesn’t matter).
Change any keys in the file manually
Write the modified run object back into the runscript file format using the write method of the ParFlow Run class. (The Run.write() function takes two arguments, the directory to write the runscript file to and the format (pfidb or yaml). It returns the path to the created runscript file.)

If you want to change many subsurface settings all at once using a table, you can follow this tutorial from the ParFlow documentation.

run = Run.from_definition(template_run)

# for example, we can lower the hydraulic conductivity of one of the bedrock layers:
print("Before: Hydraulic conductivity for the indicator group \'b1\' is", run.Geom.b1.Perm.Value, "m/hour")
run.Geom.b1.Perm.Value = 0.003
print("After: Hydraulic conductivity for the indicator group \'b1\' is", run.Geom.b1.Perm.Value, "m/hour")

# write the object back into a model runscript:
runscript_path, _ = run.write(working_directory='/path/to/your/write/directory', file_format='yaml')

Before: Hydraulic conductivity for the indicator group 'b1' is 0.005 m/hour
After: Hydraulic conductivity for the indicator group 'b1' is 0.003 m/hour

7. Run a ParFlow script

Once we have our runscript, we can go ahead a launch a ParFlow simulation! Note that we assume here that you already have ParFlow installed where you are running. Please refer to the HydroFrame for help getting started with ParFlow.

# create a run object
run = Run.from_definition('/path/to/your/customized/runscript/file')
# launch the run
run.run(working_dir='/path/to/your/working/directory')