Processing source data into model input usually takes many steps, for example converting a shapefile to a grid, correcting mistakes in the source data, or assigning model parameters based on the source data. These steps can be performed in a graphical user interface (GUI) or in scripts. Actions in a GUI are nearly always unrecorded, so mistakes are hard to identify and costly to rectify, as all steps have to be performed manually again. In contrast, a script is an explicit record of all actions, and rectifying a mistake only involves fixing the specific line of code and re-running the script.


Scripting is therefore essential for creating reproducible workflows. Python is a commonly used scripting language within Deltares (next to MATLAB).

In this chapter we will give a quick introduction to Python and point to some more elaborate guides.

How to start with Python

See the following links to start with Python:

Programming environment

Creating a suitable programming environment for your modeling projects is essential to ensure reproducibility. A well-defined environment helps in reproducing the project at a later stage and makes it easy to share the required packages or modules with colleagues. This section discusses the importance of environments and provides guidance on setting up an environment using Anaconda or Python's built-in venv module.


Python Environment

Working in a Python environment is preferred to ensure reproducibility and facilitate sharing. With Anaconda, you can easily create and manage Python environments. The Conda documentation provides detailed instructions on how to set up and manage environments using Anaconda.
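As a minimal sketch of the venv route, the standard-library venv module can also be driven from Python itself. The function name create_project_env and the directory name used in the note below are illustrative assumptions, not a Deltares convention.

```python
import os
import venv
from pathlib import Path


def create_project_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return its interpreter path."""
    # EnvBuilder is the standard-library class behind "python -m venv".
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)

    # The interpreter location inside the environment differs per platform.
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"
```

Calling create_project_env(Path(".venv")) from the project root is roughly equivalent to running "python -m venv .venv" on the command line. Packages installed with the returned interpreter stay inside that environment and do not affect other projects.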

Model projects usually require an installation procedure before they work on a machine. These can range from fairly simple (“unzip and copy the project workflow in this specifically named folder on your machine”), to more complicated with various steps (“Install Anaconda version 3.2, then install the following packages: ..., unzip the project workflow, and keep your fingers crossed that this all works on your machine”). A more complicated procedure is more prone to error, which hampers transferability. There are ways to reduce the effort required to install a project workflow, such as creating installers that install a scripting language with all the required packages, which is the goal of Deltaforge. However, when the user wants to run the project workflow, a minimal setup procedure is usually unavoidable.

Alternatively, utilizing pre-configured and distributable virtual machines can be an option. Currently, our experience primarily revolves around Docker Images, which can be challenging to set up on Windows due to the extensive installation process, including the requirement of Windows Subsystem for Linux (WSL). Additionally, Docker Images may not provide easy interactive working, hindering workflow development. To ensure the reproducibility and ease of setup across multiple platforms, it is recommended to include a readme file specifying the versions of the model code and all required scripting packages. A step further would be to provide a comprehensive list of packages and libraries, along with their version numbers, in a generic format like YAML. Ideally, an environment can be shared through an image, such as Docker, or an installer like Deltaforge, allowing for streamlined setup and execution.
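Recording the package versions does not have to be done by hand. The sketch below uses only the standard library to list every package installed in the active environment with its version. It writes the plain "name==version" format used by pip rather than YAML, and the function names and the default file name are assumptions made for illustration.

```python
from importlib import metadata


def pinned_requirements() -> list[str]:
    """Return 'name==version' lines for all installed packages, sorted by name."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installations that lack a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path: str = "requirements.txt") -> None:
    """Write the pinned package list to a text file that can be shared."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(pinned_requirements()) + "\n")
```

Running write_requirements() in the project environment produces a file that can be stored next to the readme. Users of Anaconda can obtain a comparable YAML file with "conda env export".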

Collection of commonly used scripts in packages

Currently, multiple packages for model generation at Deltares are developed in Python. These packages are another essential link in the move towards reproducible workflows: they allow easy generation of very specific model files, which boosts productivity, and their well-tested functions and objects make scripts less error-prone. This page gives some examples of existing tools used within Deltares.

HydroMT

HydroMT is an open-source package that allows easy model building from raw data to a complete water system model for multiple Deltares model codes, namely WFLOW, SFINCS, Delwaq and FIAT. In addition, plugins for Urban Water Balance, RIBASIM, Delft3D FM 1D2D and the first non-Deltares software are under development. HydroMT aims at making the model generation process fast, modular, and reproducible independent of its location or users. This is done by a configuration file representing the model characteristics and building process, in combination with a data catalogue in which global and local datasets are accessible. The models are either built with the command line application, scripting or dashboards (new development) and can be integrated within FEWS. More information and examples are found in its documentation: https://deltares.github.io/hydromt/latest/

iMOD Python

iMOD Python is a Python library that supports model generation, and provides extra utilities for fast regridding in 3D and for spatial operations. Furthermore, it has several plotting utilities (2D and 3D) and allows preparing data for the iMOD QGIS plugin. Model codes supported at present are iMODFLOW, iMOD-WQ, MetaSWAP, and MODFLOW 6. More information and examples are found in its documentation: https://deltares.gitlab.io/imod/imod-python/

HYDROLIB

HYDROLIB is an open-source community of developers, modelers, and users from waterboards, consultancy firms and research institutes. Within this community a collaborative Python package, also called HYDROLIB, is developed. It contains tools for preprocessing, postprocessing and analysis of hydrodynamical data and simulation results, aimed at automating workflows for hydrological and hydrodynamical modelling. Currently it focuses on (but is not restricted to) the Delft3D FM software for hydrodynamics. HYDROLIB builds upon the basic Delft3D FM I/O functionality provided by the HYDROLIB-core package.


Workflow managers

A collection of scripts by itself is not necessarily a reproducible workflow. Scripts often must be run in a certain order, as data is exchanged amongst them. This makes re-running them manually (one-by-one) error-prone, whereas running all scripts in the correct order with one “master” script is often overkill, as usually only parts of the workflow must be re-run. The purpose of a workflow manager is to keep track of which data has changed, and then re-run only the steps of the workflow that need to be (re-)run. At the departments of GWB and HYD, Snakemake is used as workflow manager to create reproducible and scalable workflows. It can run any program with a command line interface, and has extra support for the scripting languages Python and R. Snakemake requires the data input and output of each processing step to be specified explicitly in a “Snakefile”. Based on the input and output of each step, a directed acyclic graph of data dependencies is determined, from which the order in which the steps are executed follows.
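The idea behind such a dependency graph can be sketched in plain Python. This is not how Snakemake is implemented and it is not Snakefile syntax; the step names and file names are hypothetical, and the changed files are passed in by hand, whereas a real workflow manager detects them itself.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step lists the files it reads and writes.
STEPS = {
    "rasterize": {"inputs": ["rivers.shp"], "outputs": ["rivers.tif"]},
    "clean": {"inputs": ["wells_raw.csv"], "outputs": ["wells.csv"]},
    "build_model": {"inputs": ["rivers.tif", "wells.csv"], "outputs": ["model.nc"]},
    "plot": {"inputs": ["model.nc"], "outputs": ["heads.png"]},
}


def execution_order(steps):
    """Order the steps so each one runs after the steps producing its inputs."""
    # Map every output file to the step that produces it.
    producer = {out: name for name, step in steps.items() for out in step["outputs"]}
    # For each step, collect the steps it depends on (its predecessors).
    graph = {
        name: {producer[f] for f in step["inputs"] if f in producer}
        for name, step in steps.items()
    }
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


def steps_to_rerun(steps, changed_files):
    """Return, in execution order, only the steps affected by the changed files."""
    stale = set(changed_files)
    todo = []
    for name in execution_order(steps):
        if stale & set(steps[name]["inputs"]):
            todo.append(name)
            # Outputs of a re-run step are new, so downstream steps are affected too.
            stale.update(steps[name]["outputs"])
    return todo
```

With this workflow, steps_to_rerun(STEPS, ["wells_raw.csv"]) returns the steps clean, build_model and plot, while rasterize is skipped because none of its inputs changed.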

Training material for Snakemake is available. On Thursday the 22nd of September 2022, colleagues of the HYD and GWB departments gave a pizza course on its use. Furthermore, colleagues in the GWB department wrote practical tips on the use of Snakemake, which can be found here: https://deltares.github.io/iMOD-Documentation/practical_snakemake.html




