LHM fresh-salt

The LHM fresh-salt is the Dutch national groundwater model, which simulates groundwater flow and salinity for the whole of the Netherlands over the last 100 years. It is a large groundwater model, with a grid of 40 layers, 1300 rows and 1200 columns, of which half the cells are active, totaling about 31 million active cells. Scripts for the model are under version control (Git) and pushed to GitLab. Snakemake is used as the workflow manager, from external data all the way to model output and plots. Data is also version controlled (DVC), but no platform has yet been found to host the large data files. The large datasets also hamper distributing the project workflow, as most laptops cannot handle these file sizes. The machine running the workflow therefore requires a lot of memory and a connection to data storage with enough capacity. At present, the model workflow runs on the LHM server, but it has also been run on Amazon Web Services.

https://gitlab.com/deltares/imod/nhi-fresh-salt
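
To give a feeling for the grid size quoted above, the arithmetic can be sketched in a few lines of Python. The grid dimensions and the active fraction come from the text; the 8 bytes per value (one double-precision float per cell) is an assumption for illustration only and says nothing about how the model actually stores its data.

```python
# Grid dimensions as quoted in the text.
N_LAYERS, N_ROWS, N_COLUMNS = 40, 1300, 1200
ACTIVE_FRACTION = 0.5  # "half the cells are active"

# Assumption for illustration: one 8-byte float per cell.
BYTES_PER_VALUE = 8


def grid_summary(n_layers, n_rows, n_columns, active_fraction, bytes_per_value):
    """Return (total cells, active cells, bytes of one dense array)."""
    total_cells = n_layers * n_rows * n_columns
    active_cells = int(total_cells * active_fraction)
    dense_bytes = total_cells * bytes_per_value
    return total_cells, active_cells, dense_bytes


total, active, dense_bytes = grid_summary(
    N_LAYERS, N_ROWS, N_COLUMNS, ACTIVE_FRACTION, BYTES_PER_VALUE
)
print(f"total cells:  {total:,}")
print(f"active cells: {active:,}")
print(f"one dense array: {dense_bytes / 1e6:.1f} MB")
```

Under these assumptions a single variable at a single time step already takes roughly half a gigabyte when stored as a dense array. Several variables over many time steps multiply that figure, which is consistent with the remark that most laptops cannot handle these file sizes.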

LHM fresh

The LHM fresh is a model that has been developed over 10 years and therefore contains a lot of legacy code. Every so often, ad-hoc fixes were introduced when a project deadline neared. These fixes could persist, sometimes for years, and later led to further workarounds as the workflow expanded. The result was a convoluted workflow, including some manual steps, in which mistakes could be introduced by one colleague without being tracked. This led to meetings with several colleagues trying to find out what had happened.

Attempts were made to address this in the past: a dedicated tool was developed to do version control on data, but colleagues stopped using it for unknown reasons.


In late 2021, the team started to improve the reproducibility of the LHM fresh by putting its scripts and data under version control (Git + DVC). See here: https://gitlab.com/deltares/imod/lhm. Furthermore, parts of the workflow have been moved to Snakemake, such as the derivation of the topsystem boundary conditions (https://gitlab.com/deltares/imod/lhm-topsysteem). The plan is to move more parts of the workflow to Snakemake.


The LHM fresh serves as a good example of why reproducibility matters, especially for long-term projects that need to be maintained and further developed for many years.

San Francisco Bay model

An example of a relatively small regional model, which can be distributed and reproduced on a local machine, is the fresh-salt groundwater model of the San Francisco Bay area. Although it is a relatively simple example, quite a few processing steps were required to get from external data to model input (Figure 1), which shows the usefulness of a workflow manager, in this case Snakemake, even for smaller studies. Git and DVC are used for version control. The smaller external dataset (~5 GB) allowed distribution via DVC, as it could be stored on cloud storage, in this case a personal Google Drive (limited to 10 GB). This also allowed testing the practical implications of hosting data on the N: drive (which requires a VPN connection) versus Google Drive. It was found that 1) Google Drive had faster download speeds than our VPN, and 2) Google Drive has a more robust download protocol, as small hiccups in the connection seemed to affect download success less. The project was used as an example in an iMOD Python workshop for Blue Earth in 2020.

https://gitlab.com/deltares/imod/california_model
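
Why a workflow manager pays off even for a small model can be illustrated with a minimal sketch of the underlying idea: processing steps form a dependency graph, and when one step changes, only that step and everything downstream of it need to be rerun. The step names below are hypothetical and not taken from the project, and the code uses Python's standard `graphlib` rather than Snakemake itself, which derives the graph from the input and output files of its rules.

```python
from graphlib import TopologicalSorter

# Hypothetical steps; each step maps to the set of steps it depends on.
WORKFLOW = {
    "download_external_data": set(),
    "regrid_to_model_grid": {"download_external_data"},
    "prepare_boundary_conditions": {"download_external_data"},
    "build_model_input": {"regrid_to_model_grid", "prepare_boundary_conditions"},
    "run_model": {"build_model_input"},
    "plot_results": {"run_model"},
}


def execution_order(workflow):
    """Return the steps in an order that respects all dependencies."""
    return list(TopologicalSorter(workflow).static_order())


def steps_to_rerun(workflow, changed):
    """Return the changed step plus all downstream steps, in execution order."""
    order = execution_order(workflow)
    stale = {changed}
    # One pass suffices: dependencies always precede their dependents.
    for step in order:
        if workflow[step] & stale:
            stale.add(step)
    return [step for step in order if step in stale]


print(execution_order(WORKFLOW))
print(steps_to_rerun(WORKFLOW, "regrid_to_model_grid"))
```

In this sketch, a change to the regridding step leaves the download and the boundary-condition step untouched, while model input, model run and plots are rebuilt. Keeping track of this by hand is where manual workflows tend to go wrong.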

Bangkok

In the Bangkok project, a Flood Early Warning System (FEWS) for the city of Bangkok is developed together with PANYA consultants for the client, the Bangkok Metropolitan Administration. The system shows both data and the hydraulic models that compute the floods. To develop the Delft-FEWS system and the 38 hydraulic models of Bangkok, several challenges had to be solved: the complexity of the water system, the size of the model area, working remotely with our partner PANYA, data availability, computation speed, model verification, and handling a database that grows over time. These challenges led to several choices in the organization of the project:

  • Scripting: As much as possible was to be scripted in Python, so that we could easily update, regenerate, calibrate and set up the models (in FEWS) and collaborate with our partner. The model generation scripts developed in this project are the precursors of the HydroMT-Delft3D FM 1D2D plugin scripts.
  • Workflow manager: Several Snakemake workflows were used to manage the different tasks, such as building the models, calibrating them and exporting them to FEWS.
  • Data and project storage: SVN was used as the version control system (VCS) in this project. Our partner could easily add data during their working day, and we could then continue with it. Conversely, we were able to share our script updates with them. A disadvantage of SVN in this project was the handling of shapefiles. A shapefile consists of a collection of files with different extensions. Depending on the change made, some of these files are modified whereas others are not.

These choices made it possible to carry out the project together with our partner and to train them.
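
The shapefile issue mentioned under data and project storage can be made concrete with a small sketch. It groups a list of changed file paths, as a version control system might report them, by the shapefile they belong to, so that a partial change (for example only the attribute table) becomes visible as one logical change. The file names are hypothetical and the extension list covers only the common shapefile components; this is an illustration, not tooling used in the project.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Common shapefile components: .shp, .shx and .dbf are mandatory,
# .prj and .cpg are frequently present sidecar files.
SHAPEFILE_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj", ".cpg"}


def group_shapefile_changes(changed_paths):
    """Map each shapefile (path without extension) to its changed components."""
    groups = defaultdict(set)
    for raw in changed_paths:
        path = PurePosixPath(raw)
        extension = path.suffix.lower()
        if extension in SHAPEFILE_EXTENSIONS:
            groups[str(path.with_suffix(""))].add(extension)
    return {name: sorted(extensions) for name, extensions in groups.items()}


# Hypothetical list of changed files in one commit.
changed = [
    "data/canals.dbf",
    "data/canals.shp",
    "data/canals.shx",
    "data/pumps.dbf",
    "scripts/build_model.py",
]
for name, extensions in group_shapefile_changes(changed).items():
    print(name, extensions)
```

In this example the `canals` shapefile shows changes in geometry, index and attributes, whereas for `pumps` only the attribute table changed. Reviewing such a commit file by file gives little insight into what actually happened to the dataset.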

