Cookbook reproducible modeling (version 0.1)

Be aware: The cookbook presented below is a draft version of how to make your numerical project reproducible.

  • Scripting

    Turning source data into model input usually takes many processing steps: for example, converting a shapefile to a grid, correcting mistakes in the source data, or assigning model parameters based on the source data. These steps can be performed in a graphical user interface (GUI) or in scripts. Actions in a GUI are nearly always unrecorded, so mistakes are hard to identify and costly to rectify, as all steps have to be repeated manually. In contrast, a script is an explicit record of every action, and rectifying a mistake only involves fixing the specific line of code and re-running the script.

    Scripting is therefore essential for creating reproducible workflows. Python is a commonly used scripting language within Deltares (next to MATLAB).

    In this chapter we will give a quick introduction to Python and point to some more elaborate guides.
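
    As an illustration of how a script records every processing step, the minimal sketch below cleans a few source records and assigns a model parameter to each. All names and values (the records, the -999 missing-value marker, the conductivity table) are hypothetical and only serve the example.

```python
# Hypothetical source records; in a real project these would be read from a file.
RAW_ROWS = [
    {"id": "well_1", "soil": "sand", "level_m": "1.25"},
    {"id": "well_2", "soil": "Clay", "level_m": "-999"},  # -999 is assumed to mark a missing value
    {"id": "well_3", "soil": "peat ", "level_m": "0.40"},
]

# Assumed lookup table: hydraulic conductivity (m/d) per soil type.
CONDUCTIVITY_M_PER_DAY = {"sand": 10.0, "clay": 0.01, "peat": 0.1}


def clean_row(row):
    """Correct known mistakes in one source record."""
    soil = row["soil"].strip().lower()  # fix stray whitespace and inconsistent capitals
    level = float(row["level_m"])
    return {
        "id": row["id"],
        "soil": soil,
        "level_m": None if level == -999 else level,
    }


def to_model_input(rows):
    """Clean all records, drop those without a level, and assign a parameter."""
    cleaned = [clean_row(row) for row in rows]
    valid = [row for row in cleaned if row["level_m"] is not None]
    return [
        {**row, "k_m_per_day": CONDUCTIVITY_M_PER_DAY[row["soil"]]}
        for row in valid
    ]


if __name__ == "__main__":
    for record in to_model_input(RAW_ROWS):
        print(record)
```

    If the missing-value marker or a conductivity value turns out to be wrong, only that one line needs to change before re-running the script.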

  • Folder structure of the project

    Establishing a well-organized project structure is vital for ensuring reproducibility in your numerical modeling projects. A clear and consistent structure improves readability, simplifies navigation, and facilitates collaboration. This chapter covers the following aspects of project structure:

    • Consistent folder structure
    • Readable scripts
    • Documentation
    • File formats and data organization
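
    A consistent folder structure can itself be created by a script, so that every project starts from the same layout. The sketch below is one possible approach; the folder names and the README text are assumptions for illustration, not a prescribed standard.

```python
from pathlib import Path

# Assumed layout; adapt the folder names to your own project conventions.
PROJECT_FOLDERS = [
    "data/raw",        # source data, never edited by hand
    "data/interim",    # intermediate results produced by scripts
    "data/processed",  # final model input
    "scripts",
    "docs",
    "reports",
]


def create_project_skeleton(root):
    """Create the folder layout under `root` and return the created paths."""
    root = Path(root)
    created = []
    for folder in PROJECT_FOLDERS:
        path = root / folder
        # exist_ok=True makes the function safe to run more than once.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)

    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(
            "# Project title\n\n"
            "Describe the data sources, the scripts, and how to run them.\n"
        )
    return created
```

    Calling `create_project_skeleton("my_project")` creates the folders below `my_project` and adds a README stub only if none exists yet.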

  • Version of your code and data

    As scripting becomes increasingly important in our projects, the rules of software development start to apply to us as well. The downside is that we, as hydrologists, need to invest extra effort in learning new tools; the upside is that the software development world already offers a wealth of well-tested tools and documentation. Keeping code reproducible is one of its core concerns, and version control systems were developed for exactly this purpose. Using these systems in our projects brings the following advantages:

    • You keep track of the history of a project. In this way you keep a journal of decisions made in a project.
    • If you mess up something, you can always revert to a previous state.
    • It allows for collaborative development, where individuals can create their own branch to safely work on new developments and later merge their changes.

    In this chapter we will give a quick introduction to versioning and point to some more elaborate guides.

  • Manage the workflow

    A collection of scripts does not make a project reproducible per se. Quite often the output of script A serves as the input of script B. The order in which you execute scripts therefore matters: after a data update, script A needs to be run first and script B after it, while an independent script C does not need to be re-run. Running script B before script A would leave the end result based on outdated data. Data dependencies between scripts quickly become complex when multiple data sources exist, which is common in projects. Managing the workflow is therefore a recurring task in a numerical modeling project, and not always as straightforward as it seems. Workflow managers can help with this.
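
    The dependency reasoning above can be sketched with Python's standard-library `graphlib` module (Python 3.9 or later). This is only an illustration of the idea, not a replacement for a workflow manager; the script names and the `scripts_to_rerun` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each script maps to the set of scripts whose output it needs as input.
DEPENDENCIES = {
    "script_a": set(),
    "script_b": {"script_a"},  # B reads the output of A
    "script_c": set(),         # C is independent
}


def scripts_to_rerun(changed, dependencies):
    """Return the changed script and everything downstream of it, in run order."""
    affected = {changed}
    grew = True
    while grew:  # keep adding scripts that depend on an affected script
        grew = False
        for script, needs in dependencies.items():
            if script not in affected and needs & affected:
                affected.add(script)
                grew = True

    # static_order() yields every script after the scripts it depends on.
    order = TopologicalSorter(dependencies).static_order()
    return [script for script in order if script in affected]


if __name__ == "__main__":
    # After the input data of script A changes, A and then B must be re-run.
    print(scripts_to_rerun("script_a", DEPENDENCIES))
```

    Workflow managers automate this bookkeeping and typically also check which outputs are out of date, so that only the necessary scripts are executed.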

