What is reproducible modeling

Within and outside Deltares, there is currently a push towards reproducible modeling projects. The goal is to ensure proper quality assurance of projects and to leave more time for the actual analysis. In recent years, Deltares has embraced open-source software with the motto ‘dare to share’, which has been a crucial step towards achieving project reproducibility. In this tutorial, we will explore the essential steps and best practices to ensure that your numerical models are reproducible and easily shareable.

Why Reproducibility Matters

Reproducibility is a cornerstone of research. It allows others to validate and build upon your work, promotes transparency, and fosters collaboration within the scientific community. By following reproducible practices in your numerical modeling projects, you can:

  • Increase the reliability and credibility of your research.
  • Facilitate collaboration and knowledge exchange with colleagues and clients.
  • Enable easier troubleshooting and debugging of your models.
  • Ensure long-term accessibility and usability of your work.

Definition of a reproducible project

To establish a clear understanding, it is important to define what we mean by reproducible projects. A reproducible numerical modeling project is a project which meets the following requirements:

  1. Can reproduce its results from source data, preferably with one command.
  2. Takes minimal effort to set up on a new machine, regardless of the operating system.
  3. Has a setup that is easy to understand (e.g., standardized folder structure, readable scripts, documentation).
  4. Keeps a log of changes made to the project (version control of scripts and data).
  5. Allows easy sharing of the project within Deltares, so projects are easily findable.
  6. Allows easy sharing of data (e.g., generic data formats, metadata).


Cookbook reproducible modeling (version 0.1)

Be aware: The cookbook presented below is a draft version of how to make your numerical project reproducible.

  • Scripting

    Many data processing steps are usually required to turn source data into model input. For example, converting a shapefile to a grid, correcting mistakes in the source data, or assigning parameters to the model based on the source data. These steps can be performed in graphical user interfaces (GUI) or scripts. Actions in a GUI are nearly always unrecorded, meaning mistakes are usually hard to identify and rectifying them is costly, as it entails manually performing all steps again. In contrast, a script is an explicit record of all actions, and rectifying a mistake involves just fixing the specific line of code and re-running.

    Scripting is therefore essential in creating reproducible workflows. Python is a commonly used scripting language within Deltares (next to Matlab).

    In this chapter we will give a quick introduction to Python and point to some more elaborate guides.
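As a minimal sketch of what such a scripted processing step can look like, the Python example below corrects a known issue in the source data (missing values marked with a sentinel) and assigns a model parameter based on a source attribute. The column names, the sentinel value, and the roughness lookup table are hypothetical and only serve as illustration.

```python
import csv
import io

# Hypothetical lookup: land-use class -> roughness parameter (illustrative values only)
ROUGHNESS_BY_LANDUSE = {"grass": 0.035, "forest": 0.10, "urban": 0.02}
NODATA = -999.0  # assumed sentinel for missing values in the source data


def process_source(rows):
    """Turn raw source records into model input records."""
    model_input = []
    for row in rows:
        elevation = float(row["elevation"])
        if elevation == NODATA:
            # Correct a known issue in the source data: skip cells without a value
            continue
        model_input.append({
            "cell_id": int(row["cell_id"]),
            "elevation": elevation,
            # Assign a model parameter based on the source data
            "roughness": ROUGHNESS_BY_LANDUSE[row["landuse"]],
        })
    return model_input


# Small in-memory stand-in for a source file, so the sketch runs on its own
RAW = """cell_id,elevation,landuse
1,2.5,grass
2,-999,forest
3,1.0,urban
"""
result = process_source(csv.DictReader(io.StringIO(RAW)))
for record in result:
    print(record)
```

If the lookup table or the sentinel turns out to be wrong, the fix is a one-line change followed by a re-run, which is the advantage over repeating the same steps by hand in a GUI.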

  • Folder structure of the project

    Establishing a well-organized project structure is vital for ensuring reproducibility in your numerical modeling projects. A clear and consistent structure not only improves readability but also simplifies navigation, facilitates collaboration, and enhances the overall reproducibility of your work. In this chapter, we will delve into the following aspects of project structure:

    • Consistent folder structure
    • Readable scripts
    • Documentation
    • File formats and data organization
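A consistent folder structure is easiest to maintain when it is created by a script rather than by hand. The sketch below shows one way to do this with Python's `pathlib`. The folder names are hypothetical and not a Deltares standard; replace them with the layout agreed on for your project.

```python
import tempfile
from pathlib import Path

# Hypothetical layout, for illustration only
FOLDERS = ["data/raw", "data/processed", "scripts", "docs", "results"]


def create_project(root):
    """Create the project folders and a README stub; return the folders found."""
    root = Path(root)
    for folder in FOLDERS:
        # exist_ok=True makes the function safe to run more than once
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Project title\n\nDescribe the project here.\n",
                          encoding="utf-8")
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_dir())


# Demonstrate in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    created = create_project(tmp)
    readme_exists = (Path(tmp) / "README.md").exists()

print(created)
```

Because every project created this way starts from the same layout, colleagues know where to look for source data, scripts, and documentation.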

  • Version control of your code and data

    Since scripting is becoming increasingly important in our projects, the rules of software development also start to apply to us. This has the downside that it requires extra effort from us, hydrologists, to learn new tools, but the upside is that there already is a wealth of properly tested tools and documentation available from the software development world. Keeping code reproducible is one of the problems these tools address, and version control systems have been developed for this purpose. It is very useful to use these systems in our projects since they provide the following advantages:

    • You keep track of the history of a project. In this way you keep a journal of decisions made in a project.
    • If you mess something up, you can always revert to a previous state.
    • They allow for collaborative development, where individuals can create their own branch to safely work on new developments and later merge their changes.


    In this chapter we will give a quick introduction to versioning and point to some more elaborate guides.
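Version control systems such as Git identify each stored state by a hash of its content, which is how they detect that something changed. The sketch below applies that idea to a folder of data files using only the Python standard library. It illustrates the principle and is not a replacement for a version control system; the helper names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(path):
    """Return the SHA-256 hash of a file's content."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large data files do not have to fit in memory
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(folder):
    """Record the checksum of every file in a folder (hypothetical helper)."""
    folder = Path(folder)
    return {p.relative_to(folder).as_posix(): checksum(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}


def changed_files(old, new):
    """List files that were added, removed, or modified between two manifests."""
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    (data / "levels.csv").write_text("time,level\n0,1.2\n", encoding="utf-8")
    (data / "grid.txt").write_text("10 10\n", encoding="utf-8")
    before = make_manifest(data)

    # Simulate an edit to one of the data files
    (data / "levels.csv").write_text("time,level\n0,1.3\n", encoding="utf-8")
    after = make_manifest(data)

    changed = changed_files(before, after)

print(changed)
```

Storing such a manifest next to the scripts makes it possible to tell whether the data a result was based on is still the data on disk.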

  • Manage the workflow
    Managing workflows in your numerical modeling project is a common task but not always as straightforward as it seems.
    In this chapter we will give a quick introduction to using the workflow manager Snakemake and point to some more elaborate guides.
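Workflow managers such as Snakemake decide which steps to run from the input and output files declared for each step. The sketch below illustrates that principle in plain Python: a step runs only if one of its outputs is missing or older than one of its inputs. This is not Snakemake syntax, and the step names and files are hypothetical.

```python
import tempfile
from pathlib import Path


def needs_run(inputs, outputs):
    """A step must run if an output is missing or older than any input."""
    if any(not Path(p).exists() for p in outputs):
        return True
    newest_input = max(Path(p).stat().st_mtime for p in inputs)
    oldest_output = min(Path(p).stat().st_mtime for p in outputs)
    return newest_input > oldest_output


def run_workflow(steps):
    """Run steps in the given order, skipping those that are up to date."""
    executed = []
    for name, inputs, outputs, action in steps:
        if needs_run(inputs, outputs):
            action()
            executed.append(name)
    return executed


with tempfile.TemporaryDirectory() as tmp:
    raw, clean, summary = (Path(tmp) / n
                           for n in ("raw.txt", "clean.txt", "summary.txt"))
    raw.write_text("1.2\n\n1.4\n", encoding="utf-8")

    def clean_data():
        # Drop empty lines from the raw file
        text = raw.read_text(encoding="utf-8")
        lines = [ln for ln in text.splitlines() if ln.strip()]
        clean.write_text("\n".join(lines) + "\n", encoding="utf-8")

    def summarize():
        # Compute the mean of the cleaned values
        values = [float(v) for v in clean.read_text(encoding="utf-8").split()]
        summary.write_text(f"mean={sum(values) / len(values):.2f}\n",
                           encoding="utf-8")

    # Each step: (name, input files, output files, action)
    steps = [
        ("clean", [raw], [clean], clean_data),
        ("summarize", [clean], [summary], summarize),
    ]
    first_run = run_workflow(steps)   # no outputs exist yet, so every step runs
    second_run = run_workflow(steps)  # all outputs are up to date, nothing runs
    summary.unlink()                  # remove one output
    third_run = run_workflow(steps)   # only the step with the missing output runs
    summary_text = summary.read_text(encoding="utf-8")

print(first_run, second_run, third_run)
```

A real workflow manager additionally works out the order of the steps from the declared files, whereas this sketch assumes the steps are listed in the right order.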


