Snakemake is a powerful workflow management system that allows you to define and execute complex computational workflows in a reproducible and scalable manner. In this tutorial, we'll walk through the basics of Snakemake and demonstrate how to create and execute a simple workflow.

Warning

Snakemake works very well on local machines and on computational clusters. However, it does not have a server handler, so several people running the same workflow on one server can cause complications. This can happen, for example, on modelling-as-a-service servers. In that case you are advised to use a more involved workflow manager, such as Argo.


See also

Example project on GitHub: https://github.com/Deltares/FAIR-data-example-project


Prerequisites

Before we begin, make sure you have the following prerequisites installed on your system:

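  • Python: Snakemake requires Python 3. This tutorial was written for Python 3.5 or higher; recent Snakemake releases require a newer Python version, so check the documentation of the release you install. You can download Python from the official Python website, python.org.
  • Snakemake: Install Snakemake using conda, a package and environment manager. Open a terminal or command prompt and run the following command:

       conda install -c conda-forge -c bioconda snakemake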

Alternatively, Deltaforge is a Deltares Python distribution that comes with Snakemake. See its documentation for installation instructions.
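To confirm that the installation worked, a quick check from Python can help. The sketch below uses only the standard library. The function name check_environment is made up for this tutorial, and it only reports what it finds; it does not enforce any particular version.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report the Python version and whether Snakemake can be found."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # True if the snakemake package is importable from this interpreter
        "snakemake_importable": importlib.util.find_spec("snakemake") is not None,
        # Path of the snakemake executable on PATH, or None if it is not found
        "snakemake_executable": shutil.which("snakemake"),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If snakemake_importable is False or snakemake_executable is None, the conda environment that contains Snakemake is probably not activated.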

Introduction to Snakemake

Snakemake workflows are defined in a file called Snakefile. In this file, you specify a set of rules, with each rule representing an individual step of your workflow. Rules define input files, output files, and the commands or scripts to be executed.

Let's start by creating a simple workflow using Snakemake.

Creating a Simple Workflow

1. Create a new directory for your workflow and navigate into it:

   mkdir snakemake-tutorial
   cd snakemake-tutorial
   

2. Create a new file called Snakefile using a text editor.

3. Open the Snakefile and add the following content:

   rule all:
       input:
           "reports/figures/head.png"

   rule plot_heads:
       input:
           head_nc = "data/4-output/head.nc",
           script = "src/5-visualize/visualize_heads.py"
       output:
           head_png = "reports/figures/head.png"
       script:
           "src/5-visualize/visualize_heads.py"
   

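In this example, we have two rules: all and plot_heads. The all rule lists the final output file of our workflow, "reports/figures/head.png", as its input, which makes that file the target of the workflow. The plot_heads rule defines the input files, the output file, and the script to be executed. Listing the script itself as an input means that plot_heads is re-run whenever the script changes.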

4. Save the Snakefile and close the text editor.
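To make the relation between the two rules in this Snakefile more concrete, here is a toy model in plain Python of how a target can be traced back to the rules that produce it. This is only an illustration and not Snakemake's actual implementation: the RULES dictionary and the helper functions are invented for this sketch, and it ignores whether files exist or are up to date.

```python
# Toy model of target resolution; NOT Snakemake's real implementation.
RULES = {
    "all": {"input": ["reports/figures/head.png"], "output": []},
    "plot_heads": {
        "input": ["data/4-output/head.nc", "src/5-visualize/visualize_heads.py"],
        "output": ["reports/figures/head.png"],
    },
}


def producer_of(path, rules):
    """Return the name of the rule that lists `path` as output, or None."""
    for name, rule in rules.items():
        if path in rule["output"]:
            return name
    return None


def rules_to_run(target_rule, rules):
    """List the rules needed for `target_rule`, dependencies first."""
    order = []

    def visit(name):
        for path in rules[name]["input"]:
            producer = producer_of(path, rules)
            # Inputs without a producing rule must already exist on disk.
            if producer is not None and producer not in order:
                visit(producer)
        if name not in order:
            order.append(name)

    visit(target_rule)
    return order


print(rules_to_run("all", RULES))  # ['plot_heads', 'all']
```

The all rule needs "reports/figures/head.png", the only rule that outputs that file is plot_heads, so plot_heads is scheduled first.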

Running the Workflow

To run the workflow, open a terminal or command prompt and navigate to the directory where the Snakefile is located ('snakemake-tutorial').

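1. Execute the following command. The --cores option sets the maximum number of CPU cores to use. Recent Snakemake versions require it, so add it to the other snakemake commands in this tutorial as well if your version asks for it:

   snakemake --cores 1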
   

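Snakemake will analyze the workflow and execute the necessary steps to generate the final output. In our case, it will run the plot_heads rule, which will create the "reports/figures/head.png" file by executing the specified script. The input file "data/4-output/head.nc" and the script must already exist; otherwise Snakemake reports missing input files.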

Congratulations! You've created and executed your first Snakemake workflow.

Handling Parameters

Snakemake allows you to pass parameters to rules without creating separate files. This can be useful when you want to provide inputs or configuration values to your workflow. Let's modify our workflow to demonstrate this feature.

1. Open the Snakefile in your text editor.

2. Modify the plot_heads rule as follows:

   rule plot_heads:
       input:
           head_nc = "data/4-output/head.nc",
       params:
           times = "2012-09-01"
       output:
           head_png = "reports/figures/head.png"
       script:
           "src/5-visualize/visualize_heads.py"
   

In this updated rule, we added a params section that specifies a parameter called times with the value "2012-09-01". This parameter can be accessed within the script using snakemake.params.times.

3. Save the Snakefile and close the text editor.

4. Run the workflow again using the snakemake command:

   snakemake
   

Snakemake will execute the workflow, passing the parameter times to the script.
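For reference, here is a minimal sketch of how visualize_heads.py might pick up these values. This tutorial does not show the script's contents, so the function collect_settings and the stand-in job object are illustrative assumptions. The only part taken from Snakemake is that it injects a global object named snakemake, with input, output, and params attributes, when it runs a file through the script directive.

```python
from types import SimpleNamespace


def collect_settings(job):
    """Gather the paths and parameters a plotting routine would need."""
    return {
        "head_nc": job.input.head_nc,
        "head_png": job.output.head_png,
        "times": job.params.times,
    }


# Snakemake defines a global `snakemake` object when it runs this file through
# the `script:` directive. Outside Snakemake we fall back to a stand-in so the
# file can still be run by hand for testing.
try:
    job = snakemake  # noqa: F821 (injected by Snakemake)
except NameError:
    job = SimpleNamespace(
        input=SimpleNamespace(head_nc="data/4-output/head.nc"),
        output=SimpleNamespace(head_png="reports/figures/head.png"),
        params=SimpleNamespace(times="2012-09-01"),
    )

settings = collect_settings(job)
# The actual plotting code (hypothetical) would read settings["head_nc"],
# select settings["times"], and save a figure to settings["head_png"].
print(settings)
```

Keeping the access to the snakemake object in one small function makes the rest of the script easy to test without running the whole workflow.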

Executing Workflows

Snakemake provides various options to execute workflows. Here are some commonly used commands:

  • To execute the first rule (default rule), run:

       snakemake
       
  • To execute a specific rule (e.g., plot_heads) and all the rules required to reach it, run:

       snakemake plot_heads
       
  • If you make changes to a rule and want to force it to re-run, together with all rules downstream of it, use -R (short for --forcerun):

       snakemake -R plot_heads
       

These commands will help you control the execution of your workflow and ensure that only the necessary steps are executed.
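If you prefer to start these commands from a Python script, for example inside a larger automation job, a thin wrapper around subprocess is one option. The sketch below is an assumption about how one might do this, not an official Snakemake interface: the helper build_command is hypothetical, and it relies only on the command-line options --cores, -n (dry run), and -R.

```python
import shutil
import subprocess


def build_command(target=None, cores=1, dry_run=False, forcerun=None):
    """Assemble a snakemake command line as a list of arguments."""
    command = ["snakemake", "--cores", str(cores)]
    if dry_run:
        command.append("-n")  # only show what would be executed
    if target is not None:
        # The target goes before -R, because -R accepts several rule names
        # and would otherwise treat the target as one of them.
        command.append(target)
    if forcerun is not None:
        command.extend(["-R", forcerun])
    return command


if __name__ == "__main__":
    command = build_command(dry_run=True)
    print(" ".join(command))
    # Only call Snakemake if it is actually installed.
    if shutil.which("snakemake") is not None:
        subprocess.run(command, check=False)
```

A dry run is a convenient first step, because it lists the jobs Snakemake would execute without running any of them.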

That's it! You've completed a basic tutorial on Snakemake. Snakemake offers many more advanced features, such as handling dependencies, working with wildcards, and specifying resources. For more information, refer to the official Snakemake documentation.

Feel free to explore and experiment with Snakemake to manage your complex workflows efficiently.
