...

What: A workflow.xml
Required: no
Description: Definition of sequence of moduleInstances (or workflows) in logical order
Schema location: http://fews.wldelft.nl/schemas/version1.0/workflow.xsd

Introduction

Workflows are used in Delft-FEWS to define logical sequences of forecast modules to run. The workflow itself simply defines the order in which the configured modules are run. It contains no inherent information or constraints on the role a module has in delivering the forecasting requirement.

...

All workflows are defined in the Workflows section of the configuration; when working from a file system this is the WorkflowFiles directory. Every workflow has the same structure and must adhere to the same XML schema definition. Workflows are identified by their name, which is registered with the system through the workflowDescriptors configuration in the Regional Configuration section.

Workflows

Workflows may be available either from the Workflows table (when the configuration is loaded into the database) or from the WorkflowFiles directory (when the configuration is available on the file system).

...

  • Simple activities
  • Activities with a fallback activity
  • Activities to be run as an ensemble
  • Parallel activities, i.e. a group of activities that can be executed in parallel when multiple cores are available
  • Sequence activities, i.e. a group of activities that have to be executed in sequence

Activities

Each activity can either be a single moduleInstance or another workflow.

...

Figure 143  Elements of an Activity in the workflow.

activity

Root element for the definition of a workflow activity. Multiple entries can be defined.

runIndependent

Boolean flag to indicate whether the activity is considered independent of the other activities in the workflow. If the flag is set to "false" (default), failure of this activity causes the complete workflow to be considered failed, and no further activities are carried out. If the flag is set to "true", failure of the activity does not cause the workflow to fail, and the next activity in the workflow is still attempted. An indication is given to the user in the Forecast Management display if one or more workflow activities have failed.
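As an illustration, the following minimal sketch shows two activities in which a failure of the first does not stop the second. The module instance IDs are hypothetical placeholders, and the fragment should be validated against workflow.xsd before use.

```xml
<workflow xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/workflow.xsd" version="1.1">
    <!-- Hypothetical ID: if this import fails, the workflow continues -->
    <activity>
        <runIndependent>true</runIndependent>
        <moduleInstanceId>Import_Telemetry</moduleInstanceId>
    </activity>
    <!-- Hypothetical ID: if this activity fails, the whole workflow is considered failed -->
    <activity>
        <runIndependent>false</runIndependent>
        <moduleInstanceId>Validate_Telemetry</moduleInstanceId>
    </activity>
</workflow>
```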

enabled

Available since FEWS 2017.02. This element can be used to exclude an activity from a workflow run. The enabled element contains a locationId and a location attributeId. The location attribute must be a boolean attribute with the value TRUE or FALSE. When the attribute value is TRUE the activity is run; when the value is FALSE it is not. The idea behind this feature is that the location attribute can be changed with a modifier, which enables the forecaster to exclude activities (modules or sub-workflows) from a workflow run.

...

Code Block
language: xml
	<activity>
		<enabled locationId="CAP" attributeId="EXPORT"/>
		<moduleInstanceId>Export_CAP</moduleInstanceId>
		<description>Export CAP messages</description>
	</activity>



description

Optional field. If configured, the text is displayed in the workflow navigator tree as a mouse-over label (tooltip).

...

Optional element. Flag source that is written to the flagSourceColumn by an activity that is able to write a flag. It is used for non-missing values when no other flagSource is applied: the defaultFlagSource is written when no flag is changed and the data is non-missing. Typically the defaultFlagSource is set to 'OK' and configured in the CustomFlagSources configuration file.

flagSourceColumnId

Optional element. Determines which flagSourceColumn the flagSources are written to. A validation step for which a flagSourceColumnId has been defined writes the appropriate flagSource on a flag change, both to the flagSource column and to the regular flagSource element. If the flag is not changed, the defaultFlagSource is written to the flagSource column; in that case the flagSource is written to the regular flagSource element only when that element is empty. The flagSourceColumnId needs to be defined in the flagSourceColumn.xml configuration file.

moduleInstanceId

The ID of the moduleInstance to run as the workflow activity. This module instance must be defined in the moduleInstanceDescriptors (see Regional Configuration) and a suitable module configuration file must be available (see Module configurations). This moduleInstanceId is used to administer the data generated.

moduleConfigFileName

Optional reference to a module config file that holds the processing logic for a module instance. Typically this module config file holds property keys so that it can act as a template for multiple module instances, each module instance having a unique set of property values.

workflowId

This is the name of the sub-workflow XML file and NOT the workflow descriptor ID, as the name might suggest. The workflow descriptors (including the workflow descriptor properties) are ignored when resolving the sub-workflow. There is no need to add a sub-workflow XML file to the workflow descriptors when it is only used as a sub-workflow. If a workflow is defined in the workflowDescriptors by means of a template workflow config file (in combination with properties), you cannot refer to that workflow's ID; instead, use the template file as workflowId and supply the properties.
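A minimal sketch of an activity that runs another workflow instead of a module instance is given below. The value Import_Observations is a hypothetical sub-workflow file name used for illustration only; it stands for the name of a workflow XML file, not for a workflow descriptor ID.

```xml
<workflow xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/workflow.xsd" version="1.1">
    <activity>
        <runIndependent>false</runIndependent>
        <!-- Hypothetical name of a sub-workflow XML file, resolved by file name -->
        <workflowId>Import_Observations</workflowId>
    </activity>
</workflow>
```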

properties

Root element for the definition of one or more properties. Multiple entries can be defined. Properties can be used to externalize portions of a timeSeriesSet definition (e.g. locationSetId) or parameter values from the module configuration file to the level of the workflow definition.



properties:string

Property definition for a string data type. Holds a key attribute and a value attribute; the value replaces $key$ as defined in a module config file.

properties:int

Property definition for an integer data type. Holds a key attribute and a value attribute; the value replaces $key$ as defined in a module config file.

properties:float

Property definition for a float data type. Holds a key attribute and a value attribute; the value replaces $key$ as defined in a module config file.

properties:double

Property definition for a double data type. Holds a key attribute and a value attribute; the value replaces $key$ as defined in a module config file.

properties:bool

Property definition for a boolean data type. Holds a key attribute and a value attribute; the value replaces $key$ as defined in a module config file.

properties:dateTime

Property definition for a dateTime data type. Holds a key attribute, while the value is based on the combination of the date attribute and the time attribute.
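The property types described above could be combined in a template-based activity as sketched below. All IDs, keys and values are hypothetical, the date and time formats shown are assumptions, and the position of the properties element within the activity is also an assumption, so the fragment should be validated against workflow.xsd.

```xml
<activity>
    <!-- Assumed position of properties within activity; check workflow.xsd -->
    <properties>
        <!-- Replaces $LOCATION_SET$ in the module config file -->
        <string key="LOCATION_SET" value="Gauges_North"/>
        <!-- Replaces $MAX_GAP$ in the module config file -->
        <int key="MAX_GAP" value="6"/>
        <!-- Replaces $THRESHOLD$ in the module config file -->
        <double key="THRESHOLD" value="2.5"/>
        <!-- Replaces $FILL_GAPS$ in the module config file -->
        <bool key="FILL_GAPS" value="true"/>
        <!-- Value built from the date and time attributes; formats are assumed -->
        <dateTime key="START" date="2021-01-01" time="00:00:00"/>
    </properties>
    <runIndependent>false</runIndependent>
    <moduleInstanceId>Interpolate_North</moduleInstanceId>
    <moduleConfigFileName>Interpolate_Template</moduleConfigFileName>
</activity>
```

In this sketch the hypothetical template file Interpolate_Template would contain the placeholders $LOCATION_SET$, $MAX_GAP$ and so on, so that several module instances can share one template with different property values.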

fallbackActivity

A fallback activity may be defined to run if the activity under which it is defined fails. This can be used to run a simple module if the more complex method used in preference fails, and ensures continuity of the forecasting process. The definition of the fallbackActivity is the same as the definition of an activity. A fallback activity cannot be used to turn a partly completed (completed with errors) TaskRun into a fully successful TaskRun (complete without errors).
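A minimal sketch of an activity with a fallback is shown below. The module instance IDs are hypothetical, and the position of fallbackActivity among the other child elements of activity is an assumption to be checked against workflow.xsd.

```xml
<activity>
    <runIndependent>false</runIndependent>
    <!-- Hypothetical ID of the preferred, more complex module -->
    <moduleInstanceId>Hydraulic_Model_Full</moduleInstanceId>
    <!-- Run only if the activity above fails; same structure as an activity -->
    <fallbackActivity>
        <runIndependent>false</runIndependent>
        <!-- Hypothetical ID of a simpler module -->
        <moduleInstanceId>Simple_Routing</moduleInstanceId>
    </fallbackActivity>
</activity>
```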

ensemble

This element is defined to indicate the activity is to run as an ensemble.

ensembleId

Id of the ensemble to apply when retrieving data from the database. For all time series sets used as input for the activities running as an ensemble, a request for time series with this Id will be made. Ensemble IDs in a sub-workflow override this ensembleId. A sub-workflow without an ensembleId makes use of this ensembleId.

runInLoop

Boolean flag to indicate whether the activity should run once per ensemble member, or only once using all members of the ensemble in that single run. If the value is "true", Delft-FEWS first establishes how many members the ensemble has and then runs the activity for each member. If the value is "false", the activity is run only once; when a module requests a time series set, the database returns all members of that ensemble. When the value is "true", $LOOP_ENSEMBLE_MEMBER_ID$ can be used in the module config files. This property is not needed for the time series sets, because the loop member is the default ensemble selection for a time series set while runInLoop is enabled.

ensembleMemberId

Since 2018.01 (for older versions, configure an ensembleMemberIdRegularExpression with the wanted ensembleMemberId as a "pattern"). Optional field to only run one particular ensemble member. If this field is used only the specified ensemble member will be run. 

ensembleMemberIndex

Optional field to only run one particular ensemble member. If this field is used only the specified ensemble member will be run.

ensembleMemberIndexRange

Optional field to run a particular range of ensemble members. Processing starts at member start and ends at member end. If end is not specified the processing will start at start and end at the last member of the ensemble.

ensembleMemberIdRegularExpression

Runs all members whose ID matches the specified expression, e.g. specify ^Q.* to loop over all members that start with Q.
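The ensemble elements above might be combined as in the following sketch, which loops over a sub-range of members. The module instance ID, ensemble ID and index values are hypothetical; the start and end attribute names and the nesting of ensembleMemberIndexRange inside ensemble are assumptions based on the descriptions above and should be validated against workflow.xsd.

```xml
<activity>
    <runIndependent>true</runIndependent>
    <!-- Hypothetical module instance ID -->
    <moduleInstanceId>Rainfall_Runoff_Ensemble</moduleInstanceId>
    <ensemble>
        <!-- Hypothetical ensemble ID used when retrieving time series -->
        <ensembleId>EPS</ensembleId>
        <!-- true: run the activity once for each selected member -->
        <runInLoop>true</runInLoop>
        <!-- Assumed attributes: process members from start up to end -->
        <ensembleMemberIndexRange start="0" end="9"/>
    </ensemble>
</activity>
```

To select members by ID rather than by index, the range element would be replaced by an ensembleMemberIdRegularExpression such as ^Q.*, as described above.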

...

Info

When running activities as an ensemble that request time series sets from the database that are not a part of that ensemble, the default ensembleId should be added to the TimeSeriesSets definition. The default ensemble Id is "main".

All time series sets written when running in ensemble mode will have the ensembleId as specified in the workflow ensembleId element, unless overruled by a local ensembleId defined in the timeSeriesSet on writing.

loopLocationSet 

As with looping over an ensemble, it is possible to loop over a location set. $LOOP_LOCATION_ID$ can be used inside the module configuration to get the active location ID. When the configured loopLocationSet does not exist, a config error is thrown. With skipNonExistingLoopLocationSet the workflow activity can be skipped in this case, which prevents the error.

workflow:parallel execution

Grouping to accommodate parallelization of (a portion of) the workflow when multiple CPU cores are available. As explained in 19 Parallel running of ensemble loops and activities on one forecasting shell instance, activities can be run in parallel on one forecasting shell instance by defining the global property runInLoopParallelProcessorCount. Since FEWS 2017.01 it is also possible to run the parallel activities on multiple forecasting shells. Parallel activities should only be configured when the underlying activities have no data dependency. If a data dependency exists, the activities should be executed in sequence. This can be enforced by embedding them in a sequence group.

...

Info
title: 2017.02 Improvement

In 2017.02 fss patch #92224 provides an improvement / bugfix that only applies to the multipleForecastingShells=true option in 2017.02. With this improvement, a partly successful forecast partition in 2017.02 should not cause termination of the other partitions.

workflow:sequence

Grouping to enforce sequential execution of multiple activities, generally because they have data dependencies. Typically used in workflows where parallel activities are defined, to clarify which activities have to be executed in sequence.
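A sketch of how sequence groups could be embedded in a parallel group is given below: the two sequences have no data dependency on each other and may run in parallel, while the activities inside each sequence run in order. The module instance IDs are hypothetical, and the nesting of sequence inside parallel is an assumption based on the descriptions above.

```xml
<workflow xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/workflow.xsd" version="1.1">
    <parallel>
        <!-- Catchment A: the routing step depends on the runoff step -->
        <sequence>
            <activity>
                <runIndependent>false</runIndependent>
                <moduleInstanceId>Runoff_CatchmentA</moduleInstanceId>
            </activity>
            <activity>
                <runIndependent>false</runIndependent>
                <moduleInstanceId>Routing_CatchmentA</moduleInstanceId>
            </activity>
        </sequence>
        <!-- Catchment B: independent of catchment A -->
        <sequence>
            <activity>
                <runIndependent>false</runIndependent>
                <moduleInstanceId>Runoff_CatchmentB</moduleInstanceId>
            </activity>
            <activity>
                <runIndependent>false</runIndependent>
                <moduleInstanceId>Routing_CatchmentB</moduleInstanceId>
            </activity>
        </sequence>
    </parallel>
</workflow>
```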

...

Code Block
language: xml
title: Sub-workflow using properties defined at top-level workflow
<workflow xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews
http://fews.wldelft.nl/schemas/version1.0/workflow.xsd" version="1.1">
    <parallel>
        <activity logStartedAsDebug="true" logFinishedAsDebug="true">
            <runIndependent>false</runIndependent>
            <moduleInstanceId>RTCTools_KOT_Check_$RUNTYPE$</moduleInstanceId>
            <moduleConfigFileName>RTCTools_Check_Qinc_Fx</moduleConfigFileName>
        </activity>
        <activity logStartedAsDebug="true" logFinishedAsDebug="true">
            <runIndependent>false</runIndependent>
            <moduleInstanceId>RTCTools_KOT_Check_$RUNTYPE$</moduleInstanceId>
            <moduleConfigFileName>RTCTools_Check_QO_Seed</moduleConfigFileName>
        </activity>
        <activity logStartedAsDebug="true" logFinishedAsDebug="true">
            <runIndependent>false</runIndependent>
            <moduleInstanceId>RTCTools_KOT_Check_$RUNTYPE$</moduleInstanceId>
            <moduleConfigFileName>RTCTools_Check_FB</moduleConfigFileName>
        </activity>
    </parallel>
</workflow>

loopLocationSetId

For a workflow, a location-loop can be defined in which each location (site) in a location set is run separately. The $LOOP_LOCATION_ID$ tag in the configuration can be used to point to the corresponding locationID of the current location. This is useful in combination with (related location) Constraints, because then it is possible to filter a locationSet on the basis of the $LOOP_LOCATION_ID$.
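A minimal sketch of a location loop is shown below. The module instance ID and location set ID are hypothetical, and the position of loopLocationSetId within the activity is an assumption to be validated against workflow.xsd.

```xml
<activity>
    <runIndependent>true</runIndependent>
    <!-- Hypothetical module instance; its config can refer to $LOOP_LOCATION_ID$ -->
    <moduleInstanceId>Site_Statistics</moduleInstanceId>
    <!-- Hypothetical location set: the activity runs once for each location in it -->
    <loopLocationSetId>Gauges_All</loopLocationSetId>
</activity>
```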

...

The locationLoop also works in combination with "run for selected locations": the activity is then run only for the locations that are both in the locationSet and in the selection.



completed (since 2021.01)

At the end of a sub-workflow the simulated time series for one or multiple module instances can be made visible for other workflows and users with an Operator Client before the workflow completes successfully. This can be achieved by adding a "completed" element at the end of a sub-workflow. 

...

See deleteTemporary if you want to transfer temporary time series to the next partition forecasting shell.


deleteTemporary


Explicitly deletes temporary time series. At the end of a workflow partition or at the end of the workflow, temporary series are deleted automatically. Deleting the series of specified module instances earlier prevents them from being flushed to disk needlessly while they are still in the memory buffer. Temporary time series that are explicitly deleted in a different workflow partition than the one in which they were created are always flushed to the database. This makes it possible to use temporary time series even when the workflow continues on a different forecasting shell. When the workflow terminates or completes, all temporary series for the run are deleted from the database.

...