...

What: nameofinstance.xml

Description: Configuration for the Secondary Validation module

Schema location: https://fewsdocs.deltares.nl/schemas/version1.0/secondaryValidation.xsd

Entry in ModuleDescriptors:

```xml
<moduleDescriptor id="SecondaryValidation">
	<description>SecondaryValidation</description>
	<className>nl.wldelft.fews.system.plugin.secondaryValidation.SecondaryValidation</className>
</moduleDescriptor>
```

SecondaryValidation

...

The SecondaryValidation module can be used to perform certain checks on time series data and generate log messages when the specified criteria are met.


Configuration

An XML file configuring an instance of the SecondaryValidation module called, for example, CheckImportedData would be named as follows:

CheckImportedData 1.00 default.xml

  • CheckImportedData: File name for the CheckImportedData configuration.
  • 1.00: Version number.
  • default: Flag to indicate that this version is the default configuration (otherwise omitted).

A SecondaryValidation configuration file is typically located in the ModuleConfigFiles folder and can configure one or more checks. The configured checks are processed one by one in the specified order. The checks can generate log messages, which can trigger actions in the master controller, such as sending warning e-mails. A special type of check is available for automatically setting flags to 'doubtful' or 'unreliable' per time step when a condition on multiple time series becomes true.
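A minimal sketch of the outer structure of such a file is shown below. It assumes that the root element is called secondaryValidation and that the file uses the usual Delft-FEWS namespace http://www.wldelft.nl/fews; both are assumptions to verify against secondaryValidation.xsd. The file is shown empty only to illustrate the layout: variable definitions come first, followed by the checks in the order in which they should be processed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical skeleton for ModuleConfigFiles/CheckImportedData 1.00 default.xml -->
<!-- Root element name and namespace are assumptions; verify against secondaryValidation.xsd -->
<secondaryValidation xmlns="http://www.wldelft.nl/fews"
		xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		xsi:schemaLocation="http://www.wldelft.nl/fews https://fewsdocs.deltares.nl/schemas/version1.0/secondaryValidation.xsd">
	<!-- variableDefinition elements first, then the checks; checks are processed in the order listed -->
</secondaryValidation>
```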

...

  • minNumberOfValuesCheck: Logs a message when there are not enough values within a configured period. This check is only useful for non-equidistant time series, because missing values are counted as well. For equidistant time series, the count is simply the number of time steps in the relative view period.
  • minNonMissingValuesCheck: Logs a message when there are not enough non-missing values within a configured period. A non-missing value is a value that is reliable, doubtful or unreliable.
  • minReliableOrDoubtfulValuesCheck: Logs a message when there are not enough values that are reliable or doubtful within a configured period.
  • minReliableValuesCheck: Logs a message when there are not enough reliable values within a configured period.

Check for setting flags per time step using an expression

The seriesComparisonCheck is available for testing constraints between multiple time series or parameters per time step.
This check verifies constraints between multiple time series sets or multiple parameters and automatically modifies the flags per time step when the required input data is available (reliable or doubtful) and the specified expression evaluates to true.

Check for setting flags per time step using other timeseries

The flagsComparisonCheck is available for comparing and setting flags for multiple time series per time step.
For each time step, this check determines the most unreliable flag among the input flags; if that flag is more unreliable than the output flag, the output flag is updated accordingly.

Check for setting flags per time step using comparisons with neighbouring locations

The spatialHomogeneityCheck is available for comparing and setting flags for multiple time series per time step.
For the selected output time series, this check determines at each time step the mean error with respect to neighbouring locations. If the mean error exceeds the specified absolute threshold, or exceeds the specified relative factor times the standard deviation, the output flag is set to the selected output flag, provided that the new flag is more unreliable than the existing flag.

Check for detecting trends

The mannKendallCheck is available for detecting trends. This can be useful for monitoring sensors for drift.

Variable Definitions

The configuration contains variable definitions for one or more time series that can be used as input for checks. Each variable definition contains a variableId and a timeSeriesSet. The variableId can be used to reference the time series in a check. Alternatively, depending on which check it is, either variable definitions or variables can be embedded in the checks.
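As an illustration, a variable definition might look like the fragment below. The variableId waterLevel and every value inside the timeSeriesSet (module instance, parameter, location set, time step) are hypothetical; the child elements follow the general Delft-FEWS timeSeriesSet layout and should be checked against the schema.

```xml
<!-- Hypothetical variable definition; referenced from checks by its variableId -->
<variableDefinition>
	<variableId>waterLevel</variableId>
	<timeSeriesSet>
		<moduleInstanceId>ImportTelemetry</moduleInstanceId>
		<valueType>scalar</valueType>
		<parameterId>H.obs</parameterId>
		<!-- a location set: checks such as minNonMissingValuesCheck then test each location's series individually -->
		<locationSetId>TelemetryStations</locationSetId>
		<timeSeriesType>external historical</timeSeriesType>
		<timeStep unit="minute" multiplier="15"/>
	</timeSeriesSet>
</variableDefinition>
```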

...

Contents of checks for counting reliable, doubtful, unreliable and missing values

...

The minNumberOfValuesCheck, minNonMissingValuesCheck, minReliableOrDoubtfulValuesCheck and minReliableValuesCheck all consist of the following elements:

  • id: Identifier of the check. This is only used in log messages and exception messages.
  • variable: One or more time series that need to be checked. This can be either an embedded timeSeriesSet or a reference to a variableDefinition defined at the start of the configuration file. If this contains multiple time series (e.g. for multiple locations), then each time series is checked individually.
  • checkRelativePeriod: The check will only consider data in this time period. This time period is relative to the timeZero of the taskrun in which the module instance runs. The start and end of the period are included. This period overrules any relativeViewPeriods specified in the timeSeriesSets of the time series.
  • minNumberOfValues: The minimum required number of values in the time series to pass the check.
  • logLevel: Log level for the log message that is logged if a time series does not pass the check. Can be DEBUG, INFO, WARN, ERROR or FATAL. If the level is ERROR or FATAL, the module stops running after logging the first log message.
  • logEventCode: Event code for the log message that is logged if a time series does not pass the check. This event code has to contain a dot, e.g. "TimeSeries.Check", because the log message is only visible to the master controller if the event code contains a dot.
  • logMessage: Log message that is logged if a time series does not pass the check. It is possible to use the following tags in the logMessage: %HEADER% and %LOCATION_NAME%. The %HEADER% tag will be replaced with the header of the time series. The %LOCATION_NAME% tag will be replaced with the name of the location of the time series.

    ◦ %HEADER%: The name of the time series.
    ◦ %LOCATION_NAME%: The location name of the time series.
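A sketch of one counting check, assuming a variableDefinition with variableId waterLevel at the start of the file and a 15-minute time step. The check id, period and threshold are hypothetical, and whether id is written as an attribute (as assumed here) or as a child element should be verified against secondaryValidation.xsd.

```xml
<!-- Hypothetical minNonMissingValuesCheck on the series defined by variableId "waterLevel" -->
<minNonMissingValuesCheck id="CheckWaterLevelCompleteness">
	<variable>
		<variableId>waterLevel</variableId>
	</variable>
	<!-- last 12 hours up to the taskrun timeZero; start and end are both included -->
	<checkRelativePeriod unit="hour" start="-12" end="0"/>
	<!-- 12 hours at 15 minutes spans 49 time steps including both ends; require at least 40 -->
	<minNumberOfValues>40</minNumberOfValues>
	<logLevel>WARN</logLevel>
	<!-- event code contains a dot so the master controller can see the message -->
	<logEventCode>TimeSeries.Check</logEventCode>
	<logMessage>Too few non-missing values for %LOCATION_NAME% (%HEADER%)</logMessage>
</minNonMissingValuesCheck>
```

Using WARN rather than ERROR keeps the module running, so every series in the location set is still checked and logged.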

...

Contents of check for seriesComparisonCheck

...

  • id: identifier of the check.
  • variableDefinition: embedded variable definition (see above).
  • expression: A comparison between one or more variableIds (see examples below).
  • validatingVariableId: One or more identifiers for variables for which the flags have to be modified.
  • outputFlag: New flag value for time steps for which there is valid data and the expression fails. Either doubtful or unreliable.
  • logLevel: Log level for the log message that is logged if a time series does not pass the check. Can be DEBUG, INFO, WARN, ERROR or FATAL. If the level is ERROR or FATAL, the module stops running after logging the first log message. FATAL should not be used.
  • logEventCode: Event code for the log message that is logged if a time series does not pass the check. This event code has to contain a dot, e.g. "TimeSeries.Check", because the log message is only visible to the master controller if the event code contains a dot.
  • logMessage: Log message that is logged if a time series does not pass the check. More tags are available than in the other checks:

    ◦ %AMOUNT_CHANGED_FLAGS%: The number of flags that have been altered.
    ◦ %CHECK_ID%: The id of the check that caused the flags to be altered.
    ◦ %EXPRESSION%: The expression that caused the flags to be altered.
    ◦ %HEADER%: The header names of the time series for which the flags were altered.
    ◦ %LOCATION_ID%: The locationId where the alterations took place.
    ◦ %LOCATION_NAME%: The names of the locations where the alterations took place.
    ◦ %OUTPUT_FLAG%: The flag that has been set.
    ◦ %PARAMETER_ID%: The parameterId where the alterations took place.
    ◦ %PARAMETER_NAME%: The name of the parameter where the alterations took place.
    ◦ %PERIOD%: The period in which flags were changed.

It is not possible to compare two different location sets both containing more than one location id, but the following comparisons can be configured:

  • one location with a scalar
  • all the locations in a location set with a scalar
  • two different locations
  • one location with all the locations in a location set
  • two location sets containing exactly the same location ids
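A sketch of a seriesComparisonCheck for the first case in the list above (one location compared with a scalar). The check id, the variableId level, the timeSeriesSet contents and the expression syntax are all hypothetical; the element names are taken from the element list above, but their order and the exact expression grammar should be verified against secondaryValidation.xsd and the expression examples for this check.

```xml
<!-- Hypothetical seriesComparisonCheck: one location compared with a scalar -->
<seriesComparisonCheck id="LevelPlausibility">
	<!-- embedded variable definition -->
	<variableDefinition>
		<variableId>level</variableId>
		<timeSeriesSet>
			<moduleInstanceId>ImportTelemetry</moduleInstanceId>
			<valueType>scalar</valueType>
			<parameterId>H.obs</parameterId>
			<locationId>Station1</locationId>
			<timeSeriesType>external historical</timeSeriesType>
			<timeStep unit="minute" multiplier="15"/>
			<relativeViewPeriod unit="day" start="-1" end="0"/>
		</timeSeriesSet>
	</variableDefinition>
	<!-- expression syntax is an assumption -->
	<expression>level > 10</expression>
	<!-- the flags of this variable are the ones that may be modified -->
	<validatingVariableId>level</validatingVariableId>
	<outputFlag>doubtful</outputFlag>
	<logLevel>INFO</logLevel>
	<logEventCode>TimeSeries.Check</logEventCode>
	<logMessage>%CHECK_ID%: %AMOUNT_CHANGED_FLAGS% flags set to %OUTPUT_FLAG% for %LOCATION_NAME% in %PERIOD%</logMessage>
</seriesComparisonCheck>
```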

Configuration example for checks on amounts of reliable, doubtful, unreliable and missing values

...

Warning: Multiple checks for a time series

When multiple Secondary Validation checks that could change data for the same time series are needed within one workflow, all of these checks should be configured in one and the same module config file.

This is because, at the start of each run, the Secondary Validation module undoes all previous secondary validations, both for consistent behaviour and for significant database performance reasons.


Warning: Behavioural changes in validating simulated time series created in a previous workflow

Since 2019.02 (#92895), fixes have been made to the writing of output time series in the Secondary Validation module. These fixes improved consistency in writing behaviour, making sure that repeated runs result in the same outcome.

However, this resulted in significantly different outcomes for some very specific use cases involving simulated time series. When simulated time series are validated as output time series in a workflow that did not create them, the original data becomes hidden, because the new write action creates a newer module run instance for the data. Changing simulated data in a workflow other than the one that created it should never have been allowed, but because of the earlier inconsistencies in writing behaviour it was possible to use this data as output for the Secondary Validation module. To still support validation of simulated data in a workflow other than the one that created it, a "logs only" mode has been introduced. This mode can be considered "read only" and is described below.


Logs only (read only) mode 

This mode has been introduced to enable validation of simulated time series that have been created in a previous workflow.

When simulated time series created in a previous workflow are validated outside "logs only" mode, the original data becomes hidden, because the new write action creates a newer module run instance for the data.

In "logs only" mode, the Secondary Validation module knows that no data will be changed, so previous secondary validations do not need to be undone, as is normally required for consistent and repeatable behaviour.

At the start of a run, the Secondary Validation module checks whether any configured check could change data and would therefore require a write action.

Only when it is certain that no write actions will occur is "logs only" mode enabled. Per check type:

  • minNumberOfValuesCheck: never requires write actions.
  • minNonMissingValuesCheck: never requires write actions.
  • minReliableOrDoubtfulValuesCheck: never requires write actions.
  • minReliableValuesCheck: never requires write actions.
  • seriesComparisonCheck: requires write actions when no <outputMode> is defined or <outputMode> is flags_and_logs (the default). Does not require write actions when <outputMode>logs_only</outputMode> is configured.
  • flagsComparisonCheck: requires write actions when no <outputMode> is defined or <outputMode> is flags_and_logs (the default). Does not require write actions when <outputMode>logs_only</outputMode> is configured.
  • spatialHomogeneityCheck: requires write actions when ANY of the thresholds has no <outputMode> defined or has <outputMode> flags_and_logs (the default). Does not require write actions when ALL of the thresholds have <outputMode>logs_only</outputMode> configured.
  • mannKendallCheck: requires write actions when ANY of the thresholds has no <outputMode> defined or has <outputMode> flags_and_logs (the default). Does not require write actions when ALL of the thresholds have <outputMode>logs_only</outputMode> configured.
  • flagPersistencyCheck: ALWAYS requires write actions; when this check is present in the Secondary Validation module, the module will NEVER run in logs_only mode.
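A minimal sketch of a check configured for logs only mode, assuming that outputMode is a direct child of seriesComparisonCheck. Its exact position within the element, and whether elements such as outputFlag remain required in this mode, are assumptions to verify against secondaryValidation.xsd; the check id is hypothetical. According to the list above, the module as a whole only runs in logs only mode if every check in the file qualifies.

```xml
<!-- Hypothetical check on a simulated series from a previous workflow: logs only, no flags written -->
<seriesComparisonCheck id="SimulatedLevelPlausibility">
	<!-- embedded variableDefinition, expression, validatingVariableId and log settings as for a normal seriesComparisonCheck -->
	<!-- assumed position of this element; logs_only avoids the write action that would hide the original data -->
	<outputMode>logs_only</outputMode>
</seriesComparisonCheck>
```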