
...

What: nameofinstance.xml

Description: Configuration for the Secondary Validation module

Schema location: https://fewsdocs.deltares.nl/schemas/version1.0/secondaryValidation.xsd

Entry in ModuleDescriptors:

<moduleDescriptor id="SecondaryValidation

...

SecondaryValidation (since 2010_01)

The SecondaryValidation module can be used to perform certain checks on time series data and generate log messages when the specified criteria are met.


Configuration

The XML file for configuring an instance of the SecondaryValidation module called, for example, CheckImportedData would be named as follows:

CheckImportedData 1.00 default.xml

  • CheckImportedData: File name for the CheckImportedData configuration.
  • 1.00: Version number.
  • default: Flag to indicate that this version is the default configuration (otherwise omitted).
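The naming convention above can be illustrated with a small parser. This is a minimal Python sketch, not part of Delft-FEWS: the helper name parse_config_file_name and the ConfigFileName tuple are hypothetical, and it assumes that the instance name itself contains no spaces.

```python
from typing import NamedTuple


class ConfigFileName(NamedTuple):
    name: str
    version: str
    is_default: bool


def parse_config_file_name(file_name: str) -> ConfigFileName:
    """Split '<name> <version>[ default].xml' into its parts (hypothetical helper)."""
    suffix = ".xml"
    if not file_name.endswith(suffix):
        raise ValueError(f"not an XML file: {file_name!r}")
    parts = file_name[: -len(suffix)].split(" ")
    # The trailing 'default' token is optional.
    is_default = parts[-1] == "default"
    if is_default:
        parts = parts[:-1]
    # Assumption: the instance name itself contains no spaces.
    if len(parts) != 2:
        raise ValueError(f"expected '<name> <version>[ default].xml', got {file_name!r}")
    name, version = parts
    return ConfigFileName(name, version, is_default)


print(parse_config_file_name("CheckImportedData 1.00 default.xml"))
# ConfigFileName(name='CheckImportedData', version='1.00', is_default=True)
```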

A SecondaryValidation configuration file is typically located in the ModuleConfigFiles folder and can be used to configure one or more checks. The configured checks are processed one by one in the specified order. The checks can generate log messages, which can trigger actions in the master controller, such as sending warning e-mails. A special type of check is available for automatically setting flags to 'doubtful' or 'unreliable' per time step when a condition on multiple time series becomes true.

...

  • minNumberOfValuesCheck: Logs a message when there are not enough values within a configured period. This check is only useful for non-equidistant time series, because missing values are counted as well. For equidistant time series the count is simply the number of time steps in the relative view period.
  • minNonMissingValuesCheck: Logs a message when there are not enough non-missing values within a configured period. A non-missing value is a value that is reliable, doubtful or unreliable.
  • minReliableOrDoubtfulValuesCheck: Logs a message when there are not enough values that are reliable or doubtful within a configured period.
  • minReliableValuesCheck: Logs a message when there are not enough reliable values within a configured period.
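The difference between the four counting checks is which flags they count. The Python sketch below illustrates this under a hypothetical data model: each time step in the check period is a (value, flag) pair, and a missing value is stored as (None, "missing"). The names COUNTED_FLAGS, count_values and passes are illustrative only, not FEWS code.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical model: one (value, flag) pair per time step in the check period.
Sample = Tuple[Optional[float], str]

# Which flags each check counts, following the descriptions above.
COUNTED_FLAGS = {
    "minNumberOfValuesCheck": {"reliable", "doubtful", "unreliable", "missing"},
    "minNonMissingValuesCheck": {"reliable", "doubtful", "unreliable"},
    "minReliableOrDoubtfulValuesCheck": {"reliable", "doubtful"},
    "minReliableValuesCheck": {"reliable"},
}


def count_values(check_type: str, samples: Sequence[Sample]) -> int:
    """Count the samples whose flag is relevant for the given check type."""
    counted = COUNTED_FLAGS[check_type]
    return sum(1 for _value, flag in samples if flag in counted)


def passes(check_type: str, samples: Sequence[Sample], min_number_of_values: int) -> bool:
    """A time series passes when it has at least minNumberOfValues counted values."""
    return count_values(check_type, samples) >= min_number_of_values


samples = [(1.2, "reliable"), (1.3, "doubtful"), (None, "missing"), (9.9, "unreliable")]
for check in COUNTED_FLAGS:
    print(check, count_values(check, samples))
```

Because minNumberOfValuesCheck also counts missing values, it returns the same number for any equidistant series over the same period, which is why it is mainly useful for non-equidistant data.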
Check for setting flags per time step using an expression

The seriesComparisonCheck check is available for testing constraints between multiple time series or parameters per time step.
This check verifies constraints between multiple time series sets or multiple parameters, and automatically modifies the flags per time step when the required input data is available (reliable or doubtful) and the specified expression evaluates to true.

Check for setting flags per time step using other timeseries

The flagsComparisonCheck check is available for comparing and setting flags for multiple time series per time step.
For each time step, this check determines the most unreliable flag among the input time series. If that flag is more unreliable than the output flag, the output flag is updated.
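The per-time-step rule of the flagsComparisonCheck can be sketched in Python as follows. This is an illustration, not the FEWS implementation: it assumes a three-level ordering reliable < doubtful < unreliable, assumes the output flag is raised to the most unreliable input flag, and represents each series as a plain list of flag names.

```python
from typing import Dict, List, Sequence

# Assumed ordering from least to most unreliable (hypothetical encoding).
SEVERITY: Dict[str, int] = {"reliable": 0, "doubtful": 1, "unreliable": 2}


def compare_flags(input_flags: Sequence[Sequence[str]], output_flags: Sequence[str]) -> List[str]:
    """Per time step, raise the output flag to the most unreliable input flag."""
    result = []
    for t, out_flag in enumerate(output_flags):
        # Most unreliable flag among all input series at time step t.
        worst_input = max((series[t] for series in input_flags), key=SEVERITY.__getitem__)
        # Only update when the input flag is more unreliable than the current output flag.
        if SEVERITY[worst_input] > SEVERITY[out_flag]:
            result.append(worst_input)
        else:
            result.append(out_flag)
    return result


inputs = [["reliable", "doubtful", "reliable"], ["reliable", "reliable", "unreliable"]]
print(compare_flags(inputs, ["reliable", "unreliable", "doubtful"]))
# ['reliable', 'unreliable', 'unreliable']
```

Note that the output flag is never made more reliable: at the second time step the existing unreliable flag is kept even though the worst input is only doubtful.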

Check for setting flags per time step using comparisons with neighbouring locations

The spatialHomogeneityCheck check is available for comparing and setting flags for multiple time series per time step.
For the selected output time series, this check determines at each time step the mean error with respect to the neighbouring locations. If the mean error exceeds the specified absolute threshold, or exceeds the specified relative factor times the standard deviation, the output flag is set to the selected output flag, provided that this new flag is more unreliable than the existing flag.
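The two criteria of the spatialHomogeneityCheck can be sketched for a single time step as below. The text does not define the mean error or the standard deviation precisely, so this Python sketch makes explicit assumptions: the mean error is the mean of (value minus neighbour value), compared in absolute value, and the standard deviation is the sample standard deviation of the neighbour values at that time step. The function names are hypothetical.

```python
import statistics
from typing import Sequence

# Assumed ordering from least to most unreliable (hypothetical encoding).
SEVERITY = {"reliable": 0, "doubtful": 1, "unreliable": 2}


def exceeds_spatial_criteria(value: float, neighbour_values: Sequence[float],
                             absolute_threshold: float, relative_factor: float) -> bool:
    """True when either criterion is exceeded at one time step.

    Assumptions: mean error = mean of (value - neighbour), compared in absolute
    value; standard deviation = sample standard deviation of the neighbour values
    at the same time step (this needs at least two neighbours).
    """
    mean_error = abs(statistics.fmean([value - n for n in neighbour_values]))
    spread = statistics.stdev(neighbour_values)
    return mean_error > absolute_threshold or mean_error > relative_factor * spread


def updated_flag(existing_flag: str, output_flag: str) -> str:
    """Apply the configured output flag only if it is more unreliable than the existing one."""
    return output_flag if SEVERITY[output_flag] > SEVERITY[existing_flag] else existing_flag


# Mean error 5.0, neighbour standard deviation 1.0.
print(exceeds_spatial_criteria(10.0, [4.0, 5.0, 6.0], absolute_threshold=3.0, relative_factor=10.0))
```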

Check for detecting trends

The mannKendallCheck check is available for detecting trends. This can be useful for monitoring sensors for drift.
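For readers unfamiliar with the test, the Python sketch below computes the textbook Mann-Kendall Z statistic without tie correction. It shows the general technique only; how the mannKendallCheck computes its statistic, handles ties or chooses its thresholds is not described here, and the 1.96 default (two-sided, 5 % significance) is an assumption of this sketch.

```python
import math
from typing import Sequence


def mann_kendall_z(values: Sequence[float]) -> float:
    """Textbook Mann-Kendall Z statistic (no correction for ties)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    # S = sum of sign(x_j - x_i) over all pairs i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction of one towards zero.
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def has_trend(values: Sequence[float], z_critical: float = 1.96) -> bool:
    """Two-sided test; 1.96 corresponds to a 5 % significance level."""
    return abs(mann_kendall_z(values)) > z_critical


print(round(mann_kendall_z([1, 2, 3, 4, 5]), 4))  # 2.2045
```

A positive Z indicates an upward trend and a negative Z a downward trend, which is what makes the test suitable for spotting slow sensor drift.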

Variable Definitions

The configuration contains variable definitions for one or more time series that can be used as input for checks. Each variable definition contains a variableId and a timeSeriesSet. The variableId can be used to reference the time series in a check. Alternatively, depending on which check it is, either variable definitions or variables can be embedded in the checks.

...

Contents of checks for counting reliable, doubtful, unreliable and missing values

...

The minNumberOfValuesCheck, minNonMissingValuesCheck, minReliableOrDoubtfulValuesCheck and minReliableValuesCheck all consist of the following elements:

  • id: Identifier of the check. This is only used in log messages and exception messages.
  • variable: One or more time series that need to be checked. This can be either an embedded timeSeriesSet or a reference to a variableDefinition defined at the start of the configuration file. If this contains multiple time series (e.g. for multiple locations), then each time series is checked individually.
  • checkRelativePeriod: The check will only consider data in this time period. This time period is relative to the timeZero of the taskrun in which the module instance runs. The start and end of the period are included. This period overrules any relativeViewPeriods specified in the timeSeriesSets of the time series.
  • minNumberOfValues: The minimum required number of values in the time series to pass the check.
  • logLevel: Log level for the log message that is logged if a time series does not pass the check. Can be DEBUG, INFO, WARN, ERROR or FATAL. If the level is ERROR or FATAL, the module stops running after logging the first log message.
  • logEventCode: Event code for the log message that is logged if a time series does not pass the check. This event code has to contain a dot, e.g. "TimeSeries.Check", because the log message is only visible to the master controller if the event code contains a dot.
  • logMessage: Log message that is logged if a time series does not pass the check. It is possible to use the tag %HEADER% in the logMessage. The %HEADER% tag will be replaced with the header of the time series.
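How checkRelativePeriod selects data can be sketched in Python: the period is relative to the timeZero of the taskrun, and both the start and the end are included. The UNITS table below covers only the units used in the examples on this page, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Only the units used in the examples on this page; the schema may allow more.
UNITS = {"minute": timedelta(minutes=1), "hour": timedelta(hours=1), "day": timedelta(days=1)}


def check_period(time_zero: datetime, unit: str, start: int, end: int) -> Tuple[datetime, datetime]:
    """Absolute bounds of a checkRelativePeriod, relative to the taskrun's timeZero."""
    return time_zero + start * UNITS[unit], time_zero + end * UNITS[unit]


def times_in_period(times: List[datetime], period: Tuple[datetime, datetime]) -> List[datetime]:
    """Keep the time steps inside the period; start and end are both included."""
    first, last = period
    return [t for t in times if first <= t <= last]


time_zero = datetime(2024, 1, 1, 12, 0)
period = check_period(time_zero, "hour", -12, 0)
# An equidistant 15-minute series from 23:45 the day before until 12:15.
times = [datetime(2024, 1, 1, 0, 0) + timedelta(minutes=15 * k) for k in range(-1, 50)]
print(period[0], period[1], len(times_in_period(times, period)))
# 2024-01-01 00:00:00 2024-01-01 12:00:00 49
```

Because both ends are included, a 12-hour period on a 15-minute time step contains 49 time steps, not 48.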

...

Contents of check for setting flags per time step

...

  • id: identifier of the check.
  • variableDefinition: embedded variable definition (see above).
  • expression: A comparison between one or more variableIds (see examples below).
  • validatingVariableId: One or more identifiers for variables for which the flags have to be modified.
  • outputFlag: New flag value for time steps for which there is valid data and the expression fails. Either doubtful or unreliable.
  • logLevel: Log level for the log message that is logged if a time series does not pass the check. Can be DEBUG, INFO, WARN, ERROR or FATAL. If the level is ERROR or FATAL, the module stops running after logging the first log message. In practice, FATAL should never be used.
  • logEventCode: Event code for the log message that is logged if a time series does not pass the check. This event code has to contain a dot, e.g. "TimeSeries.Check", because the log message is only visible to the master controller if the event code contains a dot.
  • logMessage: Log message that is logged if a time series does not pass the check. More tags are available than in the other checks:

      • %AMOUNT_CHANGED_FLAGS%: The number of flags that have been altered.
      • %OUTPUT_FLAG%: The flag that has been set.
      • %CHECK_ID%: The id of the check that caused the flags to be altered.
      • %EXPRESSION%: The expression that caused the flags to be altered.
      • %LOCATION_ID%: The locationId where the alterations took place.
      • %PARAMETER_ID%: The parameterId where the alterations took place.
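The tag replacement can be pictured as plain text substitution. The Python sketch below expands the log message used in the simple seriesComparisonCheck example on this page; the helper expand_tags and the value 4 for the number of changed flags are made up for illustration and do not come from a real run.

```python
from typing import Mapping


def expand_tags(template: str, values: Mapping[str, object]) -> str:
    """Replace each %TAG% in the template with the corresponding value."""
    message = template
    for tag, value in values.items():
        message = message.replace(f"%{tag}%", str(value))
    return message


message = expand_tags(
    "%AMOUNT_CHANGED_FLAGS% flags set to %OUTPUT_FLAG% by [%CHECK_ID%, %EXPRESSION%].",
    {
        "AMOUNT_CHANGED_FLAGS": 4,  # illustrative value
        "OUTPUT_FLAG": "unreliable",
        "CHECK_ID": "checkWithScalar",
        "EXPRESSION": "H_obs_location1 .ge. 13",
    },
)
print(message)  # 4 flags set to unreliable by [checkWithScalar, H_obs_location1 .ge. 13].
```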

It is not possible to compare two different location sets both containing more than one location id, but the following comparisons can be configured:

  • one location with a scalar
  • all the locations in a location set with a scalar
  • two different locations
  • one location with all the locations in a location set
  • two similar locationSets, containing exactly the same location ids
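The allowed combinations above amount to a pairing rule. The Python sketch below models each operand either as a scalar or as the set of location ids it covers and returns which locations would be compared with which; the model and the function name pair_locations are hypothetical, not FEWS code.

```python
from typing import List, Optional, Set, Tuple, Union

# Hypothetical model: a scalar operand is a number, a time series operand is the
# set of location ids it covers (a single location or a location set).
Operand = Union[float, Set[str]]


def pair_locations(left: Operand, right: Operand) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return the (left, right) location pairs to compare; None stands for a scalar."""
    left_scalar = isinstance(left, (int, float))
    right_scalar = isinstance(right, (int, float))
    if left_scalar and right_scalar:
        raise ValueError("at least one operand must be a time series")
    if right_scalar:
        return [(loc, None) for loc in sorted(left)]
    if left_scalar:
        return [(None, loc) for loc in sorted(right)]
    if len(left) == 1 or len(right) == 1:
        # One location against one location, or against every location in a set.
        return [(a, b) for a in sorted(left) for b in sorted(right)]
    if left == right:
        # Two similar location sets: compare location by location.
        return [(loc, loc) for loc in sorted(left)]
    raise ValueError("cannot compare two different location sets with more than one location each")


print(pair_locations({"location1", "location2"}, {"location1", "location2"}))
# [('location1', 'location1'), ('location2', 'location2')]
```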

Configuration example for checks that generate log events

...


<?xml version="1.0" encoding="UTF-8"?>
<secondaryValidation xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/secondaryValidation.xsd">
	<variableDefinition>
		<variableId>input1</variableId>
		<timeSeriesSet>
			<moduleInstanceId>MinReliableValuesCheckTest</moduleInstanceId>
			<valueType>scalar</valueType>
			<parameterId>H.obs</parameterId>
			<locationId>location1</locationId>
			<timeSeriesType>external historical</timeSeriesType>
			<timeStep unit="minute" multiplier="15"/>
			<!-- any relativeViewPeriod here will always be overruled by checkRelativePeriod in each check -->
			<readWriteMode>read only</readWriteMode>
		</timeSeriesSet>
	</variableDefinition>
	<variableDefinition>
		<variableId>input2</variableId>
		<timeSeriesSet>
			<moduleInstanceId>MinReliableValuesCheckTest</moduleInstanceId>
			<valueType>scalar</valueType>
			<parameterId>H.obs</parameterId>
			<locationId>location2</locationId>
			<timeSeriesType>external historical</timeSeriesType>
			<timeStep unit="minute" multiplier="15"/>
			<!-- any relativeViewPeriod here will always be overruled by checkRelativePeriod in each check -->
			<readWriteMode>read only</readWriteMode>
		</timeSeriesSet>
	</variableDefinition>

	<minNonMissingValuesCheck id="MinNonMissingValuesCheck">
		<variable>
			<variableId>input1</variableId>
		</variable>
		<variable>
			<variableId>input2</variableId>
		</variable>
		<checkRelativePeriod unit="hour" start="-12" end="0"/>
		<minNumberOfValues>18</minNumberOfValues>
		<logLevel>INFO</logLevel>
		<logEventCode>TimeSeries.Check</logEventCode>
		<logMessage>Not enough values available for time series %header%</logMessage>
	</minNonMissingValuesCheck>

	<minNumberOfValuesCheck id="MinNumberOfValuesCheck">
		<variable>
			<variableId>input1</variableId>
		</variable>
		<variable>
			<variableId>input2</variableId>
		</variable>
		<checkRelativePeriod unit="hour" start="-12" end="0"/>
		<minNumberOfValues>24</minNumberOfValues>
		<logLevel>DEBUG</logLevel>
		<logEventCode>TimeSeries.Check</logEventCode>
		<logMessage>Not enough values available for time series %header%</logMessage>
	</minNumberOfValuesCheck>

	<minReliableOrDoubtfulValuesCheck id="MinReliableOrDoubtfulValuesCheck">
		<variable>
			<variableId>input1</variableId>
		</variable>
		<variable>
			<variableId>input2</variableId>
		</variable>
		<checkRelativePeriod unit="hour" start="-12" end="0"/>
		<minNumberOfValues>12</minNumberOfValues>
		<logLevel>WARN</logLevel>
		<logEventCode>TimeSeries.Check</logEventCode>
		<logMessage>Not enough values available for time series %header%</logMessage>
	</minReliableOrDoubtfulValuesCheck>

	<minReliableValuesCheck id="MinReliableValuesCheck">
		<variable>
			<variableId>input1</variableId>
		</variable>
		<variable>
			<variableId>input2</variableId>
		</variable>
		<checkRelativePeriod unit="hour" start="-12" end="0"/>
		<minNumberOfValues>6</minNumberOfValues>
		<logLevel>WARN</logLevel>
		<logEventCode>TimeSeries.Check</logEventCode>
		<logMessage>Not enough values available for time series %header%</logMessage>
	</minReliableValuesCheck>
</secondaryValidation>

Configuration examples for checks that set flags per time step

The expression is always a comparison. Within XML, the comparison operator is one of .ne., .eq., .gt., .ge., .lt. and .le. (not equal, equal, greater than, greater than or equal, less than, less than or equal). Each variable has to be a single word without spaces. Names of mathematical symbols or functions, such as e, pi or cos, cannot be used as a variableId, because they are interpreted mathematically.
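To make the operator syntax concrete, the Python sketch below evaluates such an expression for one time step and for a series of aligned time steps. It is an illustration under stated assumptions: only abs, cos, e and pi are exposed as mathematical names, Python's eval is used for brevity (it is not safe for untrusted input), and the sketch stops at the boolean result; it does not decide which outcome leads to a flag change.

```python
import math
import operator
import re
from typing import Dict, List

# The six comparison operators allowed in the XML expression.
COMPARISONS = {
    ".ne.": operator.ne, ".eq.": operator.eq, ".gt.": operator.gt,
    ".ge.": operator.ge, ".lt.": operator.lt, ".le.": operator.le,
}
# A few mathematical names for illustration; these cannot double as variableIds.
MATH_NAMES = {"abs": abs, "cos": math.cos, "e": math.e, "pi": math.pi}


def evaluate(expression: str, values: Dict[str, float]) -> bool:
    """Evaluate one comparison expression for the values of a single time step."""
    clash = MATH_NAMES.keys() & values.keys()
    if clash:
        raise ValueError(f"reserved names used as variableId: {sorted(clash)}")
    parts = re.split(r"(\.(?:ne|eq|gt|ge|lt|le)\.)", expression)
    if len(parts) != 3:
        raise ValueError(f"expected exactly one comparison operator: {expression!r}")
    left, op, right = parts
    namespace = {"__builtins__": {}, **MATH_NAMES, **values}
    # eval keeps the sketch short; it is not safe for untrusted input.
    return COMPARISONS[op](eval(left.strip(), namespace), eval(right.strip(), namespace))


def evaluate_per_time_step(expression: str, series: Dict[str, List[float]]) -> List[bool]:
    """Evaluate the expression at every time step of equally long, aligned series."""
    length = len(next(iter(series.values())))
    return [evaluate(expression, {name: s[t] for name, s in series.items()}) for t in range(length)]


print(evaluate_per_time_step("H_obs_location1 .ge. 13", {"H_obs_location1": [12.0, 13.0, 14.0]}))
# [False, True, True]
```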

A simple configuration example for the seriesComparisonCheck is given below. It makes the workflow check the values that are reliable or doubtful, and mark them as unreliable if they are smaller than thirteen:

...


<?xml version="1.0" encoding="UTF-8"?>
<secondaryValidation xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/secondaryValidation.xsd">
	<!-- comparison between location variable and scalar, set to unreliable -->
	<seriesComparisonCheck id="checkWithScalar">
		<variableDefinition>
			<variableId>H_obs_location1</variableId>
			<timeSeriesSet>
				<moduleInstanceId>SeriesComparisonCheck</moduleInstanceId>
				<valueType>scalar</valueType>
				<parameterId>H.obs</parameterId>
				<locationId>location1</locationId>
				<timeSeriesType>external historical</timeSeriesType>
				<timeStep unit="minute" multiplier="15"/>
				<relativeViewPeriod unit="day" start="-30" end="0"/>
				<readWriteMode>read only</readWriteMode>
			</timeSeriesSet>
		</variableDefinition>
		<expression>H_obs_location1 .ge. 13</expression>
		<validatingVariableId>H_obs_location1</validatingVariableId>
		<outputFlag>unreliable</outputFlag>
		<logLevel>INFO</logLevel>
		<logEventCode>TimeSeries.Check</logEventCode>
		<logMessage>%AMOUNT_CHANGED_FLAGS% flags set to %OUTPUT_FLAG% by [%CHECK_ID%, %EXPRESSION%].</logMessage>
	</seriesComparisonCheck>
</secondaryValidation>

A more complex example compares different parameters in similar location sets. It marks values that were reliable or doubtful as unreliable when the difference between them is greater than three, in this case first for location1 and then for location2:

...


<?xml version="1.0" encoding="UTF-8"?>
<secondaryValidation xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/secondaryValidation.xsd">

<!-- comparison of variables with similar location sets, different parameters, does comparison per location  -->
	<seriesComparisonCheck id="similarLocationSetSeriesComparisonCheck">
		<!-- referred to by locationset1 and locationset2-->
		<variableDefinition>
			<variableId>H_obs1_location1</variableId>
			<timeSeriesSet>
				<moduleInstanceId>SeriesComparisonCheckTest</moduleInstanceId>
				<valueType>scalar</valueType>
				<parameterId>H.obs1</parameterId>
				<locationId>location1</locationId>
				<timeSeriesType>external historical</timeSeriesType>
				<timeStep unit="minute" multiplier="15"/>
				<relativeViewPeriod unit="day" start="-30" end="0"/>
				<readWriteMode>read only</readWriteMode>
			</timeSeriesSet>
		</variableDefinition>

		<!-- referred to by locationset1 and locationset2-->
		<variableDefinition>
			<variableId>H_obs1_location2</variableId>
			<timeSeriesSet>
				<moduleInstanceId>SeriesComparisonCheckTest</moduleInstanceId>
				<valueType>scalar</valueType>
				<parameterId>H.obs1</parameterId>
				<locationId>location2</locationId>
				<timeSeriesType>external historical</timeSeriesType>
				<timeStep unit="minute" multiplier="15"/>
				<relativeViewPeriod unit="day" start="-30" end="0"/>
				<readWriteMode>read only</readWriteMode>
			</timeSeriesSet>
		</variableDefinition>
		<!-- referred to by locationset1 and locationset2-->
		<variableDefinition>
			<variableId>H_obs2_location1</variableId>
			<timeSeriesSet>
				<moduleInstanceId>SeriesComparisonCheckTest</moduleInstanceId>
				<valueType>scalar</valueType>
				<parameterId>H.obs2</parameterId>
				<locationId>location1</locationId>
				<timeSeriesType>external historical</timeSeriesType>
				<timeStep unit="minute" multiplier="15"/>
				<relativeViewPeriod unit="day" start="-30" end="0"/>
				<readWriteMode>read only</readWriteMode>
			</timeSeriesSet>
		</variableDefinition>
		<!-- referred to by locationset1 and locationset2-->
		<variableDefinition>
			<variableId>H_obs2_location2</variableId>
			<timeSeriesSet>
				<moduleInstanceId>SeriesComparisonCheckTest</moduleInstanceId>
				<valueType>scalar</valueType>
				<parameterId>H.obs2</parameterId>
				<locationId>location2</locationId>
				<timeSeriesType>external historical</timeSeriesType>
				<timeStep unit="minute" multiplier="15"/>
				<relativeViewPeriod unit="day" start="-30" end="0"/>
				<readWriteMode>read only</readWriteMode>
			</timeSeriesSet>
		</variableDefinition>

		<variableDefinition>
			<variableId>locationSet1</variableId>
			<timeSeriesSet>
				<moduleInstanceId>SeriesComparisonCheckTest</moduleInstanceId>
				<valueType>scalar</valueType>
				<parameterId>H.obs</parameterId>
				<locationSetId>locationset1</locationSetId>
				<timeSeriesType>external historical</timeSeriesType>
				<timeStep unit="minute" multiplier="15"/>
				<relativeViewPeriod unit="day" start="-30" end="0"/>
				<readWriteMode>read only</readWriteMode>
			</timeSeriesSet>
		</variableDefinition>

		<variableDefinition>
			<variableId>locationSet2</variableId>
			<timeSeriesSet>
				<moduleInstanceId>SeriesComparisonCheckTest</moduleInstanceId>
				<valueType>scalar</valueType>
				<parameterId>H.obs</parameterId>
				<locationSetId>locationset2</locationSetId>
				<timeSeriesType>external historical</timeSeriesType>
				<timeStep unit="minute" multiplier="15"/>
				<relativeViewPeriod unit="day" start="-30" end="0"/>
				<readWriteMode>read only</readWriteMode>
			</timeSeriesSet>
		</variableDefinition>

		<expression>abs(locationSet1 - locationSet2) .gt. 3</expression>
		<validatingVariableId>locationSet1</validatingVariableId>
		<validatingVariableId>locationSet2</validatingVariableId>
		<outputFlag>unreliable</outputFlag>
		<logLevel>INFO</logLevel>
		<logEventCode>TimeSeries.Check</logEventCode>
		<logMessage>%AMOUNT_CHANGED_FLAGS% flags set to %OUTPUT_FLAG% by %CHECK_ID%.</logMessage>
	</seriesComparisonCheck>
</secondaryValidation>

Warning: Multiple checks for a time series

When multiple Secondary Validation checks that could change data for the same time series are needed within one workflow, all of these checks should be placed in one and the same module config file.

This is because, at the start of each run, the Secondary Validation module undoes all previous secondary validations, both for consistent behaviour and for significant database performance reasons.
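The consequence of this undo step can be shown with a deliberately simplified Python model. It assumes that "undo" means every run starts again from the flags as they were before any secondary validation, and that each check returns only the flags it changes; run_module_instance, too_low and too_high are hypothetical names, not FEWS functions.

```python
from typing import Callable, Dict, List

Flags = Dict[int, str]                   # time step index -> flag
Check = Callable[[List[float]], Flags]   # returns only the flags it changes


def run_module_instance(values: List[float], flags_before_validation: Flags, checks: List[Check]) -> Flags:
    """One run of a (simplified) Secondary Validation module instance."""
    # Undo step: start again from the flags as they were before any secondary validation.
    flags = dict(flags_before_validation)
    for check in checks:  # checks are processed one by one, in order
        flags.update(check(values))
    return flags


def too_low(values: List[float]) -> Flags:
    return {t: "unreliable" for t, v in enumerate(values) if v < 13}


def too_high(values: List[float]) -> Flags:
    return {t: "unreliable" for t, v in enumerate(values) if v > 20}


values = [12.0, 15.0, 25.0]
before = {0: "reliable", 1: "reliable", 2: "reliable"}

# Two separate module instances in one workflow: the second run undoes the first.
after_first = run_module_instance(values, before, [too_low])
after_second = run_module_instance(values, before, [too_high])
# One module instance holding both checks.
combined = run_module_instance(values, before, [too_low, too_high])
print(after_second)  # {0: 'reliable', 1: 'reliable', 2: 'unreliable'}
print(combined)      # {0: 'unreliable', 1: 'reliable', 2: 'unreliable'}
```

In the split set-up the flag set by the first instance at time step 0 is lost, whereas the combined configuration keeps the results of both checks.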


Warning: Behavioural changes in validating simulated time series created in a previous workflow

Since 2019.02 (#92895), the writing of output time series in the Secondary Validation module has been fixed. These fixes made the writing behaviour consistent, ensuring that repeated runs produce the same outcome.

However, this resulted in significantly different outcomes for some very specific use cases involving simulated time series. When simulated time series are validated as output time series in a workflow that did not create them, the original data becomes hidden, because the new write action creates a newer module run instance for the data. Changing simulated data in a workflow other than the one that created it should never have been allowed, but because of the previous inconsistencies in writing behaviour it was possible to use such data as output of the Secondary Validation module. To still support validation of simulated data in a workflow other than the one that created it, a "logs only" mode has been introduced. This mode can be considered "read only" and is described below.


Logs only (read only) mode 

This mode has been introduced to enable validation of simulated time series that have been created in a previous workflow.

When simulated time series created in a previous workflow are validated outside of the "logs only" mode, the new write action creates a newer module run instance for the data, which hides the original data.

The "logs only" mode tells the Secondary Validation module that there will be no data changes, and therefore that previous secondary validations do not need to be undone, which is normally required for consistent and repeatable behaviour.

At the start of a run, the Secondary Validation module checks whether any configured check could possibly change the data and therefore requires a write action.

The "logs only" mode is enabled only if it is certain that there will not be any write actions. Per check type:

  • minNumberOfValuesCheck: never requires write actions.
  • minNonMissingValuesCheck: never requires write actions.
  • minReliableOrDoubtfulValuesCheck: never requires write actions.
  • minReliableValuesCheck: never requires write actions.
  • seriesComparisonCheck: requires write actions when no <outputMode> is defined or <outputMode> is flags_and_logs (default). Does not require write actions when <outputMode>logs_only</outputMode> is configured.
  • flagsComparisonCheck: requires write actions when no <outputMode> is defined or <outputMode> is flags_and_logs (default). Does not require write actions when <outputMode>logs_only</outputMode> is configured.
  • spatialHomogeneityCheck: requires write actions when ANY of the thresholds has no <outputMode> defined or has <outputMode> flags_and_logs (default). Does not require write actions when <outputMode>logs_only</outputMode> is configured in ALL of the thresholds.
  • mannKendallCheck: requires write actions when ANY of the thresholds has no <outputMode> defined or has <outputMode> flags_and_logs (default). Does not require write actions when <outputMode>logs_only</outputMode> is configured in ALL of the thresholds.
  • flagPersistencyCheck: ALWAYS requires write actions. When this check is present in the Secondary Validation module, the module will NEVER go into logs_only mode.
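The rules above can be summarised as a small decision function. In this Python sketch each configured check is a hypothetical dictionary with a "type" key and either an "outputMode" or a "thresholdOutputModes" list (None meaning that no <outputMode> is configured); this shape is an assumption for illustration, not the FEWS data model. Unknown check types are conservatively treated as requiring writes, in line with the rule that logs only mode needs certainty.

```python
from typing import Dict, List, Optional

NEVER_WRITES = {
    "minNumberOfValuesCheck", "minNonMissingValuesCheck",
    "minReliableOrDoubtfulValuesCheck", "minReliableValuesCheck",
}
OUTPUT_MODE_CHECKS = {"seriesComparisonCheck", "flagsComparisonCheck"}
THRESHOLD_CHECKS = {"spatialHomogeneityCheck", "mannKendallCheck"}
DEFAULT_MODE = "flags_and_logs"


def requires_write(check: Dict) -> bool:
    """Whether one configured check could change data (hypothetical dict shape)."""
    kind = check["type"]
    if kind in NEVER_WRITES:
        return False
    if kind in OUTPUT_MODE_CHECKS:
        return (check.get("outputMode") or DEFAULT_MODE) != "logs_only"
    if kind in THRESHOLD_CHECKS:
        modes: List[Optional[str]] = check["thresholdOutputModes"]
        # Writes are needed if ANY threshold is not explicitly logs_only.
        return any((mode or DEFAULT_MODE) != "logs_only" for mode in modes)
    # flagPersistencyCheck always writes; unknown types are treated the same way.
    return True


def logs_only_mode(checks: List[Dict]) -> bool:
    """Logs only mode is enabled only when no configured check requires writes."""
    return not any(requires_write(check) for check in checks)


print(logs_only_mode([{"type": "minReliableValuesCheck"},
                      {"type": "seriesComparisonCheck", "outputMode": "logs_only"}]))  # True
```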

...