Contents of check for Mann-Kendall Check

Overwrites the flags of the time series with unreliable or doubtful when there are at least ten observations and the Mann-Kendall test detects a trend for the specified confidence coefficient. This check uses tiedValues and sameYearValues for the calculation.

The purpose of this check is to test for trends in each of the input time series. Once a trend is detected, the check sets the flags of the time series concerned to the specified output flag and generates the specified log message. One of the strengths of the Mann-Kendall check is that it can also be used when there are many missing values. If there are fewer than 10 non-missing values, the test is skipped.

During the check, the thresholds are first sorted so that the one with the most severe log message is processed first. Once a log message has been generated, the less severe log messages are not generated.

The trend analysis uses the formulas below. Every value is compared with all the other values in the series, and the sign of each difference is counted as 1 or -1 (ties count as 0).
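The ordering rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual implementation: the `Threshold` class, the `SEVERITY` ranking and the `first_triggered` function are hypothetical names, and each threshold's trend test is reduced to a simple callable on the Z statistic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed severity ranking of the log levels (higher means more severe).
SEVERITY = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "FATAL": 4}


@dataclass
class Threshold:
    """Hypothetical stand-in for one configured threshold."""
    log_level: str
    output_flag: str
    is_exceeded: Callable[[float], bool]  # decides whether the trend test fires


def first_triggered(thresholds: List[Threshold], z_statistic: float,
                    n_non_missing: int) -> Optional[Threshold]:
    """Return the most severe threshold that fires, or None."""
    if n_non_missing < 10:
        return None  # fewer than 10 non-missing values: the test is skipped
    ordered = sorted(thresholds, key=lambda t: SEVERITY[t.log_level], reverse=True)
    for threshold in ordered:
        if threshold.is_exceeded(z_statistic):
            return threshold  # less severe thresholds are not reported
    return None


thresholds = [
    Threshold("INFO", "doubtful", lambda z: abs(z) >= 1.5),
    Threshold("WARN", "unreliable", lambda z: abs(z) >= 2.5),
]
hit = first_triggered(thresholds, z_statistic=3.0, n_non_missing=20)
print(hit.log_level if hit else "no trend reported")
```

Sorting once and stopping at the first hit means that at most one log message is produced per time series.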

General information on Mann-Kendall trend test

The MannKendall algorithm first calculates several statistics on the timeseries. Missing values are ignored.

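Mann-Kendall statistic S

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}(x_j - x_k)$$

where n is the number of non-missing values and sgn(x) is 1 if x > 0, 0 if x = 0 and -1 if x < 0.

Variance S

$$\mathrm{VAR}(S) = \frac{1}{18}\left[ n(n-1)(2n+5) - \sum_{p=1}^{g} t_p(t_p-1)(2t_p+5) - \sum_{q=1}^{h} u_q(u_q-1)(2u_q+5) \right] + \frac{\sum_{p=1}^{g} t_p(t_p-1)(t_p-2) \, \sum_{q=1}^{h} u_q(u_q-1)(u_q-2)}{9n(n-1)(n-2)} + \frac{\sum_{p=1}^{g} t_p(t_p-1) \, \sum_{q=1}^{h} u_q(u_q-1)}{2n(n-1)}$$

where

  • g is the number of groups of tied data
  • t_p is the number of tied data in the p-th group
  • h is the number of sampling times that contain multiple data
  • u_q is the number of multiple data in the q-th time period

Z statistic

$$Z = \begin{cases} \dfrac{S-1}{\sqrt{\mathrm{VAR}(S)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S+1}{\sqrt{\mathrm{VAR}(S)}} & \text{if } S < 0 \end{cases}$$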
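As an illustration of these statistics, the Python sketch below computes S, VAR(S) and Z following the formulas in Gilbert's book. It is not the actual implementation of the check. The function names are hypothetical, the input is assumed to be in time order, and pairs of values taken at the same sampling time are assumed to contribute 0 to S.

```python
import math
from collections import Counter


def _group_sizes(items):
    """Sizes of all groups of equal items that contain more than one member."""
    return [count for count in Counter(items).values() if count > 1]


def mann_kendall_statistics(values, times=None):
    """Return (S, VAR(S), Z); None and NaN values are ignored.

    `times` optionally gives the sampling time of each value, so that
    several values at the same sampling time can be accounted for.
    """
    if times is None:
        times = range(len(values))
    pairs = [(t, x) for t, x in zip(times, values)
             if x is not None and not math.isnan(x)]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 non-missing values")

    # S: sum of the signs of all differences x_j - x_k with j > k.
    # Assumption: pairs from the same sampling time contribute 0.
    s = 0
    for k in range(n - 1):
        t_k, x_k = pairs[k]
        for j in range(k + 1, n):
            t_j, x_j = pairs[j]
            if t_j != t_k:
                s += (x_j > x_k) - (x_j < x_k)

    tied = _group_sizes(x for _, x in pairs)      # t_p: groups of tied values
    multiple = _group_sizes(t for t, _ in pairs)  # u_q: values per shared time

    def total(sizes, term):
        return sum(term(m) for m in sizes)

    var_s = (
        n * (n - 1) * (2 * n + 5)
        - total(tied, lambda t: t * (t - 1) * (2 * t + 5))
        - total(multiple, lambda u: u * (u - 1) * (2 * u + 5))
    ) / 18
    var_s += (
        total(tied, lambda t: t * (t - 1) * (t - 2))
        * total(multiple, lambda u: u * (u - 1) * (u - 2))
    ) / (9 * n * (n - 1) * (n - 2))
    var_s += (
        total(tied, lambda t: t * (t - 1))
        * total(multiple, lambda u: u * (u - 1))
    ) / (2 * n * (n - 1))

    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z


s, var_s, z = mann_kendall_statistics([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(s, var_s, round(z, 3))
```

For a strictly increasing series of ten values every pairwise difference is positive, so S equals 10 * 9 / 2 = 45 and, without ties, VAR(S) reduces to n(n-1)(2n+5)/18 = 125.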

Slope

In the classical Mann-Kendall test, the slope is the median of all pairwise slopes (Sen's slope), which is a very reliable estimator of the slope even when there are many missing values. Calculating this value is expensive in terms of memory: it requires storage for N * (N-1) / 2 slopes, where N is the number of non-missing input values. To prevent memory problems, it is recommended to limit the number of input values. If the option logSensSlope is set to false, the average slope is used to estimate the slope instead, which is less accurate than the median but requires only memory proportional to N.
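The difference between the two estimators can be illustrated with a short Python sketch. The function names are hypothetical, and the "average slope" is interpreted here as the running mean of all pairwise slopes, which is an assumption about how the alternative estimator works.

```python
from itertools import combinations
from statistics import median


def pairwise_slopes(times, values):
    """Yield (x_j - x_k) / (t_j - t_k) for every pair of non-missing points."""
    points = [(t, x) for t, x in zip(times, values) if x is not None]
    for (t_k, x_k), (t_j, x_j) in combinations(points, 2):
        if t_j != t_k:
            yield (x_j - x_k) / (t_j - t_k)


def sens_slope(times, values):
    """Median of all pairwise slopes; sorts up to N * (N - 1) / 2 slopes in memory."""
    return median(pairwise_slopes(times, values))


def mean_slope(times, values):
    """Running average of the pairwise slopes; keeps only a sum and a count."""
    total, count = 0.0, 0
    for slope in pairwise_slopes(times, values):
        total += slope
        count += 1
    if count == 0:
        raise ValueError("need at least two points at different times")
    return total / count


times = [0, 1, 2, 3, 4]
values = [0, 1, 2, 3, 100]  # the last value is an outlier
print(sens_slope(times, values), mean_slope(times, values))
```

In this example the single outlier leaves the median slope at 1.0 but pulls the average slope up to 21.0, which shows why the median is the more robust estimator.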

Drift

Drift is the duration of the checkRelativePeriod times the estimated slope.

Conditions for rejecting H0 that there is no trend

After the statistics have been calculated and it has been established that there are at least 10 non-missing values, the following conditions are used to trigger log messages for the configured trend tests:

  • there is a two-tailed trend if zStatistic <= -delta or zStatistic >= delta
  • there is a downward trend if zStatistic <= -delta
  • there is an upward trend if zStatistic >= delta

where delta is defined as

  • delta = inverseCDF(1 - confidenceCoefficient) if two-tailed
  • delta = inverseCDF(1 - confidenceCoefficient / 2) if upward or downward
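The conditions above can be written out as a small Python sketch. It follows the rules exactly as listed and assumes that inverseCDF refers to the inverse cumulative distribution function of the standard normal distribution, for which Python's `statistics.NormalDist().inv_cdf` is used. The function names `critical_value` and `trend_detected` are hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence_coefficient, test_trend="two-tailed"):
    """Return delta for the given test, following the rules listed above."""
    if test_trend == "two-tailed":
        probability = 1 - confidence_coefficient
    elif test_trend in ("upward", "downward"):
        probability = 1 - confidence_coefficient / 2
    else:
        raise ValueError(f"unknown testTrend: {test_trend}")
    # Assumption: inverseCDF is the standard normal quantile function.
    return NormalDist().inv_cdf(probability)


def trend_detected(z_statistic, confidence_coefficient, test_trend="two-tailed"):
    """Return True when H0 (no trend) is rejected for the given Z statistic."""
    delta = critical_value(confidence_coefficient, test_trend)
    if test_trend == "two-tailed":
        return z_statistic <= -delta or z_statistic >= delta
    if test_trend == "downward":
        return z_statistic <= -delta
    return z_statistic >= delta


print(round(critical_value(0.025, "two-tailed"), 3))
print(trend_detected(2.5, 0.025, "two-tailed"))
print(trend_detected(-2.5, 0.05, "upward"))
```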

Configuration

  • id: identifier of the check.
  • checkRelativePeriod: The period to run the trend test for.
  • variableDefinition: Definition of time series. Each time series is processed independently.
  • inputVariable: Identifier of a variable whose time series flags are used as input (neighbouring locations). Refers to a time series set defined in the variableDefinitions.
  • outputVariable: Identifier of a variable whose time series flags are set to the outputFlag when the thresholds are exceeded (observed values). Refers to a time series set defined in the variableDefinitions.
  • logSensSlope: Includes Sen's slope in the logging (default true). For a large number of time steps, the algorithm for determining Sen's slope requires a lot of memory. To resolve out-of-memory problems, set this option to false. The slope is then no longer estimated by the median of the slopes (classical Mann-Kendall) but by the average of the slopes.
  • onErrorResumeNext: When true, the secondary validation continues after an error has been logged.

For each threshold:

  • testTrend: either two-tailed, upward or downward (two-tailed is default)
  • logLevel: Log level for the log message that is logged if a trend is detected. Can be DEBUG, INFO, WARN, ERROR or FATAL. If the level is ERROR or FATAL, the module stops running after logging the first log message. FATAL should not be used in practice.
  • outputFlag: Either unreliable or doubtful.
  • outputMode: By default, the flags that need updating are updated and log events are generated for the updated flags. When this option is set to 'logs_only', the log events are generated but the flags are not updated.
  • logEventCode: Event code for the log message that is logged if a trend is detected. This event code has to contain a dot, e.g. "TimeSeries.Check", because the log message is only visible to the master controller if the event code contains a dot.
  • logMessage: Log message that is logged if a trend is detected.

A threshold can be specified by either a maximumDrift or a confidenceCoefficient:

  • confidenceCoefficient: the confidence coefficient as used in the classical Mann-Kendall check, also known as alpha, which is typically between 0 and 0.5. For example, 0.05 (one-tailed) and 0.025 (two-tailed) correspond to a confidence level of 95%.
  • maximumDrift: Instead of the classical Mann-Kendall test based on the inverseCDF, this option alters the flags and generates the logs when the absolute drift exceeds the configured maximum. Drift is the length of the checkRelativePeriod times the slope.
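A minimal Python sketch of the drift-based threshold is given below. The function names are hypothetical, and the slope is assumed to be expressed per minute so that it matches a checkRelativePeriod given in minutes.

```python
def drift(slope_per_minute, period_start_minutes, period_end_minutes):
    """Drift = duration of the checkRelativePeriod times the estimated slope."""
    duration = period_end_minutes - period_start_minutes
    return slope_per_minute * duration


def exceeds_maximum_drift(drift_value, maximum_drift):
    """True when the absolute drift is larger than the configured maximum."""
    return abs(drift_value) > maximum_drift


# A 10-minute period with an assumed slope of -0.02 units per minute.
d = drift(-0.02, 1705491, 1705501)
print(d, exceeds_maximum_drift(d, maximum_drift=0.1))
```

Because the absolute value is compared, an upward and a downward drift of the same size are treated alike.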

Tags and their replacements:

  • %AMOUNT_CHANGED_FLAGS%: The number of output flags that were changed.
  • %CHECK_ID%: The id of the check that caused the flags to be altered.
  • %HEADER%: Header name of the time series where the alterations took place.
  • %LOCATION_ID%: The locationId of the time series where the alterations took place.
  • %LOCATION_NAME%: The name of the location where the alterations took place.
  • %NONE%: Hides the default tags that are automatically added.
  • %OUTPUT_FLAG%: The output flag.
  • %PARAMETER_ID%: The parameterId of the time series where the alterations took place.
  • %PARAMETER_NAME%: The name of the parameter where the alterations took place.
  • %PERIOD%: The period boundaries in which the output flags were changed.
  • %SLOPE%: Sen's slope estimator, which is the median of all slopes of the non-missing values.
  • %DRIFT%: Sen's slope estimator times the duration of checkRelativePeriod.
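The effect of these tags on a log message can be illustrated with a plain string substitution in Python. This only sketches the idea and does not reflect how the software performs the replacement: `expand_tags` is a hypothetical helper and the replacement values are made up for the example.

```python
def expand_tags(message, replacements):
    """Replace every %TAG% in the message with its value."""
    for tag, value in replacements.items():
        message = message.replace(f"%{tag}%", str(value))
    return message


template = ("Two-tailed Mann-Kendall check has detected a trend in %HEADER% "
            "by %CHECK_ID%, %AMOUNT_CHANGED_FLAGS% flags set to %OUTPUT_FLAG%.")
values = {  # illustrative values only
    "HEADER": "H.meting",
    "CHECK_ID": "MannKendallCheck_nue_015_01",
    "AMOUNT_CHANGED_FLAGS": 11,
    "OUTPUT_FLAG": "unreliable",
}
print(expand_tags(template, values))
```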

Configuration examples for MannKendallCheck

A configuration example for the MannKendallCheck is given below:

<secondaryValidation xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/secondaryValidation.xsd">
	<mannKendallCheck id="MannKendallCheck_nue_015_01">
		<variableDefinition>
			<variableId>input1</variableId>
			<timeSeriesSet>
				<moduleInstanceId>MannKendallCheckTest</moduleInstanceId>
				<valueType>scalar</valueType>
				<parameterId>H.meting</parameterId>
				<locationId>Nue_0015_01_01</locationId>
				<timeSeriesType>simulated forecasting</timeSeriesType>
				<timeStep unit="minute" multiplier="1"/>
				<relativeViewPeriod unit="minute" start="1705491" end="1705501"/>
				<readWriteMode>read only</readWriteMode>
			</timeSeriesSet>
		</variableDefinition>
		<input>
			<variableId>input1</variableId>
		</input>
		<!-- test storage is set to 2009-1-1, data starts at 11-Dec-2011 -->
		<checkRelativePeriod unit="minute" start="1705491" end="1705501"/>
		<threshold>
			<testTrend>two-tailed</testTrend>
			<confidenceCoefficient>0.01</confidenceCoefficient>
			<outputFlag>unreliable</outputFlag>
			<logLevel>WARN</logLevel>
			<logEventCode>SecondaryValidation.MannKendallCheck</logEventCode>
			<logMessage>Two-tailed Mann-Kendall check has detected a trend in %HEADER% by %CHECK_ID%, %AMOUNT_CHANGED_FLAGS% flags set to %OUTPUT_FLAG%.</logMessage>
		</threshold>
	</mannKendallCheck>
</secondaryValidation>

Further reading

The algorithms of the Mann-Kendall check are taken from page 208 onwards of Statistical Methods for Environmental Pollution Monitoring by Richard O. Gilbert.

Variance of S in LaTeX notation:

VAR(S)=\frac{1}{18}\left[n(n-1)(2n+5)-\sum\limits_{p=1}^{g}t_p(t_p-1)(2t_p+5)-\sum\limits_{q=1}^{h}u_q(u_q-1)(2u_q+5)\right]+\frac{\sum\limits_{p=1}^{g}t_p(t_p-1)(t_p-2)\sum\limits_{q=1}^{h}u_q(u_q-1)(u_q-2)}{9n(n-1)(n-2)}+\frac{\sum\limits_{p=1}^{g}t_p(t_p-1)\sum\limits_{q=1}^{h}u_q(u_q-1)}{2n(n-1)}