Statistical functions

Box plot

The box plot function calculates a set of statistical values for each of the selected timeseries and displays these values in the chart as a box-and-whisker plot.

The table shows the following statistical values for each of the timeseries:

Statistical variables
Minimum outlier: minimum outlier value for selection.
Minimum regular value: minimum value for selection that is not defined as an outlier. Also known as the lower whisker.
25%: 25th percentile
Median: 50th percentile
Mean: average value for selection
75%: 75th percentile
Maximum regular value: maximum value for selection that is not defined as an outlier. Also known as the upper whisker.
Maximum outlier: maximum outlier value for selection.

The chart shows the same values as the table; however, the chart can also include some extra values that are not shown in the table. These are the outliers and the far-out indicators.

Far-out indicator: indicates that there are values that lie outside the plotted range of the axis.
Single outlier: a single outlier value.
Clustered outlier: multiple outliers that are located too close together to be plotted separately.

Outliers: values that lie between 1.5 and 3 box-lengths below the 25th percentile or above the 75th percentile. A box-length is the distance between the 25th and 75th percentiles.
Far-out values: values that lie more than 3 box-lengths below the 25th percentile or above the 75th percentile.
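
The classification above can be sketched in Python. This is a minimal illustration and not the application's own implementation: the function name `box_plot_stats`, the returned keys, and the quartile interpolation (`method="inclusive"`) are assumptions made for the example.

```python
from statistics import mean, quantiles

def box_plot_stats(values):
    """Summarise one selection of values (hypothetical helper)."""
    data = sorted(values)
    # Quartiles; the interpolation method is an assumption of this sketch.
    q25, median, q75 = quantiles(data, n=4, method="inclusive")
    box = q75 - q25  # one box-length
    inner_lo, inner_hi = q25 - 1.5 * box, q75 + 1.5 * box
    outer_lo, outer_hi = q25 - 3 * box, q75 + 3 * box

    # Regular values lie within 1.5 box-lengths of the box.
    regular = [v for v in data if inner_lo <= v <= inner_hi]
    # Far-out values lie more than 3 box-lengths from the box.
    far_out = [v for v in data if v < outer_lo or v > outer_hi]
    # Outliers lie between 1.5 and 3 box-lengths from the box.
    outliers = [v for v in data
                if outer_lo <= v < inner_lo or inner_hi < v <= outer_hi]
    return {
        "whisker_low": regular[0],
        "q25": q25,
        "median": median,
        "mean": mean(data),
        "q75": q75,
        "whisker_high": regular[-1],
        "outliers": outliers,
        "far_out": far_out,
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40])
print(stats["outliers"], stats["far_out"])
```

In this sample selection the value 20 falls between 1.5 and 3 box-lengths above the 75th percentile, while 40 lies more than 3 box-lengths above it.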

Calendar aggregation

Aggregation by calendar day (00.00h-24.00h)
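
As a rough sketch, grouping values by calendar day could look like the following. The function name `aggregate_by_calendar_day` is hypothetical, the aggregation itself (a sum here) is an assumption because the text does not name one, and a value stamped exactly at midnight is assigned to the day that starts at that moment.

```python
from collections import defaultdict
from datetime import date, datetime

def aggregate_by_calendar_day(series, aggregate=sum):
    """series: iterable of (datetime, value) pairs; returns {date: aggregated value}."""
    buckets = defaultdict(list)
    for timestamp, value in series:
        # All values sharing a calendar date end up in the same bucket.
        buckets[timestamp.date()].append(value)
    return {day: aggregate(vals) for day, vals in sorted(buckets.items())}

series = [
    (datetime(2024, 1, 1, 6), 1.0),
    (datetime(2024, 1, 1, 18), 2.0),
    (datetime(2024, 1, 2, 0), 4.0),
]
daily = aggregate_by_calendar_day(series)
```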

Moving average

Moving average where the value is stamped at the end of each averaging period.

Central moving average

Moving average where the value is stamped in the middle of each averaging period.
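
The only difference between the moving average and the central moving average is where the averaged value is stamped. A minimal sketch, assuming equidistant values, an odd window length, and a hypothetical function name `moving_average`:

```python
def moving_average(values, window, central=False):
    """Return (stamp_index, average) pairs for every complete window."""
    result = []
    for end in range(window - 1, len(values)):
        start = end - window + 1
        average = sum(values[start:end + 1]) / window
        # End-stamped by default; centre-stamped when central=True.
        stamp = start + window // 2 if central else end
        result.append((stamp, average))
    return result

trailing = moving_average([1, 2, 3, 4, 5], window=3)
centred = moving_average([1, 2, 3, 4, 5], window=3, central=True)
```

Both calls produce the same averages; only the index at which each average is stamped shifts.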

Display lows below value

A scatterplot is made where the x-axis shows the duration of a 'low' (a period in which all values are beneath the given reference level) and the y-axis shows the normalized difference between the parameter value and the reference level. The reference level can be altered by entering a value into the input field associated with this statistical function. After clicking 'Apply' the result time series array is returned.
If no reference level is entered, the 'low' areas are determined according to the maximum available value of the input time series array.

Display peaks above value

A scatterplot is made where the x-axis shows the duration of a 'peak' (a period in which all values are above the given reference level) and the y-axis shows the normalized difference between the parameter value and the reference level. The reference level can be altered by entering a value into the input field associated with this statistical function. After clicking 'Apply' the result time series array is returned.
If no reference level is entered, the 'peak' areas are determined according to the minimum available value of the input time series array.
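
The detection of peak areas can be sketched as follows. This is an illustration under stated assumptions: the function name `peaks_above` is hypothetical, durations are counted in time steps, and because the normalization is not specified here the sketch reports the raw maximum difference to the reference level.

```python
def peaks_above(values, reference=None):
    """Return (duration_in_steps, max_difference) for each run above the reference."""
    values = list(values)
    if reference is None:
        # Fallback described in the text: use the minimum available value.
        reference = min(values)
    peaks, run = [], []
    # The trailing sentinel closes a run that extends to the last value.
    for v in values + [reference]:
        if v > reference:
            run.append(v)
        elif run:
            peaks.append((len(run), max(run) - reference))
            run = []
    return peaks

found = peaks_above([1, 5, 6, 1, 1, 7, 1], reference=4)
```

Lows below a value would mirror this logic, with the comparison reversed and the maximum available value as the fallback reference level.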

Duration curve

Duration curves are a convenient way to show the variation of hydrological quantities through time. For the selected time period the values of the selected quantity are sorted in descending order (durationExceedence) or ascending order (durationNonExceedence). When the duration curve is plotted in the timeseries display, the x-axis shows the entire length in time of the selected view period. Percentages are shown as duration with respect to the entire chosen view period.

In the configuration of this statistical function there is the option to omit missing values which may occur in the selected view period. If this option is set to true, all entries with missing values are disregarded before the duration curve is calculated. If this option is not defined (default) or is set to false, missing values are added to the end of the array. In this case the plotted duration curve will never reach 100%.
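
The effect of the missing-value option can be illustrated with a small sketch. The function name `duration_curve`, the representation of missing values as `None` or NaN, and the percentage formula (rank divided by the number of entries) are assumptions for this example.

```python
import math

def duration_curve(values, exceedence=True, omit_missing=False):
    """Return (percent_of_period, value) points of a duration curve."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    # Descending order for exceedence, ascending for non-exceedence.
    ordered = sorted(present, reverse=exceedence)
    # When missing values are kept they occupy the end of the array,
    # so the plotted values stop before 100%.
    total = len(present) if omit_missing else len(values)
    return [(100.0 * (i + 1) / total, v) for i, v in enumerate(ordered)]

with_missing = duration_curve([3, None, 1, 2])
without_missing = duration_curve([3, None, 1, 2], omit_missing=True)
```

With one of four entries missing, the first curve ends at 75% while the second reaches 100%.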

An example of the durationExceedence plot is given here:

An example of the durationNonExceedence plot is given here:

When selecting more than one location it could occur that the view periods of these selected timeseries do not cover the same period in time. In this case it is difficult to make a correct comparison of the calculated duration curves because they are analysed on different periods in time. A warning message will be given in order to ensure that the user is aware of this. The pop-up message will be shown each time the user zooms in or out until all view periods are an exact match.

Frequency distribution

The frequency distribution function divides the range between the minimum and maximum value of the timeseries by the number of samples to create a classification. It then evaluates each value in the timeseries and assigns it to a class. The result is a frequency distribution diagram listing the number of occurrences per class. The number of classes (samples) can be selected in the dropdown box.
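
A minimal sketch of this classification, assuming equal-width classes and a hypothetical function name `frequency_distribution`; assigning the maximum value to the last class is also an assumption:

```python
def frequency_distribution(values, n_classes):
    """Return (class_start, class_end, count) for n_classes equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for v in values:
        if width > 0:
            # The maximum would fall just past the last class, so clamp it.
            index = min(int((v - lo) / width), n_classes - 1)
        else:
            index = 0  # all values identical
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]

classes = frequency_distribution(list(range(11)), n_classes=5)
```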

Cumulative

Continuous accumulation over entire timeseries.

Accumulation Per Interval

Accumulation per interval, restarting at zero at the beginning of each new interval. The interval can be selected in the dropdown box.
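
The two accumulation variants differ only in when the running total is reset. A short sketch, in which the function names and the `interval_key` parameter are hypothetical and a calendar month is used as the example interval:

```python
from datetime import date
from itertools import accumulate

def cumulative(values):
    """Continuous accumulation over the entire series."""
    return list(accumulate(values))

def accumulate_per_interval(series, interval_key):
    """series: (timestamp, value) pairs in time order.

    interval_key maps a timestamp to an identifier of its interval;
    the running total restarts whenever that identifier changes.
    """
    result, total, current = [], 0, None
    for timestamp, value in series:
        key = interval_key(timestamp)
        if key != current:
            total, current = 0, key
        total += value
        result.append((timestamp, total))
    return result

series = [
    (date(2024, 1, 30), 1),
    (date(2024, 1, 31), 2),
    (date(2024, 2, 1), 5),
    (date(2024, 2, 2), 1),
]
monthly = accumulate_per_interval(series, lambda d: (d.year, d.month))
```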

Historical Analysis

Allows comparison of the current situation with selected previous years at the same moment within the year. Can be used to compare seasonal behaviour, e.g. deficit accumulation, snow accumulation/melt, runoff.
The bottom of the window shows the full timeseries to assist in picking relevant historical years.

The function requires a multi-year historical series, where the view period on the x-axis stretches over multiple years before the function is selected. You can use the |<>| button to stretch the x-axis from the current view period to the full available length. The display requires a fixed season definition for the x-axis, to be included in the configuration. The user needs to select the historical year of interest to plot this against the current year. Multiple years can be selected by holding the CTRL-key. Holding the SHIFT-key will select a range of years.

Ensemble Percentile Exceedence

For the selected time stamp(s), plots each member of an ensemble along the horizontal axis, sorted by value. The result is an exceedence diagram for an ensemble forecast at each selected time stamp within the timeseries.
The function allows multiple locations and/or multiple time stamps and/or multiple forecasts in one diagram. If multiple forecasts include the same selected time stamp, the legend can only distinguish between these forecasts if a task description is used during job submission.
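
The sorting step behind such a diagram can be sketched as follows for a single time stamp. The function name `ensemble_exceedence`, the descending sort order, and the plotting position rank / (n + 1) are assumptions of this example; the text above only states that the members are sorted by value.

```python
def ensemble_exceedence(member_values):
    """Return (exceedence_percent, value) for the ensemble members at one time stamp."""
    ordered = sorted(member_values, reverse=True)
    n = len(ordered)
    # Assumed plotting position: rank / (n + 1), expressed as a percentage.
    return [(100.0 * rank / (n + 1), value)
            for rank, value in enumerate(ordered, start=1)]

points = ensemble_exceedence([3, 1, 2, 4])
```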

Principal Component Analysis

The Principal Component Analysis function uses independent historical data (observations) and dependent data (e.g. a simulated basin value) to compute a number of regression equations using the Principal Component Analysis technique. The resulting equation is applied with current observations to estimate the current basin value.

This functionality starts with the timeseries data available in the display. The independent and dependent parameters should correspond to the specification in the TimeSeriesDisplayConfig for the simulated (dependent) and observed (independent) parameter.

Typically, this functionality is intended to work in combination with the Topology and the Filters, such that you can have a default set of locations which you can modify from the map or via the Filters. You can also exclude locations from the analysis by making the timeseries invisible in the graph. By default the function shows the scatterplot of the best 5 equations (lowest root mean square error). When one item is selected in the legend, this item and its confidence interval are shown.

The PCA-estimate for the basin value can be utilized in a modifier if the ModifierTypes-configuration refers to the statistical function for the default value.
