Basic Statistics

General

The following statistics and distributions are included in this option:

minimum value,
maximum value,
mean value,
median,
mode,
standard deviation,
variance,
skewness,
kurtosis,
empirical frequency distribution, and
empirical cumulative frequency distribution.

These statistics are discussed in this section. The procedure for selecting data is dealt with in the next section.

Computational Procedure

For a data vector X_i (i = 1, ..., N), the basic statistics are determined as follows:

minimum: X_min = min(X₁, X₂, X₃, ..., X_N )
maximum: X_max = max(X₁, X₂, X₃, ..., X_N )
arithmetic mean: X_mean = (X₁ + X₂ + X₃ + ... + X_N) / N
median: middle value of the ranked series X_i
mode: value of X which occurs with the greatest frequency, taken as the middle value of the class with the greatest frequency. If several classes share the greatest frequency, the middle value of the class with the lowest class levels is reported as the mode.
standard deviation:
!05 - Time Series Analysis^image005.gif!
variance: s_X², the square of the standard deviation s_X
skewness:
!05 - Time Series Analysis^image007.gif!
kurtosis:
!05 - Time Series Analysis^image009.gif!
quantiles and deciles: the quantiles and deciles are computed with the function:
!05 - Time Series Analysis^image011.gif!
where k = the rank in sorted data array.
If k is not an integer, X_k is interpolated between the two closest values.
empirical frequency distribution: graphical presentation of number of data per class; number of classes and minimum and maximum class levels are input
empirical cumulative frequency distribution: cumulative representation of frequencies per class. The relative cumulative frequencies are computed by:
!05 - Time Series Analysis^image013.gif!
As can be seen, the relative cumulative frequency is plotted using the Chegodayev function.
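The class-based quantities above can be illustrated with a minimal Python sketch. It is not the HYMOS implementation: the helper names (`class_counts`, `class_mode`, `chegodayev_cumulative`), the equal-width classes, the divisor N - 1 in the standard deviation, and the common form of the Chegodayev plotting position, (m - 0.3) / (N + 0.4) with m the cumulative count, are all assumptions made for illustration. Skewness, kurtosis and quantiles are left out because their exact formulas are not reproduced in the text.

```python
import math
import statistics


def class_counts(data, n_classes, lower, upper):
    """Count the values per class for equal-width classes on [lower, upper]."""
    width = (upper - lower) / n_classes
    counts = [0] * n_classes
    for x in data:
        if lower <= x <= upper:
            # A value equal to the upper limit is put into the last class.
            idx = min(int((x - lower) / width), n_classes - 1)
            counts[idx] += 1
    return counts


def class_mode(counts, lower, upper):
    """Middle value of the fullest class; ties resolve to the lowest class."""
    width = (upper - lower) / len(counts)
    fullest = counts.index(max(counts))  # index() returns the first match
    return lower + (fullest + 0.5) * width


def chegodayev_cumulative(counts, n_total):
    """Relative cumulative frequency per class, assumed (m - 0.3) / (N + 0.4)."""
    result, m = [], 0
    for c in counts:
        m += c
        result.append((m - 0.3) / (n_total + 0.4))
    return result


# Made-up example data, not taken from any HYMOS database.
data = [2.0, 3.5, 3.7, 4.1, 4.4, 5.0, 6.2, 7.9, 8.3, 9.6]
counts = class_counts(data, n_classes=4, lower=2.0, upper=10.0)
summary = {
    "min": min(data),
    "max": max(data),
    "mean": statistics.fmean(data),
    "median": statistics.median(data),
    "mode": class_mode(counts, 2.0, 10.0),
    "std": statistics.stdev(data),          # divisor N - 1 (assumption)
    "variance": statistics.variance(data),  # square of the value above
}
cumulative = chegodayev_cumulative(counts, len(data))
```

In this example the first two classes both hold three values, so the mode is the middle value of the lower of the two classes, in line with the tie rule described above.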
The 95% confidence intervals are also computed for the mean and the variance. The Student-t distribution is applied and the percentage points t_ν,α/2 and t_ν,1-α/2 are computed, where ν = N-1 is the number of degrees of freedom. The confidence limits for the mean then read:
!05 - Time Series Analysis^image015.gif!
Given an estimate of the sample variance, the true variance σ_Y² will be contained within the following confidence interval with a probability of 100(1-α)%:
!05 - Time Series Analysis^image017.gif!
The values for χ²_ν,α/2 and χ²_ν,1-α/2 are read from the tables of the Chi-square distribution for given α and ν.
The 95% confidence limits for the median, 25% quantile and 75% quantile are computed with:
!05 - Time Series Analysis^image019.gif!
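As an illustration of the confidence limits for the mean and the variance, the following Python sketch uses the textbook forms of the two intervals: mean ± t · s / √N for the mean, and (N-1)·s² divided by the upper and lower Chi-square percentage points for the variance. Whether HYMOS uses exactly these expressions is an assumption, and the function names are hypothetical. The percentage points are passed in as table values, as described above; the limits for the median and the quantiles are not covered.

```python
import math
import statistics


def mean_confidence_limits(data, t_crit):
    """Two-sided limits for the mean: mean -/+ t_crit * s / sqrt(N)."""
    n = len(data)
    mean = statistics.fmean(data)
    half_width = t_crit * statistics.stdev(data) / math.sqrt(n)
    return mean - half_width, mean + half_width


def variance_confidence_limits(data, chi2_low, chi2_high):
    """Two-sided limits for the variance from Chi-square percentage points."""
    dof = len(data) - 1                 # degrees of freedom, N - 1
    s2 = statistics.variance(data)      # sample variance, divisor N - 1
    return dof * s2 / chi2_high, dof * s2 / chi2_low


# Made-up example data with N = 10, so 9 degrees of freedom.
data = [2.0, 3.5, 3.7, 4.1, 4.4, 5.0, 6.2, 7.9, 8.3, 9.6]

# Table values for 9 degrees of freedom and a 95% interval.
T_CRIT = 2.262                      # Student-t, cumulative probability 0.975
CHI2_LOW, CHI2_HIGH = 2.700, 19.023 # Chi-square, probabilities 0.025, 0.975

mean_low, mean_high = mean_confidence_limits(data, T_CRIT)
var_low, var_high = variance_confidence_limits(data, CHI2_LOW, CHI2_HIGH)
```

Note that the interval for the variance is not symmetric around the sample variance, because the Chi-square distribution is skewed.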

Selection of data

Data for statistical analysis are read from the HYMOS database.

Series codes

Series can be selected by clicking the series in the 'series codes' list box. Only one series may be selected at a time.

Select Type

The following types of data can be considered:

actual values,
annual minimum values, and
annual maximum values.
In case of actual values a threshold selection menu is displayed from which no threshold, minimum, maximum, both, peaks over threshold or peaks under threshold can be selected.
The annual minimum/maximum values may be the minimum/maximum of full years or of a part of years, like seasonally or monthly extremes.

Computation Period

The computation period can be set to 'Full years' or 'Part of years'. When 'Full years' is selected, all data values of the complete year are taken into the analysis; make sure that the start date and end date of the processing period run from the first of January to the first of January. When 'Part of years' is selected, a start date and end date for the sub-period must be entered. For example, to select data for the month of March only, set the start date of the sub-period to "01-03" and the end date to "01-04".
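The sub-period selection can be pictured with a small Python sketch. It is only an illustration under stated assumptions: dates are written as "dd-mm", the end date is treated as exclusive (which is what the March example suggests), equal start and end dates mean a full year, and sub-periods do not cross the turn of the year. The function names are hypothetical and do not refer to HYMOS internals.

```python
from datetime import date


def parse_day_month(text):
    """Turn a 'dd-mm' string into a (month, day) tuple that sorts by date."""
    day, month = (int(part) for part in text.split("-"))
    return month, day


def in_sub_period(day, start, end):
    """True if the date falls in [start, end); start == end means full year."""
    if start == end:
        return True
    return start <= (day.month, day.day) < end


def annual_maxima(series, start="01-01", end="01-01"):
    """Largest value per calendar year within the sub-period."""
    first, last = parse_day_month(start), parse_day_month(end)
    maxima = {}
    for day, value in series:
        if in_sub_period(day, first, last):
            maxima[day.year] = max(value, maxima.get(day.year, value))
    return maxima


# Made-up daily values, given as (date, value) pairs.
series = [
    (date(2001, 2, 28), 5.0),
    (date(2001, 3, 1), 3.0),
    (date(2001, 3, 31), 4.0),
    (date(2001, 4, 1), 9.0),
    (date(2002, 3, 15), 2.0),
]

march_maxima = annual_maxima(series, start="01-03", end="01-04")
full_year_maxima = annual_maxima(series)
```

With the sub-period "01-03" to "01-04" only the March values enter the result; the value of 1 April is excluded because the end date is assumed to be exclusive.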

Classes

The number of classes and the lower and upper class limits for the cumulative frequency distribution and histogram can be entered. When no values are entered for the lower and upper class limits, HYMOS will compute them from the data.

Note

The time series under investigation must not contain missing values.

POT method in HYMOS

The basic statistics and fitting distribution functions of HYMOS can use the POT method for selecting data. The Basic Statistics method also has an option of selecting data by a threshold. How these data selection options work will be explained here.
In case of actual values a threshold selection menu is displayed from which no threshold, minimum, maximum, both, Peaks over Threshold or Peaks under Threshold can be selected. If Peaks over Threshold is selected, a value for the 'Horizon' can also be entered (a period in time interval units of the original series, e.g. days, which is used to skip lower peaks within that period before or after a peak). Default value = 1 (no horizon). A maximum of 1000 values/peaks out of a maximum of 50000 input values will enter the analysis.

Data Selection for threshold method

The selection of data for the threshold method is straightforward:

No Threshold: there is no selection on the magnitude of the data,
Upper Threshold: values above the entered threshold will be excluded from computation. The lower threshold will be taken as the lowest value in the selected time series,
Lower Threshold: values below the entered threshold will be excluded from computation. The upper threshold will be taken as the highest value in the selected time series,
Upper and Lower Threshold: values below the lower threshold and values above the upper threshold will be excluded from computation.
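The four threshold options can be summarised in a short Python sketch. The function name `select_by_threshold` and its parameters are hypothetical; the sketch assumes that values exactly equal to a threshold are kept, since the text only excludes values above or below it.

```python
def select_by_threshold(values, lower=None, upper=None):
    """Keep the values between the lower and upper thresholds.

    A threshold that is not entered (None) defaults to the lowest or
    highest value in the series, so it excludes nothing.
    """
    low = min(values) if lower is None else lower
    high = max(values) if upper is None else upper
    return [v for v in values if low <= v <= high]


# Made-up example values.
values = [1.0, 4.0, 7.0, 10.0, 13.0]

no_threshold = select_by_threshold(values)
upper_only = select_by_threshold(values, upper=7.0)
lower_only = select_by_threshold(values, lower=7.0)
both = select_by_threshold(values, lower=4.0, upper=10.0)
```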

Data Selection for POT and PUT-method

For the Peaks Over Threshold (POT) and Peaks Under Threshold (PUT) methods, actual values have to be selected. For these methods a threshold value and a horizon are entered. For the POT option all values below the threshold are excluded from computation; for the PUT option all values above the threshold are excluded. The data used for further computation are all peaks between successive up-crossings and down-crossings, taking into account the given horizon. The default value of the horizon is 1 (no horizon).
!05 - Time Series Analysis^image021.jpg!
The picture shows four peaks at time steps t1 to t4, a given horizon and a threshold. First the highest peak between each up-crossing and the following down-crossing is determined. Because the peak at t4 is higher than the peak at t3 within the same crossing period, t3 is not treated as a separate peak. When the horizon is set to 1, the POT method returns three peaks for the selected period, namely t1, t2 and t4. When a horizon larger than 1 is entered, HYMOS skips all lower peaks within the horizon period. In this example, peak t2 is skipped and the POT method returns peaks t1 and t4.
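The two steps of the POT selection can be sketched in Python as follows. This is an illustrative reading of the description, not the HYMOS code: the function name is hypothetical, and it is an assumption that a lower peak is skipped when it lies fewer than `horizon` time steps from a higher retained peak, which makes a horizon of 1 equivalent to no horizon. The example series is made up to resemble the situation in the picture.

```python
def peaks_over_threshold(values, threshold, horizon=1):
    """Return (index, value) pairs of the selected peaks, in time order."""
    # Step 1: highest value between each up-crossing and down-crossing.
    candidates = []
    best = None
    for i, v in enumerate(values):
        if v >= threshold:
            if best is None or v > best[1]:
                best = (i, v)
        elif best is not None:
            candidates.append(best)  # down-crossing closes the period
            best = None
    if best is not None:
        candidates.append(best)      # series ends above the threshold

    # Step 2: go from the highest peak to the lowest and skip every peak
    # that lies within the horizon of a higher peak already retained.
    kept = []
    for i, v in sorted(candidates, key=lambda p: p[1], reverse=True):
        if all(abs(i - j) >= horizon for j, _ in kept):
            kept.append((i, v))
    return sorted(kept)


# Indices 1, 3, 8 and 10 play the roles of t1, t2, t3 and t4.
series = [0.0, 9.0, 0.0, 7.0, 0.0, 0.0, 0.0, 0.0, 6.0, 5.5, 8.0, 0.0]

no_horizon = peaks_over_threshold(series, threshold=5.0, horizon=1)
with_horizon = peaks_over_threshold(series, threshold=5.0, horizon=4)
```

With a horizon of 1 the peaks at indices 1, 3 and 10 are returned; with a horizon of 4 the lower peak at index 3 is skipped because it lies within the horizon of the higher peak at index 1. A PUT selection could be obtained in the same way by negating both the values and the threshold.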
