02 Extreme value analysis

Fitting Distributions

General

HYMOS includes the fitting of the commonly used theoretical frequency distributions:

Normal distribution,
Log-normal distribution,
Box-Cox transformation to normality,
Pearson Type III or Gamma distribution,
Rayleigh distribution,
Exponential distribution,
General Pearson distribution,
Log-Pearson Type III distribution,
Extreme Type I or Gumbel distribution,
Extreme Type II or Frechet distribution,
Extreme Type III distribution,
Goodrich/Weibull distribution,
Pareto distribution,
Peaks over Threshold (POT) method for extremes (Pareto distribution).

For each distribution one can obtain:

estimation of parameters,
summary of observed and theoretical probabilities,
goodness-of-fit tests (binomial, Kolmogorov-Smirnov and chi-square),
computation of extreme values for specific return periods, either related to probability of non-exceedance or exceedance, and
plot of distribution

Ordered data are plotted on extreme probability paper according to a general plotting position function: P = (m - a) / (N + 1 - 2a), where m is the rank of a value in the ordered series and N is the number of values. The constant 'a' is an input variable with a default value of 0.3. Many different plotting functions are in use; some of them can be reproduced by changing the constant 'a', for example:

Gringorten: P = (m - 0.44) / (N + 0.12), a = 0.44
Weibull: P = m / (N + 1), a = 0
Chegodayev: P = (m - 0.3) / (N + 0.4), a = 0.3
Blom: P = (m - 0.375) / (N + 0.25), a = 0.375

The Blom plotting function is best suited to the normal distribution, while Gringorten gives the best results for the Gumbel distribution. A plotting function that can be used for all distribution functions is Chegodayev. More information on plotting positions can be found in many hydrological handbooks (e.g. Applied Hydrology, pp. 394-396).
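
As an illustration, the general plotting position function can be sketched in Python. The function name `plotting_position`, the dictionary of constants and the sample size are illustrative assumptions and not part of HYMOS; m is taken to be the rank (1 to N) in the ordered series.

```python
def plotting_position(m, n, a=0.3):
    """General plotting position P = (m - a) / (N + 1 - 2a)."""
    if not 1 <= m <= n:
        raise ValueError("rank m must lie between 1 and N")
    return (m - a) / (n + 1 - 2 * a)


# Values of the constant 'a' that reproduce the named plotting functions.
NAMED_FORMULAS = {"Gringorten": 0.44, "Weibull": 0.0, "Chegodayev": 0.3, "Blom": 0.375}

n = 10  # hypothetical number of values in the series
for name, a in NAMED_FORMULAS.items():
    positions = [plotting_position(m, n, a) for m in range(1, n + 1)]
    print(f"{name:11s} a={a:<5} P(1)={positions[0]:.4f} P(N)={positions[-1]:.4f}")
```

Larger values of 'a' push the first and last plotting positions further towards 0 and 1, which is why the choice matters most for the extremes of the sample.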
The frequency distributions are briefly described in the next section. For a detailed description of the frequency distributions, refer to other textbooks. The selection procedure for data is dealt with in Section 10.2.3.

Frequency Distributions

Normal Distribution Function:

with:
μ_x = mean of X,
σ_x = standard deviation of X.
The estimates for μ_x and σ_x are obtained from the equations explained in the Basic Statistics function.
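
The distribution function itself is not reproduced in this text. A minimal sketch, assuming the standard cumulative normal distribution and sample estimates of μ_x and σ_x (the N - 1 form of the standard deviation is an assumption here), could look as follows. The sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample of annual values.
sample = [812.0, 930.0, 1010.0, 765.0, 1104.0, 890.0, 955.0]

# Estimate the two parameters from the sample.
dist = NormalDist(mu=mean(sample), sigma=stdev(sample))

# Probability of non-exceedance of a given value.
p_non_exceedance = dist.cdf(1000.0)

# Value with a return period of T years (probability of non-exceedance 1 - 1/T).
T = 10
x_T = dist.inv_cdf(1.0 - 1.0 / T)

print(f"P(X <= 1000) = {p_non_exceedance:.3f}, X_T for T = {T}: {x_T:.1f}")
```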

Log-normal Distribution Function:

with:
Y = ln (X - X₀)
μ_x = mean of X,
σ_x = standard deviation of X,
X₀ = location parameter, conditioned by (X₀ < X).

The location parameter X₀ may be determined (3-parameter distribution) or can be provided by the user (2-parameter distribution). The model parameters can be estimated by:

method of moments, or
modified maximum likelihood method.

Box-Cox Transformation to Normality:

Arbitrarily distributed variables may be transformed to normality by the following transformation:

for λ ≠ 0
for λ = 0

where:
X₀ = location parameter, (X₀ < X)
λ = transformation parameter.

The location parameter X₀ is provided by the user while the other parameters are estimated by the maximum likelihood method.
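
The two transformation formulas are not reproduced in this text. The sketch below assumes the usual Box-Cox form with a location shift, Y = ((X - X₀)^λ - 1) / λ for λ ≠ 0 and Y = ln(X - X₀) for λ = 0. The function name `box_cox` is illustrative, and the maximum likelihood estimation of λ is not shown.

```python
import math


def box_cox(x, lam, x0=0.0):
    """Box-Cox transform of x with location parameter x0 (assumed standard form)."""
    shifted = x - x0
    if shifted <= 0:
        raise ValueError("the transformation requires X0 < X")
    if lam == 0:
        return math.log(shifted)
    return (shifted ** lam - 1.0) / lam


# Hypothetical values: the lambda = 0 branch is the limit of the general branch.
print(box_cox(4.0, 0.0), box_cox(4.0, 1e-8))
print(box_cox(5.0, 1.0, x0=1.0))
```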

Standard Incomplete Gamma Function:

This function forms the basis of a number of distribution functions. It has the following form:

with:

where:
γ = shape parameter
Z = argument
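
The formula is not reproduced in this text. The sketch below assumes that F_G(Z) is the regularised lower incomplete gamma function, F_G(Z) = (1/Γ(γ)) ∫₀^Z t^(γ-1) e^(-t) dt, and evaluates it with a simple power series. The function name and tolerances are illustrative; the series is adequate for moderate Z only.

```python
import math


def incomplete_gamma(z, gamma, tol=1e-12, max_terms=10_000):
    """Regularised lower incomplete gamma function F_G(Z) with shape gamma."""
    if gamma <= 0:
        raise ValueError("shape parameter must be positive")
    if z <= 0:
        return 0.0
    # Series: F_G = z^g e^(-z) / Gamma(g + 1) * sum_{n>=0} z^n / ((g+1)...(g+n))
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= z / (gamma + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = gamma * math.log(z) - z - math.lgamma(gamma + 1.0)
    return min(1.0, total * math.exp(log_prefactor))


# For gamma = 1 the function reduces to 1 - exp(-Z).
for z in (0.5, 1.0, 3.0):
    print(z, incomplete_gamma(z, 1.0), 1.0 - math.exp(-z))
```

With γ = 1 and Z = (X - X₀)/β this gives the exponential distribution function; other choices of Z and γ give the other members of the family.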

Pearson Type III or Gamma Distribution Function:

P_PIII (X) = F_G (Z)

with:
Z = (X - X₀)/ β
X₀ = location parameter, (X₀ < X)
β = scale parameter.

The location parameter X₀ may be provided (2-parameter distribution) or estimated (3-parameter distribution). The parameters X₀, β and γ can be estimated by:

1. method of moments, and
2. modified maximum likelihood method

Rayleigh Distribution Function:

P_R (X) = F_G (Z)

with:
Z = ((X - X₀) /β )² and γ = 1
X₀ = location parameter, (X₀ < X)
β = scale parameter.

The parameters X₀ and β are estimated from the moments.

Exponential Distribution Function:

P_E (X) = F_G (Z)

with:
Z = (X - X₀) / β and γ = 1
X₀ = location parameter, (X₀ < X)
β = scale parameter.

The parameters X₀ and β are estimated from the moments.

General Pearson Distribution Function:

P_GP (X) = 0.5 + (k/|k|)(F_G (Z) - 0.5)

with:
Z = ((X - X₀ )/β)^k
β = scale parameter.
k = type parameter (integer):
1 : Pearson Type III (Exponential distribution for γ = 1),
2 : Rayleigh (γ = 1) and Maxwell (γ = 1.5) distributions,
-1 : Pearson Type V distribution.
The type parameter k is provided by the user. The parameter X₀ (X₀ < X) or γ may be provided by the user (2-parameter distribution) or can be estimated (3-parameter distribution). The parameters X₀, β and γ are estimated by a mixed moment-maximum likelihood method.

Log-Pearson Type III Distribution Function:

P_LP (X) = F_G (Z)

with:
Z = (ln(X - X₀) - Y₀) / β,
X₀, Y₀ = location parameters,
β = scale parameter,
γ = shape parameter.

The location parameter X₀ (X₀ < X) is provided by the user, while the parameters Y₀, β and γ are estimated by:

mixed moment-maximum likelihood method,
modified maximum likelihood method on Y = ln (X - X₀)

Extreme Type I or Gumbel Distribution:

with:
X₀ = location parameter,
β = scale parameter.

The parameters X₀ and β can be estimated by:

method of moments,
modified maximum likelihood method (with or without censoring).

The parameters can also be determined by applying the second method to part of the data set, leaving the lowest N₁ and highest N₂ values out of the analysis, provided that:
N - (N₁ + N₂) ≥ 5.
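
The distribution function is not reproduced in this text. The sketch below assumes the usual Gumbel form for maxima, P(X) = exp(-exp(-(X - X₀)/β)), and the standard method-of-moments estimates β = s·√6/π and X₀ = mean - 0.5772·β. Whether HYMOS uses exactly these expressions is an assumption; the function names and the sample of annual maxima are hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(values):
    """Method-of-moments estimates (x0, beta) for the assumed Gumbel form."""
    beta = stdev(values) * math.sqrt(6.0) / math.pi
    x0 = mean(values) - EULER_GAMMA * beta
    return x0, beta


def gumbel_cdf(x, x0, beta):
    """Probability of non-exceedance P(X <= x)."""
    return math.exp(-math.exp(-(x - x0) / beta))


def gumbel_quantile(return_period, x0, beta):
    """Value with probability of non-exceedance 1 - 1/T."""
    p = 1.0 - 1.0 / return_period
    return x0 - beta * math.log(-math.log(p))


# Hypothetical annual maxima, one value per year.
annual_maxima = [312.0, 455.0, 290.0, 610.0, 380.0, 505.0, 420.0, 350.0, 470.0, 398.0]
x0, beta = fit_gumbel_moments(annual_maxima)
for T in (2, 10, 100):
    print(f"T = {T:3d} years: X_T = {gumbel_quantile(T, x0, beta):.1f}")
```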

Extreme Type II or Frechet Distribution:

with:
X₀ = location parameter, (X₀ < X)
β = scale parameter,
γ = shape parameter. (k < 0)

The location parameter may be given (2-parameter distribution) or estimated (3-parameter distribution). The parameters can be estimated by the modified maximum likelihood method.

Extreme Type III Distribution:

with:
X₀ = location parameter, (X₀ < X)
β = scale parameter,
γ = shape parameter. (k > 0)

For the estimation of parameters the same applies as for the Extreme Type II distribution.

Goodrich/Weibull Distribution:

with:
X₀ = location parameter, (X₀ < X)
β = scale parameter,
γ = shape parameter. (k > 0)

For the estimation of parameters the same applies as for the Extreme Type II distribution.

Pareto Distribution:

P_PA(X) = 1 - e^(-Z); 0 < Z < ∞; Θ = 0 (GP-I)
P_PA(X) = 1 - (1 - ΘZ)^(1/Θ); 0 < Z < ∞; Θ < 0 (GP-II)
P_PA(X) = 1 - (1 - ΘZ)^(1/Θ); 0 < Z < 1/Θ; Θ > 0 (GP-III)

with:
Z = (X - X₀)/σ
X₀ = threshold (to be specified by the user),
σ = scale parameter,
Θ = shape parameter:
Θ = 0 : General Pareto Type-I distribution,
Θ < 0 : General Pareto Type-II distribution, and
Θ > 0 : General Pareto Type-III distribution.
The domain of X is:
for Θ ≤ 0 : X > X₀
for Θ > 0 : X₀ < X < X₀ + σ/Θ
The parameters are estimated either by the maximum likelihood method (Θ ≤ 0) or by the method of moments (Θ > 0).
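
The three cases of the Pareto distribution function can be sketched as a single Python function. The name `pareto_cdf` and the parameter values in the example are illustrative; parameter estimation is not shown.

```python
import math


def pareto_cdf(x, x0, sigma, theta):
    """Probability of non-exceedance for the general Pareto distribution."""
    if sigma <= 0:
        raise ValueError("scale parameter must be positive")
    z = (x - x0) / sigma
    if z <= 0:
        return 0.0  # at or below the threshold X0
    if theta == 0:
        return 1.0 - math.exp(-z)  # GP-I
    if theta > 0 and z >= 1.0 / theta:
        return 1.0  # GP-III: at or beyond the upper bound X0 + sigma/theta
    return 1.0 - (1.0 - theta * z) ** (1.0 / theta)  # GP-II and GP-III


# Hypothetical threshold and scale, one example per type.
for theta in (0.0, -0.5, 0.5):
    print(theta, pareto_cdf(120.0, x0=100.0, sigma=20.0, theta=theta))
```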

Pareto Distribution with Peaks over Threshold Method

The POT method uses the Pareto distribution. An additional parameter λ is introduced, which is the average number of exceedances per year. In the POT method, a return period of T years corresponds to one occurrence of the extreme in a series of λT exceedances above a fixed threshold X₀ (the location parameter X₀ is equal to the selected lower threshold). Hence the related probability of non-exceedance in a series of all exceedances above a threshold is 1 - 1/(λT).
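
The relation between return period and probability of non-exceedance can be combined with the inverse of the Pareto distribution function to obtain the value for a given return period. The sketch below is one way to do this; the function names and all parameter values are hypothetical, and the inversion Z = (1 - (1 - p)^Θ)/Θ follows from P = 1 - (1 - ΘZ)^(1/Θ).

```python
import math


def pot_non_exceedance(return_period, rate):
    """Probability of non-exceedance 1 - 1/(lambda * T) among all exceedances."""
    n_exceedances = rate * return_period
    if n_exceedances <= 1.0:
        raise ValueError("lambda * T must exceed 1")
    return 1.0 - 1.0 / n_exceedances


def pot_return_level(return_period, rate, x0, sigma, theta):
    """Value with a return period of T years from the inverted Pareto distribution."""
    p = pot_non_exceedance(return_period, rate)
    if theta == 0:
        z = -math.log(1.0 - p)
    else:
        z = (1.0 - (1.0 - p) ** theta) / theta
    return x0 + sigma * z


# Hypothetical case: 2.5 exceedances per year on average, threshold 100, scale 20.
for T in (10, 50, 100):
    print(T, round(pot_return_level(T, rate=2.5, x0=100.0, sigma=20.0, theta=0.1), 1))
```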

Selection of Data

Data used for the distribution functions are read from the HYMOS database. All distribution functions, except the Pareto with POT method, require the number of selected data to be equal to the number of years for a correct interpretation of probability of non-exceedance versus return period. Only for the Pareto with POT method is a correction applied to the return period. For all other methods a manual correction has to be made when the number of selected data is not equal to the number of years.

Series Codes

Series can be selected by clicking the series in the 'series codes' list box. Only one series may be selected at a time.

Select Type

The following types of data can be considered:

actual values
annual minimum values, and
annual maximum values.

If actual values are selected, a threshold selection menu is displayed from which no threshold, minimum, maximum, both, peaks over threshold (POT) or peaks under threshold can be chosen. If peaks over threshold is selected but the POT method will not be used, a message appears on screen, the number of values is made equal to the number of years and the highest values are selected. If peaks are selected, a value for the 'horizon' can also be entered: a period, in time interval units of the original series (e.g. days), within which lower peaks before and after a peak are skipped. The default value is 1 (no horizon). At most 1000 values or peaks, taken from at most 50000 input values, enter the analysis.
The annual minimum/maximum values may be the minimum/maximum of full years or of parts of years, such as seasonal or monthly extremes.

Computation Period

The computation period can be set to 'Full years' or 'Part of years'. When 'Full years' is selected, all data values of the complete year are taken into the analysis; make sure that the start date and end date of the processing period run from the first of January to the first of January. When 'Part of years' is selected, a start date and end date for the sub-period must be entered. For example, to select data for the month of March only, set the start date of the sub-period to "01-03" and the end date of the sub-period to "01-04".

Data Selection for POT Method

To use the Peaks over Threshold (POT) method, 'actual values' have to be selected. In the POT option the number of years is entered. At an earlier stage (see above) a threshold level X₀ and a horizon are entered. The data series used in the method consists of all peaks between successive up-crossings and down-crossings, taking into account the given horizon. For a more extensive explanation of the POT method, see the Basic Statistics function.
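
The peak selection can be sketched as follows. This is an illustration under stated assumptions, not the HYMOS algorithm: each excursion between an up-crossing and the next down-crossing contributes its maximum, an excursion still open at the end of the series is ignored, and a lower peak is skipped when a higher peak lies fewer than `horizon` time steps away (so that a horizon of 1 skips nothing). The function name and the data are hypothetical.

```python
def select_peaks(series, threshold, horizon=1):
    """Return (index, value) pairs of the selected peaks over the threshold."""
    peaks = []
    start = None
    for i, value in enumerate(series):
        if value > threshold:
            if start is None:
                start = i  # up-crossing (or series starts above the threshold)
        elif start is not None:
            # Down-crossing: keep the maximum of the excursion start..i-1.
            j = max(range(start, i), key=series.__getitem__)
            peaks.append((j, series[j]))
            start = None

    # Apply the horizon from the highest peak downwards.
    kept = []
    for idx, val in sorted(peaks, key=lambda p: p[1], reverse=True):
        if all(abs(idx - k) >= horizon for k, _ in kept):
            kept.append((idx, val))
    return sorted(kept)


data = [1, 5, 7, 4, 1, 6, 2, 1, 1, 9, 8, 1]
print(select_peaks(data, threshold=3))             # no horizon
print(select_peaks(data, threshold=3, horizon=4))  # lower peak near index 2 skipped
```

The average number of exceedances per year then follows as the number of selected peaks divided by the number of years entered in the POT option.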

Note: the number of data values is generally not equal to the number of years. Hence the interpretation of probability of non-exceedance versus return period differs from the usual one.

References:

Chow, V.T., Maidment, D.R., and Mays, L.W., Applied Hydrology, McGraw-Hill Book Co., 1988.
