Spatial correlation tests


General

The function described here is applicable to quality and quantity parameters with a spatial character, such as rainfall, temperature and evaporation, that are sampled at a number of stations (point measurements).
The two functions, the correlation and the semivariance, indicate how similar the mechanisms are that produce a phenomenon at different locations. Both functions use the co-ordinates of the selected series, together with the measured values, in their calculation.

The correlation and semivariance functions both make use of the covariance. The covariance of two random variables is defined as the expectation of the product of their respective deviations from their means. The sample estimator C(x1, x2) for the population covariance is computed from:
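
The expression itself is not reproduced in this copy of the page. A standard form of the sample covariance is given here as an assumption: the notation Z_i(x) for the i-th value at location x, the symbol m(x) for the mean at that location, and the n-1 divisor are not confirmed by the source.

```latex
C(x_1, x_2) = \frac{1}{n-1} \sum_{i=1}^{n}
  \bigl[ Z_i(x_1) - m(x_1) \bigr] \bigl[ Z_i(x_2) - m(x_2) \bigr]
```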

Series codes

A series is selected by pointing at it with the mouse and pressing the left mouse button; a check mark then appears in the check box of the series ID. To select or de-select all series in one go, click the right mouse button while the mouse pointer is over the series list box and choose <Select All>.
Remember that a minimum of 5 series must be selected.

Correlation

Regional patterns can be studied for different time steps by means of the (lag-zero) cross-correlation coefficient of pair-wise combined series. The cross-correlation is the normalised covariance between two stochastic variables: the covariance divided by the product of the standard deviations of the two series. The covariance between two stochastic variables can be calculated with:

where
m = mean of Z(x)
n = number of measurements
The sample estimates S(x1) and S(x2) for the population standard deviations are computed from:

Thus the equation for computing the sample correlation coefficient is:
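
The expressions referred to here are missing from this copy of the page. Under the definitions given in the text, the usual forms would be as follows. The notation Z_i(x_k), the symbol r(x_1, x_2) for the correlation coefficient and the n-1 divisor are assumptions, not taken from the source.

```latex
\begin{aligned}
S(x_k) &= \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n}
          \bigl[ Z_i(x_k) - m(x_k) \bigr]^2 }, \qquad k = 1, 2 \\
r(x_1, x_2) &= \frac{C(x_1, x_2)}{S(x_1)\, S(x_2)}
\end{aligned}
```

As long as the covariance and the standard deviations use the same divisor, the divisor cancels in the ratio, so the choice between n and n-1 does not change the correlation coefficient.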

Note:
The correlation coefficient and covariance may be affected by a few eccentric data pairs. A good alignment of a few extreme pairs can dramatically improve an otherwise poor correlation coefficient. Conversely, an otherwise good correlation could be ruined by the poor alignment of a few extreme pairs.
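
The effect described in this note can be illustrated with a small sketch. It is not HYMOS code: the function name `sample_correlation` is hypothetical, the station values are invented for illustration, and the n-1 divisor is an assumption.

```python
import math


def sample_correlation(z1, z2):
    """Lag-zero cross-correlation of two equally long series."""
    n = len(z1)
    if n != len(z2) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    m1 = sum(z1) / n
    m2 = sum(z2) / n
    # Sample covariance and standard deviations (n - 1 divisor, an assumption).
    cov = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / (n - 1)
    s1 = math.sqrt(sum((a - m1) ** 2 for a in z1) / (n - 1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in z2) / (n - 1))
    return cov / (s1 * s2)


# Invented, weakly related values at two stations.
station_1 = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0]
station_2 = [11.0, 10.0, 13.0, 11.0, 12.0, 12.0]
r_without = sample_correlation(station_1, station_2)

# The same series with one extreme, well-aligned pair appended.
r_with = sample_correlation(station_1 + [60.0], station_2 + [58.0])

print(round(r_without, 2), round(r_with, 2))
```

A single extreme pair turns a weak negative correlation into a strong positive one, which is why inspecting a scatter plot of the pairs before interpreting the coefficient is worthwhile.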

Cross-correlation functions

Cross-correlation functions describe the relation between inter-station correlation and inter-station distance. They can be determined empirically from a graph or calculated by least-squares fitting.
There are many types of functions, e.g. higher-degree functions, power functions, exponential functions or Bessel functions. HYMOS uses the exponential function for calculation of the correlation distance:
r(h) = r(0) * Exp(-h/h0) + a
where:
h = distance between two stations
h0 = correlation distance
a = threshold in function
r(0) = correlation at distance zero
The following information must be entered when using this function:

Number of time steps

This option restricts the analysis to part of each year. If, for example, the series interval is a month and only the summers (April to September) are to be considered, April is indicated in the start date and the number of time steps is 6. For each year, only the monthly values from April to September are then used in the calculation.

Lower Boundary / Upper Boundary

When values are entered for the boundaries, only data values between the lower and the upper boundary are used in the calculation.
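
The two selection options (the time-step window and the value boundaries) can be sketched as a simple filter. This is an illustration only: the record layout `(year, month, value)` and the function name `select_values` are hypothetical, and treating the boundaries as inclusive is an assumption, since the text does not say how values equal to a boundary are handled.

```python
def select_values(records, start_month, n_steps, lower, upper):
    """Keep monthly values inside the analysis window and the value boundaries.

    records: list of (year, month, value) tuples (hypothetical layout).
    """
    # Months covered by the window, e.g. start 4 with 6 steps gives 4..9.
    months = {(start_month - 1 + k) % 12 + 1 for k in range(n_steps)}
    return [
        (year, month, value)
        for (year, month, value) in records
        # Boundaries are treated as inclusive here (an assumption).
        if month in months and lower <= value <= upper
    ]


# Invented monthly series for one year: the value is 10 times the month number.
records = [(2001, m, 10.0 * m) for m in range(1, 13)]
summer = select_values(records, start_month=4, n_steps=6, lower=0.0, upper=80.0)
selected_months = [month for (_, month, _) in summer]
print(selected_months)
```

In this example, September falls inside the window but is dropped because its value (90) exceeds the upper boundary.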

Correlation distance function

The values for r(0) and a must be entered; they are input values for the correlation distance function. The maximum inter-station distance is the largest distance between stations for which the correlation will be calculated. It is also the maximum distance shown in the graph.
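
With r(0) and a fixed by the user, only the correlation distance h0 remains to be estimated. How HYMOS performs this fit is not described here; the following is a minimal sketch that assumes a least-squares fit in log space, where ln((r - a) / r(0)) = -h / h0 is a straight line through the origin. The function names and the synthetic data are hypothetical.

```python
import math


def correlation_model(h, r0, h0, a):
    """r(h) = r0 * exp(-h / h0) + a, as given in the text."""
    return r0 * math.exp(-h / h0) + a


def fit_correlation_distance(distances, correlations, r0, a):
    """Least-squares estimate of h0 for fixed r0 and a (log-space fit)."""
    sum_hy = 0.0
    sum_hh = 0.0
    for h, r in zip(distances, correlations):
        ratio = (r - a) / r0
        if ratio <= 0.0:
            continue  # cannot be log-transformed; skipped in this sketch
        y = math.log(ratio)
        sum_hy += h * y
        sum_hh += h * h
    if sum_hy >= 0.0:
        raise ValueError("correlations do not decay with distance")
    slope = sum_hy / sum_hh  # slope of the line through the origin
    return -1.0 / slope


# Synthetic check: data generated from the model with a known h0.
dist = [5.0, 10.0, 20.0, 40.0, 80.0]
corr = [correlation_model(h, 0.9, 50.0, 0.05) for h in dist]
h0 = fit_correlation_distance(dist, corr, r0=0.9, a=0.05)
print(round(h0, 3))
```

A fit in log space weights the station pairs differently from a fit on the correlations themselves, so the two approaches generally give somewhat different values of h0 on noisy data.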

Semivariance

An estimate of the semivariance can be made from the n available measurements Zi at locations Xi. Each pair of measurements (i, j) can be associated with a lag, or distance vector, h = Xi - Xj. The pairs are then grouped into a limited number of lag classes so that each class contains a significant number of pairs. The mean semivariance of all pairs in a class with mean distance hm may be estimated as half the spatial variance var[Z(x)-Z(x+h)], taken over the pairs of measurements in that distance class. The traditional non-parametric estimator is:
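
The expression itself is missing from this copy of the page. The estimator usually meant by this description (often attributed to Matheron), and consistent with the symbols defined next, is:

```latex
g(h) = \frac{1}{2\, n(h)} \sum_{i=1}^{n(h)} \bigl[ Z(x_i) - Z(x_i + h) \bigr]^2
```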

where:
g(h) = the estimated semivariance for the distance class h,
Z(xi), Z(xi+h) = the measured values within a distance class h,
n(h) = the number of pairs in the distance class h.
The resulting plot is called an experimental semivariogram. It shows half the mean squared difference in value between two points as a function of the distance between them.
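
The grouping into lag classes and the estimator can be sketched as follows. This is an illustration, not the HYMOS implementation: the function name, the use of equal-width lag classes, and the station data are assumptions.

```python
import math
from itertools import combinations


def experimental_semivariogram(coords, values, lag_width, n_classes):
    """Return (mean distance, semivariance, pair count) per non-empty lag class."""
    sq_sums = [0.0] * n_classes    # sum of squared differences per class
    dist_sums = [0.0] * n_classes  # sum of pair distances per class
    counts = [0] * n_classes       # n(h): number of pairs per class
    for i, j in combinations(range(len(values)), 2):
        h = math.dist(coords[i], coords[j])
        k = int(h // lag_width)  # equal-width classes (an assumption)
        if k >= n_classes:
            continue  # pair lies beyond the last lag class
        sq_sums[k] += (values[i] - values[j]) ** 2
        dist_sums[k] += h
        counts[k] += 1
    return [
        (dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_classes)
        if counts[k] > 0
    ]


# Invented example: four stations on a line, one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
classes = experimental_semivariogram(coords, values, lag_width=1.5, n_classes=3)
for mean_h, gamma, n_pairs in classes:
    print(mean_h, gamma, n_pairs)
```

Returning the pair count alongside each semivariance makes it easy to check whether every class holds enough pairs before the values are interpreted.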
Any experimental semivariogram is only an approximation of the true semivariogram. Kitanidis [1993] gave some useful guidelines for obtaining a reasonable semivariogram:

  • use three to six intervals,
  • make sure you have at least ten pairs in each interval,
  • include more points at distances where the differences between calculated semivariances are larger. Especially for large values of h there may be significant differences between semivariances calculated from different subsets of data.
In practice, the experimental semivariogram must be fitted with an idealised model of the semivariogram, a theoretical semivariogram. The idealised curves are defined as simple mathematical functions that relate the semivariance g(h) to h.
The distance at which the maximum semivariance, called the 'sill', C, is reached is called the 'range', a, of the phenomenon. The range characterises the zone of spatial dependency of the data. Almost all experimental semivariograms show an apparent discontinuity at the origin. This intercept, C0, is called the nugget variance.
Appropriate semivariogram models can be based on a linear fit, a spherical fit, an exponential fit, a Gaussian fit, a cubic fit or even a mathematical formula taking anisotropy into account.
Spherical model:

Gaussian model:

Exponential model:

Power model:
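
The expressions for the four models listed above are not reproduced in this copy of the page. Commonly used textbook forms are given here as an assumption; the parametrisation used by HYMOS may differ. C is taken as the sill (the maximum semivariance, nugget included), C0 as the nugget and a as the range, following the definitions in the text. The symbols b and α in the power model are hypothetical.

```latex
\begin{aligned}
\text{Spherical:}\quad g(h) &=
  \begin{cases}
    C_0 + (C - C_0)\left[ \tfrac{3}{2}\tfrac{h}{a}
      - \tfrac{1}{2}\left(\tfrac{h}{a}\right)^{3} \right], & 0 < h \le a \\
    C, & h > a
  \end{cases} \\
\text{Gaussian:}\quad g(h) &= C_0 + (C - C_0)
  \left[ 1 - \exp\!\left( -\tfrac{h^{2}}{a^{2}} \right) \right] \\
\text{Exponential:}\quad g(h) &= C_0 + (C - C_0)
  \left[ 1 - \exp\!\left( -\tfrac{h}{a} \right) \right] \\
\text{Power:}\quad g(h) &= C_0 + b\, h^{\alpha}, \qquad 0 < \alpha < 2
\end{aligned}
```

In the Gaussian and exponential forms the sill is approached only asymptotically, so a acts as a scale parameter; some texts include a factor 3 in the exponent so that a corresponds to a practical range. The power model has no sill.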

The next, crucial step is to fit a smooth curve through the calculated values so that the semivariance can be expressed by a mathematical formula. By iteratively changing the lag distance, in order to optimise the compilation of the semivariogram expressed in C0, C and a, an optimum fit is achieved. Note that there must be sufficient lag classes to obtain a reasonable estimate of the semivariogram.
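
Fitting a theoretical model to the experimental values can be sketched as a least-squares search. This is not the iterative procedure described above: it substitutes a brute-force search over candidate values of C0, C and a for a spherical model, written in a common textbook form with C as the maximum semivariance. The function names, candidate grids and data are hypothetical.

```python
def spherical(h, nugget, sill, rng):
    """Spherical semivariogram; sill is the maximum semivariance (nugget included)."""
    if h <= 0.0:
        return 0.0
    if h >= rng:
        return sill
    x = h / rng
    return nugget + (sill - nugget) * (1.5 * x - 0.5 * x ** 3)


def fit_spherical(lags, gammas, nuggets, sills, ranges):
    """Return (sse, C0, C, a) with the smallest sum of squared errors."""
    best = None
    for c0 in nuggets:
        for c in sills:
            if c < c0:
                continue  # the sill cannot lie below the nugget
            for a in ranges:
                sse = sum(
                    (spherical(h, c0, c, a) - g) ** 2
                    for h, g in zip(lags, gammas)
                )
                if best is None or sse < best[0]:
                    best = (sse, c0, c, a)
    return best


# Synthetic check: semivariances generated from known parameters.
lags = [5.0, 10.0, 20.0, 30.0, 40.0]
gammas = [spherical(h, 1.0, 5.0, 30.0) for h in lags]
best = fit_spherical(
    lags,
    gammas,
    nuggets=[0.0, 0.5, 1.0, 1.5],
    sills=[4.0, 5.0, 6.0],
    ranges=[20.0, 30.0, 40.0],
)
print(best)
```

A grid search is slow but transparent; with real data the residuals per lag class are worth inspecting, since classes with few pairs can dominate a plain unweighted fit.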