Time series analysis

General

Time series analysis includes the execution of following types of analysis:

  1. correlation analysis
  2. spectral analysis
  3. range analysis
  4. run analysis
  5. storage analysis
  6. Confidence Interval of Mean

The analyses are described in the next Sections.

important:
Note that the time series which are investigated should not contain missing values!!

Correlation analysis

Correlation analysis covers the computation of:

  1. auto-covariance function
  2. auto-correlation function
  3. cross-covariance function
  4. cross-correlation function

HYMOS produces tabular and graphical presentations of these functions.

Auto-covariance and auto-correlation functions

For the series xi, i = 1, N the auto-covariance function cxx(k), k = 0, Lmax is computed as follows:


where:
mx = average of xi, i = 1, N.
k = time-lag in time units equal to the time interval
Lmax = maximum lag

The auto-correlation function rxx(k) is determined from:

rxx (k) = cxx (k)/cxx (0)

The 95% tolerance or confidence limits for zero correlation according to Siddiqui (see Yevjevich, 1972) are computed from: 



where:
CLp(k) = upper confidence limit for zero correlation at lag k
CLn(k) = lower confidence limit for zero correlation at lag k

notes:

  1. The largest time lag in the correlogram should be less than N/2; in hymosits maximum is limited to 100.
  2. In the estimator for c~xx~ (k), the divisor is N rather then N-k, since the former have smaller mean square error; for the interpretation of the correlogram one has to keep in mind that this approach has a decaying effect on the estimate at increasing time lag.
  3. The tolerance or confidence limits can be selected by the user by entering a percentage.

 Example

An example of an auto-correlogram of monthly rainfall, shown in the underneath Figure, is presented in the next table and figure.

Figure: Time series of monthly rainfall

Time series analysis
====================
Autocovariance and autocorrelation analysis
============================================
Series = 170203         PH
  Date of first element                                    = 1967  1 0 0 1
  Date of last element                                    = 1986 12 0 0 1

     COV      = autocovariance function
     COR      = autocorrelation function
     CLP       = upper conf. limit zero correlation (95 %)
     CLN       = lower conf. limit zero correlation (95 %)

              LAG              COV        COR        CLP       CLN
                  0     .2086E+05     1.0000     .1213     -.1296
                  1     .1251E+05       .5995     .1216     -.1299
                  2     .6325E+04       .3032     .1218     -.1302
                  3    -.5033E+03     -.0241     .1220     -.1304
                  4    -.6600E+04     -.3163     .1223     -.1307
                  5    -.1204E+05     -.5771     .1225     -.1310
.. ......... ..... ..... ......

Table: Example of output: auto-correlogram of monthly rainfall

Figure: Auto-correlogram of monthly rainfall

Cross-covariance and cross-correlation functions

The cross-covariance functions cxy(k) and cyx(k), k = 0, Lmax are computed as follows:



where:
mx = average of xi, i = 1, N
my = average of yi, i = 1, N.

The cross-correlation functions rxy(k) and ryx(k) are estimated from:

r xy (k) = c xy (k)/(s x .s y )
r yx (k) = c yx (k)/(s x .s y )

where:
sx = standard deviation of xi, i = 1, N
sy = standard deviation of yi, i = 1, N.

notes:

  1. The largest time lag in the correlogram should be less than N/2; in hymosits maximum is limited to 100.
  2. In the estimators for cxy(k and cyx(k) the divisor is N rather then N-k, since the former have smaller mean square error; for the interpretation of the correlogram one has to keep in mind that this approach has a decaying effect on the estimate at increasing time lag.

Spectral analysis

The smoothed auto-spectral estimate C xx (f), for f=0,..,1/2 is calculated from:


where:
f = frequency in cycles per time interval, computed at spacings 1/(2Nf ), where Nf is 2 to 3 times M
Nf = number of frequency points
cxx (k) = autocovariance function at lag k
M = truncation point or maximum lag of the autocovariance function used to estimate the autospectrum; clearly M is conditioned by: M ≤ Lmax(see Section XII.2.2)
w(k) = window function

Following window w(k) for k = 1, M-1 according to Tukey is used to smooth the spectral estimate:


The bandwith B and number of degrees of freedom Ndf are given by:

B = 4/(3M)
Ndf = 8N/(3M)

The logarithm of the auto-spectrum is computed by:

C log(f) = log10 Cxx (f)

In the results Clog(f) will be set to -100 if Cxx(f) ≤ 0.

The spectral density function follows from:


The tolerance or confidence limits for white noise are computed from:


where:
CLp = upper confidence limit for white noise
CLn = lower confidence limit for white noise
a = tolerance or confidence interval, default 95%

notes:

  1. Truncation point M: An important aspect of the estimation of the spectrum is the choice of the bandwidth B = 1.33/M, which implies the choice of the truncation point of the covariance function used to compute the spectrum. Suppose it is required to detect details of width w in the spectrum. Then the truncation point M should be chosen so that the bandwidth B is less than w, i.e. B < w, or M > 1.33/w
  2. Frequency points N f : It has been suggested that C xx (f) and R xx (f) should only be computed at values of f corresponding to f = 0,1/N f ,2/N f ,..,1/2. However, Jenkins and Watts (1968) indicate that this spacing is too wide, and recommend that C xx (f) and R xx (f) be evaluated at some fraction of this spacing so that a more detailed plot is obtained: N f = 2 to 3 times M.

Example

An example is presented in the underneath figure, where the spectral density function of monthly rainfall given is shown.

Figure Example of output: Spectral density function of monthly rainfall

Range analysis

In the underneath figure a definition sketch of the following range related quantities is given:

  • adjusted surplus aSN+
  • adjusted deficit aSN-
  • adjusted range aRN,
  • rescaled adjusted range aRN*.


Figure Definition sketch range quantities

The quantities are computed from the accumulative departures from the mean Si for i = 0, N and with S0 = 0:

where:
mx = average of xi, i = 1, N
cf = conversion factor (time units per time interval) to transfer intensities into volumes

It follows for:

• Surplus aSN+:

• Deficit aSN-:

• Adjusted range aRN:

• Rescaled adjusted range aRN*:

aSN+ = max (S0, S1,..., SN)

aSN- = min (S0, S1,..., SN)

aRN = aSN+ - aSN- 

aRN = RN /(sx.cf)


where:

sx = standard deviation of xi, i = 1, N

Run analysis

A definition sketch for run analysis is presented in the underneath figure.

Up- and downcrossing and runs

Let xc be a crossing level then an upcrossing is defined by:

xi+1³ xc and xi < xc

and a downcrossing by:

xi+1 < xc and xi³ xc

A run is and excursion above or below the level xc, i.e. bounded by an upcrossing and a downcrossing or a downcrossing and an upcrossing. Note: hymosalso interprets as runs the first and last excursion above or below level xc, which are only bounded by an upcrossing or a downcrossing; these runs are incomplete.


Figure Definition sketch for run analysis

Runlength

With respect to runlength, the following distinction has to be made:

  • positive runlength RL^+^
  • negative runlength RL^-^,
  • total runlength, i.e. successive pair of RL^+^ + RL^-^

RL^+^ = the time span between an upcrossing and a downcrossing, given as a number of time intervals and
RL^-^ = the time span between a downcrossing and an upcrossing given as a number of time intervals.

Runsum

The positive and negative runsums RS^+^ and RS^-^, respectively, are computed from:


where:
j = location of an upcrossing
k = location of the next downcrossing
cf = conversion factor (= time units per time interval) to transfer intensities into volumes

 

where:
k = location of the downcrossing
m = location of the next upcrossing

Storage Analysis

Water shortage or equivalently storage requirements without running dry are computed for various draft levels from the reservoir. The procedure is a computerised variant of the well known graphical Rippl technique. The algorithm considers the following sequence of storages:

Si = Si-1 + (xi - Dx )cf i = 1, 2N ; S0 = 0

where:
xi = inflow
Dx = DL .m x
mx = average of xi, i = 1, N
DL = draft level as a fraction of m  x
cf = multiplier to convert intensities into volumes (time units per time interval)

The local maximum of Si larger than the preceding maximum is sought. Let the locations be k2 and k1 respectively with k2 > k1. Then the largest non-negative difference between Sk1 and Si, i = k1..., k2,... is determined, which is the local range. This procedure is executed for two times the actual series xi; hence the series xi~ is used twice in sequence: xi = xN+i. In this way initial effects are eliminated.

References

  • Yevjevich, V.: Stochastic processes in hydrology, Water Resources Publications, Fort Collins, 1972
  • Jenkins, G.M. and D.G. Watts: Spectral analysis and its applications, Holden Day, San Day, San Francisco, 1968

Confidence Interval of Mean

After calculating the mean it is useful to estimate the confidence range of the mean. The bounds of this confidence interval are the confidence limits of the mean. The width of the interval increases when:

  • more confidence is requested, e.g. 99% (= 0.01) instead of 95% (= 0.05);
  • the standard deviation is higher
  • the sample size is smaller
  • serial correlation is present

Calculation



where:
ucl = upper confidence limit of average for 1-z confidence level
lcl = lower confidence limit of average for 1-z confidence level
x = average of sample (calculated)
   = error level
t, n-1 = critical value of student - t distribution for a confidence level of 1-
s = standard deviation of sample (calculated)
n = square root of number of measurements in sample

Correction for effective measurement

HYMOS offers the option of using the number of effective (n*) instead of the total number of the measurements in the sample (n) . The number of effective measurements decreases if there is a higher serial correlation present in the data series according to:

 

where:
n = number of measurements in sample
n* = number of effective (or independent) measurements in sample
rk = auto-correlation coefficient at lag(k)

Auto-correlation



where
ck = auto-covariance at lag(k)
c0 = variance at lag(k)
rk = auto-correlation coefficient at lag(k)
n = number of measurements in sample
xi,i+k = measurement at time t, t+k
mx = average of the measurements

  • No labels