Regression analysis


General

The regression analysis option in HYMOS includes:

  • computation of correlation matrix, and
  • fitting of the following types of functions: polynomial, simple linear, exponential, power, logarithmic, hyperbolic, multiple linear and stepwise.

The multiple linear functions can be fitted by means of multiple or stepwise regression techniques. The algebraic forms of the functions are presented under Regression equations below.
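To illustrate the first item, the correlation matrix of a set of series can be sketched in a few lines of Python. This is a minimal sketch that assumes the standard Pearson coefficient; the function names and the sample values are hypothetical and do not describe the HYMOS implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((v - xm) ** 2 for v in x)
    syy = sum((v - ym) ** 2 for v in y)
    sxy = sum((u - xm) * (v - ym) for u, v in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Square matrix with the correlation between every pair of series."""
    return [[pearson(a, b) for b in series] for a in series]


# Hypothetical sample series with the same time interval.
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.1, 3.9, 6.2, 7.8],
    [4.0, 3.0, 2.0, 1.0],
]
matrix = correlation_matrix(series)
```

The matrix is symmetric with ones on the diagonal, so only the upper or lower triangle needs to be inspected.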

Series codes

Series with the same time interval, read from the database, can enter the regression analysis. To select a series, highlight it in the series list box and press the '>>' button. To deselect a series, select it in the spreadsheet and press '<<'. For regression analysis between series, at least two series must be selected. When only one series is selected, a regression against time is calculated. When a series is selected, its minimum and maximum values for the defined time period are calculated. Each series must be given a code.

Coding of variables

0: free variable available for regression
1: forced variable, which will enter regression
2: variable not entering regression
3: dependent variable (only one per selection)

Minimum value / Maximum value

The lower and upper boundaries for each series must be set; data outside these boundaries are eliminated. By default, HYMOS shows the minimum and maximum values of the series.
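The effect of the boundaries can be sketched as a simple filter. The sketch below assumes that a time step is dropped for all series as soon as one of its values falls outside the boundaries of its series; the text does not state how HYMOS handles this, and the function name and values are hypothetical.

```python
def apply_bounds(rows, bounds):
    """Keep only the time steps whose values all lie within their boundaries.

    rows   : list of tuples, one value per selected series for each time step
    bounds : list of (minimum, maximum) pairs, one per selected series
    """
    return [
        row for row in rows
        if all(lo <= value <= hi for value, (lo, hi) in zip(row, bounds))
    ]


# Hypothetical (rainfall, runoff) pairs and boundaries.
rows = [(1130, 740), (1280, 1040), (990, 570)]
kept = apply_bounds(rows, [(1000, 1700), (500, 1400)])
```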

Regression function

Choose a regression option from the list box by clicking the function. If you select <Polynomial>, the degree of the polynomial has to be entered as well.
If you select <Stepwise>, the following data have to be entered:

  • maximum number of steps in the stepwise regression analysis;
  • whether or not the results of each step in the stepwise analysis have to be printed; if you enter <N>, only the results of the last step will be presented;
  • F-values to enter and delete variables from the regression analysis, which apply only to the free variables.

    Save Relation

    Save the calculated relation for the time period defined in the Relation Validity Period. The start and end time of this period can be changed by double-clicking the time label. The coefficients of the regression equation will be stored in the database. There are, however, some limitations to the number of coefficients that can be stored:
  • in case of polynomial regression, the degree of the polynomial should be ≤ 4;
  • in case of multiple/stepwise regression, the number of independent variables should be ≤ 4.

    Regression equations

    The following types of regression equations are available, with Y the dependent variable and the Xj the independent variables:

    Type             Equation                                              With
    Polynomial       $Y_i = C_0 + C_1 X_i + \dots + C_n X_i^n$             n = degree of polynomial (n ≤ 4); Cj = coefficients
    Simple linear    $Y_i = A + B X_i$                                     A, B = coefficients
    Exponential 1    $Y_i = A \exp(B X_i)$                                 A, B = coefficients
    Exponential 2    $Y_i = A \exp(B / X_i)$                               A, B = coefficients
    Power            $Y_i = A X_i^B$                                       A, B = coefficients
    Logarithmic      $Y_i = A + B \ln(X_i)$                                A, B = coefficients
    Hyperbolic       $Y_i = A + B / X_i$                                   A, B = coefficients
    Multiple linear  $Y_i = C_0 + C_1 X_{1,i} + \dots + C_n X_{n,i}$       n = number of series of independent variables (n ≤ 4); Cj = coefficients
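The two-parameter forms can all be reduced to a straight line by transforming X, Y or both, after which ordinary least squares applies. The sketch below shows this for the power, exponential and hyperbolic types. It assumes a fit on the transformed variables, which minimises the squared errors in the transformed space and not in the original one; whether HYMOS proceeds in this way is not stated in the text. All function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares estimates (A, B) of y = A + B*x."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((u - xm) ** 2 for u in x)
    sxy = sum((u - xm) * (v - ym) for u, v in zip(x, y))
    b = sxy / sxx
    return ym - b * xm, b


def fit_power(x, y):
    """Y = A * X**B, linearised as ln Y = ln A + B ln X (needs X, Y > 0)."""
    ln_a, b = fit_line([math.log(u) for u in x], [math.log(v) for v in y])
    return math.exp(ln_a), b


def fit_exponential_1(x, y):
    """Y = A * exp(B*X), linearised as ln Y = ln A + B X (needs Y > 0)."""
    ln_a, b = fit_line(x, [math.log(v) for v in y])
    return math.exp(ln_a), b


def fit_hyperbolic(x, y):
    """Y = A + B/X, a straight line in 1/X (needs X != 0)."""
    return fit_line([1.0 / u for u in x], y)


# Hypothetical error-free data generated from known coefficients.
x = [1.0, 2.0, 3.0, 4.0]
power_a, power_b = fit_power(x, [3.0 * u ** 2 for u in x])
exp_a, exp_b = fit_exponential_1(x, [2.0 * math.exp(0.5 * u) for u in x])
hyp_a, hyp_b = fit_hyperbolic(x, [1.0 + 4.0 / u for u in x])
```

The Exponential 2 and Logarithmic types follow the same pattern, using 1/X with ln Y and ln X with Y respectively.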


    Computation procedure

    All coefficients are determined by the least squares method.
    The Multiple Linear equation may be established by multiple or by stepwise regression:
  • In multiple regression, all independent variables enter the regression equation at once.
  • In stepwise regression, the independent variables enter the regression equation one by one. The order of entry may be:
      • free: the variables to which this option applies are called free variables; their entry is determined by statistical properties, and the variable added at each step is the one that makes the greatest improvement in goodness of fit. Among the free variables, only the significant ones are included in the final regression equation.
      • predetermined: the variables to which this option applies are called forced variables; their entry is not based on correlation but is solely requested by the user.
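The entry of free and forced variables can be sketched as a forward selection driven by a partial F-test. This is only an illustration under stated assumptions: forced variables (code 1) are entered before any free variable (code 0), a free variable enters only if its partial F-value exceeds a threshold `f_enter`, and the deletion step based on the F-value to delete is left out. The names `solve`, `rss`, `stepwise` and `f_enter`, the default threshold and the sample data are hypothetical and are not taken from HYMOS.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of the least-squares fit y = C0 + sum(Cj * Xj)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    coef = solve(xtx, xty)  # normal equations; fails if the series are collinear
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def stepwise(series, codes, y, f_enter=4.0, max_steps=10):
    """Return the indices of the series that end up in the regression equation."""
    selected = [k for k, code in enumerate(codes) if code == 1]  # forced variables
    free = [k for k, code in enumerate(codes) if code == 0]      # candidates
    n = len(y)
    for _ in range(max_steps):
        rss_now = rss([series[k] for k in selected], y)
        dof = n - len(selected) - 2  # residual degrees of freedom after adding one variable
        best, best_f = None, f_enter
        for k in free:
            if dof <= 0:
                continue
            rss_new = rss([series[j] for j in selected + [k]], y)
            # Partial F-value: reduction in RSS relative to the new residual variance.
            f = (rss_now - rss_new) / (rss_new / dof) if rss_new > 0 else float("inf")
            if f > best_f:
                best, best_f = k, f
        if best is None:
            break  # no remaining free variable is significant
        selected.append(best)
        free.remove(best)
    return selected


# Hypothetical data: y depends on x0 only; x2 carries code 2 and never enters.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x2 = [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0]
noise = [0.1, -0.2, 0.15, -0.05, -0.1, 0.2, -0.15, 0.05]
y = [2.0 * v + 1.0 + e for v, e in zip(x0, noise)]
chosen = stepwise([x0, x1, x2], [0, 0, 2], y)
```

Changing the code of `x1` from 0 to 1 forces it into the equation regardless of its F-value, which mirrors the distinction between free and forced variables described above.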

    Confidence limits

    For the linear regression methods, confidence intervals can be computed. Based on the sampling distributions of the regression parameters, the following estimates and confidence limits hold (see e.g. Kottegoda and Rosso, 1998).
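    If the linear regression function is as follows:

    $$Y = a + bX + e$$

    where:
    Y = dependent variable, also called response variable,
    X = independent variable or explanatory variable,
    a, b = regression coefficients,
    e = residual due to the imperfect match of the regression function with the measurement points,

    then an unbiased estimate of the error variance is given by:

    $$s_e^2 = \frac{1}{n-2} \sum_{k=1}^{n} (y_k - a - b x_k)^2 = \frac{1}{n-2} \left( S_{yy} - \frac{S_{xy}^2}{S_{xx}} \right)$$

where:

$$S_{xx} = \sum_{k=1}^{n} (x_k - x_m)^2, \quad S_{yy} = \sum_{k=1}^{n} (y_k - y_m)^2, \quad S_{xy} = \sum_{k=1}^{n} (x_k - x_m)(y_k - y_m)$$

The parameter n is the total number of values used to establish the regression function, and x_m, y_m are the mean values of the two series. Note that n-2 appears in the denominator to reflect the fact that two degrees of freedom have been lost in estimating (a, b).

A 100(1-α) percent confidence interval for the mean response to some input value x_i of X is given by:

$$(a + b x_i) \pm t_{n-2,1-\alpha/2} \; s_e \sqrt{\frac{1}{n} + \frac{(x_i - x_m)^2}{S_{xx}}}$$

where:
t_{n-2,1-α/2} = quantile of Student's t distribution with n-2 degrees of freedom.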

Note that the farther x_i lies from the mean x_m, the wider the confidence interval becomes, because the last term under the root sign grows with that distance.

HYMOS uses by default a 95% confidence interval of the regression line.

Example
The table below presents 17 years of annual rainfall (column 2) and runoff (column 3) data for a basin. Regression analysis will be applied to validate the runoff series. No changes took place in the drainage characteristics of the basin.
From the HYMOS functions list, select the Regression function. Add the two series to the spreadsheet and give the rainfall series code 1 and the runoff series code 3. Select the linear regression function and press <Execute>.
Table: Example computation of confidence limits for regression analysis

Year      X = Rainfall  Y = Runoff  (X-Xm)^2  (Y-Ym)^2  (X-Xm)(Y-Ym)  Yest   UCL   LCL
(1)                (2)         (3)       (4)       (5)           (6)   (7)   (8)   (9)
1961              1130         740     44100     36100         39900   737   801   673
1962              1280        1040      3600     12100         -6600   875   922   827
1963              1270         960      4900       900         -2100   866   914   818
1964              1040         610     90000    102400         96000   654   732   576
1965              1080         590     67600    115600         88400   691   762   619
1966              1150         820     36100     12100         20900   755   816   694
1967              1670        1090    108900     25600         52800  1234  1317  1150
1968              1540        1020     40000      8100         18000  1114  1176  1052
1969               990         570    122500    129600        126000   608   695   521
1970              1190         780     22500     22500         22500   792   848   736
1971              1520        1090     32400     25600         28800  1096  1155  1036
1972              1370         960       900       900           900   958  1004   911
1973              1650        1240     96100     96100         96100  1215  1295  1135
1974              1510        1030     28900     10000         17000  1086  1145  1028
1975              1600        1340     67600    168100        106600  1169  1241  1098
1976              1300         870      1600      3600          2400   893   940   847
1977              1490        1060     22500     16900         19500  1068  1124  1012
Mean/Sum          1340         930    790200    786200        727100

Last row: the means Xm and Ym (columns 2 and 3) and the sums Sxx, Syy and Sxy (columns 4 to 6). UCL and LCL are the upper and lower confidence limits.





From this table the values for Sxx, Syy and Sxy can be computed (last values of columns 4, 5 and 6). From these values, together with the Student's t value, the error variance and the confidence limits (columns 8 and 9) can be computed. Note that t_{n-2,1-α/2} = t_{15,0.975} = 2.131 and se = 88.3 mm. HYMOS presents the result in a graph.
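The computation can be retraced with a short script. The sketch below uses the rainfall and runoff values of the table and the Student's t value of 2.131 quoted above, together with the standard expressions for the error variance and for the confidence interval of the mean response; the variable and function names are hypothetical.

```python
import math

# Annual rainfall (X) and runoff (Y), 1961 to 1977, as listed in the table.
rain = [1130, 1280, 1270, 1040, 1080, 1150, 1670, 1540, 990,
        1190, 1520, 1370, 1650, 1510, 1600, 1300, 1490]
runoff = [740, 1040, 960, 610, 590, 820, 1090, 1020, 570,
          780, 1090, 960, 1240, 1030, 1340, 870, 1060]

n = len(rain)
xm = sum(rain) / n
ym = sum(runoff) / n
sxx = sum((x - xm) ** 2 for x in rain)
syy = sum((y - ym) ** 2 for y in runoff)
sxy = sum((x - xm) * (y - ym) for x, y in zip(rain, runoff))

b = sxy / sxx                                    # slope
a = ym - b * xm                                  # intercept
se = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2)) # standard error of estimate
t = 2.131                                        # t(15, 0.975), taken from the text


def limits(x0):
    """Estimate and 95% confidence limits of the mean response at x0."""
    y_est = a + b * x0
    half_width = t * se * math.sqrt(1.0 / n + (x0 - xm) ** 2 / sxx)
    return y_est, y_est + half_width, y_est - half_width


y_est_1961, ucl_1961, lcl_1961 = limits(1130)
```

Calling `limits` for each rainfall value gives columns 7 to 9 of the table, up to rounding.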
