Regression analysis
General
The regression analysis option in HYMOS includes:
- computation of the correlation matrix, and
- fitting of the following types of functions: polynomial, simple linear, exponential, power, logarithmic, hyperbolic, multiple linear and stepwise.
The multiple linear functions can be fitted by means of multiple or stepwise regression techniques. The algebraic forms of the functions are presented in the section Regression equations.
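As an illustration of the first item, a correlation matrix for a set of equally long series can be sketched as follows. This is a minimal sketch that assumes the usual Pearson product-moment coefficient; the manual does not state which coefficient HYMOS computes, and the function name `correlation_matrix` is hypothetical.

```python
import math

def correlation_matrix(series):
    """Pearson correlation matrix for a list of equally long series.

    Assumption: no series is constant (a constant series has zero
    variance and would cause a division by zero here).
    """
    n = len(series[0])
    means = [sum(s) / n for s in series]
    # Centre each series on its own mean.
    dev = [[v - m for v in s] for s, m in zip(series, means)]
    # Square root of the sum of squared deviations of each series.
    norms = [math.sqrt(sum(d * d for d in row)) for row in dev]
    size = len(series)
    return [[sum(p * q for p, q in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(size)]
            for i in range(size)]
```

The result is symmetric with ones on the diagonal; entry (i, j) is the correlation between series i and series j.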
Series codes
Series with the same time interval, read from the database, can enter the regression analysis. To add a series, select it in the series list box and press the '>>' button. To deselect a series, select it in the spreadsheet and press '<<'. For regression analysis between series, a minimum of two series must be selected. When only one series is selected, a regression against time will be calculated. When a series is selected, its minimum and maximum values for the defined time period are calculated. Each series must be given a code.
Coding of variables
0: free variable available for regression
1: forced variable, which will enter regression
2: variable not entering regression
3: dependent variable (only one per selection)
Minimum value / Maximum value
The lower and upper boundaries for each series must be set; data outside these boundaries are eliminated. By default, HYMOS shows the minimum and maximum values of the series.
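The effect of the boundaries can be pictured with a small sketch. It assumes that a time step is dropped from all series as soon as any one series falls outside its boundaries, and that the boundaries themselves are inclusive; the manual does not spell out either detail, and `apply_bounds` is a hypothetical name.

```python
def apply_bounds(series, bounds):
    """Keep only the time steps at which every series lies within its bounds.

    series: dict mapping series name -> list of values (equal lengths)
    bounds: dict mapping series name -> (minimum, maximum), inclusive
    Assumption: an out-of-range value removes the whole time step.
    """
    names = list(series)
    length = len(series[names[0]])
    keep = [t for t in range(length)
            if all(bounds[k][0] <= series[k][t] <= bounds[k][1] for k in names)]
    return {k: [series[k][t] for t in keep] for k in names}
```

Dropping the whole time step keeps the remaining series aligned, which the pairwise sums in a regression require.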
Regression function
Choose a regression option from the list box by clicking the function. If you select <Polynomial>, the degree of the polynomial has to be entered as well.
If you select <Stepwise>, the following data have to be entered:
- the maximum number of steps in the stepwise regression analysis;
- whether or not the results of each step in the stepwise analysis have to be printed; if you enter <N>, only the results of the last step will be presented;
- the F-values to enter and delete variables from the regression analysis; these apply only to the free variables.
Save Relation
Save the calculated relation for the time period defined in the Relation Validity Period. The start and end time of this period can be changed by double-clicking the time label. The coefficients of the regression equation will be stored in the database. There are, however, some limitations to the number of coefficients that can be stored:
- in case of polynomial regression, the degree of the polynomial should be ≤ 4;
- in case of multiple/stepwise regression, the number of independent variables should be ≤ 4.
Regression equations
The following types of regression equations are available, with Y the dependent variable and the X~j~ the independent variables:
| Type | Equation |
| Polynomial | Y~i~ = C~0~ + C~1~*X~i~ + C~2~*X~i~^2^ + ... + C~n~*X~i~^n^, with: n = degree of polynomial, n ≤ 4; C~j~ = coefficients |
| Simple linear | Y~i~ = A + B*X~i~, with: A, B = coefficients |
| Exponential 1 | Y~i~ = A*exp(B*X~i~), with: A, B = coefficients |
| Exponential 2 | Y~i~ = A*exp(B/X~i~), with: A, B = coefficients |
| Power | Y~i~ = A*X~i~^B^, with: A, B = coefficients |
| Logarithmic | Y~i~ = A + B*ln(X~i~), with: A, B = coefficients |
| Hyperbolic | Y~i~ = A + B/X~i~, with: A, B = coefficients |
| Multiple linear | Y~i~ = C~0~ + C~1~*X~1,i~ + C~2~*X~2,i~ + ... + C~n~*X~n,i~, with: n = number of series of independent variables, n ≤ 4; C~j~ = coefficients |
Computation procedure
All coefficients are determined by the least squares method.
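For the two-coefficient forms, a least squares fit can be sketched as below. The manual does not say whether HYMOS fits the non-linear forms directly or after a linearising transform; this sketch assumes the transform route (for example ln Y = ln A + B*ln X for the power form), so the squared errors are minimised in the transformed variables. The names `fit_line`, `FORMS` and `fit` are hypothetical.

```python
import math

def fit_line(u, v):
    """Ordinary least squares for v = a + b*u; returns (a, b)."""
    n = len(u)
    um, vm = sum(u) / n, sum(v) / n
    suu = sum((ui - um) ** 2 for ui in u)
    suv = sum((ui - um) * (vi - vm) for ui, vi in zip(u, v))
    b = suv / suu
    return vm - b * um, b

def _identity(z):
    return z

def _inverse(z):
    return 1.0 / z

# Per form: (transform of X, transform of Y, back-transform of the intercept).
# Assumption: each form is linearised before fitting; log transforms
# require strictly positive values.
FORMS = {
    "simple linear": (_identity, _identity, _identity),
    "exponential 1": (_identity, math.log, math.exp),
    "exponential 2": (_inverse, math.log, math.exp),
    "power":         (math.log, math.log, math.exp),
    "logarithmic":   (math.log, _identity, _identity),
    "hyperbolic":    (_inverse, _identity, _identity),
}

def fit(form, x, y):
    """Return the coefficients (A, B) of the chosen two-coefficient form."""
    fx, fy, back = FORMS[form]
    a, b = fit_line([fx(xi) for xi in x], [fy(yi) for yi in y])
    return back(a), b
```

A fit in transformed space generally differs slightly from a direct non-linear fit of the original equation, which is why the assumption matters when comparing with HYMOS output.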
The Multiple Linear equation may be established by multiple or by stepwise regression:
- In the multiple regression situation all independent variables enter the regression equation in one go.
- In the stepwise regression situation the independent variables enter the regression equation one by one. The order of entry may be:
  - free: the variables to which this option applies are called free variables; their entry is determined by statistical properties; the variable added is the one which makes the greatest improvement in goodness of fit. Among the free variables only the significant variables are included in the final regression equation.
  - predetermined: the variables to which this option applies are called forced variables; their entry is not based on correlation but is solely requested by the user.
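The entry logic described above can be sketched as a forward selection. This is a simplified illustration, not the HYMOS algorithm: it assumes forced variables (code 1) enter first, free variables (code 0) are then tried one at a time, and a free variable enters only if its partial F-value reaches the F-to-enter threshold. The F-to-delete step is omitted, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def rss(columns, y):
    """Residual sum of squares of y fitted on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] * n] + columns  # each entry is one column of the design matrix
    p = len(design)
    xtx = [[sum(design[i][k] * design[j][k] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(design[i][k] * y[k] for k in range(n)) for i in range(p)]
    coef = solve(xtx, xty)  # normal equations
    return sum((y[k] - sum(coef[i] * design[i][k] for i in range(p))) ** 2
               for k in range(n))

def forward_stepwise(series, codes, f_enter, max_steps):
    """series: name -> values; codes: name -> 0 (free), 1 (forced),
    2 (excluded) or 3 (dependent). Returns the names that entered, in order."""
    y = series[[k for k, c in codes.items() if c == 3][0]]
    selected = [k for k, c in codes.items() if c == 1]  # forced variables
    free = [k for k, c in codes.items() if c == 0]
    current = rss([series[k] for k in selected], y)
    for _ in range(max_steps):
        if not free:
            break
        # Try each free variable and keep the one with the smallest RSS.
        trials = {k: rss([series[j] for j in selected + [k]], y) for k in free}
        best = min(trials, key=trials.get)
        dof = len(y) - len(selected) - 2  # n minus (entered + candidate + intercept)
        if dof <= 0:
            break
        if trials[best] <= 0.0:
            f_value = float("inf")  # perfect fit
        else:
            f_value = (current - trials[best]) / (trials[best] / dof)
        if f_value < f_enter:
            break  # no remaining free variable is significant
        selected.append(best)
        free.remove(best)
        current = trials[best]
    return selected
```

Variables with code 2 never appear in either list, so they cannot enter the equation.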
Confidence limits
For the linear regression methods, confidence intervals can be computed. Based on the sampling distributions of the regression parameters, the following estimates and confidence limits hold (see e.g. Kottegoda and Rosso, 1998).
If the linear regression function is as follows:
$$Y = a + bX + e$$
where:
Y = dependent variable, also called response variable,
X = independent variable or explanatory variable,
a, b = regression coefficients,
e = residual, arising from the imperfect match of the regression function to the measurement points.
An unbiased estimate of the error variance is given by:
$$s_e^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$
where:
$$S_{xx} = \sum_{i=1}^{n}(x_i - x_m)^2, \qquad S_{yy} = \sum_{i=1}^{n}(y_i - y_m)^2, \qquad S_{xy} = \sum_{i=1}^{n}(x_i - x_m)(y_i - y_m)$$
The parameter n is the total number of values used to establish the regression function, and x~m~, y~m~ are the mean values of the two series. Note that n-2 appears in the denominator to reflect the fact that two degrees of freedom have been lost in estimating (a, b).
A 100(1-α) percent confidence interval for the mean response to some input value x~i~ of X is given by:
$$a + b\,x_i \;\pm\; t_{n-2,\,1-\alpha/2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_i - x_m)^2}{S_{xx}}}$$
where:
t~n-2,1-α/2~ = value of Student's t distribution with n-2 degrees of freedom at cumulative probability 1-α/2.
Note that the farther x~i~ is from its mean, the wider the confidence interval becomes, because the last term under the root sign grows with the squared distance from the mean.
HYMOS uses by default a 95% confidence interval of the regression line.
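The computation of the confidence limits can be sketched as follows, assuming the standard simple linear regression expressions: b = S~xy~/S~xx~, a = y~m~ - b*x~m~, s~e~^2^ = (S~yy~ - S~xy~^2^/S~xx~)/(n-2), and a half-width of t * s~e~ * sqrt(1/n + (x~i~ - x~m~)^2^/S~xx~). The Python standard library has no Student's t quantile, so the t value is passed in by the caller; the function name `mean_response_limits` is hypothetical.

```python
import math

def mean_response_limits(x, y, t_value):
    """Fit y = a + b*x and return (a, b, s_e, rows).

    rows holds one (estimate, upper limit, lower limit) tuple per x value.
    t_value is the Student's t quantile for n-2 degrees of freedom at
    probability 1 - alpha/2, looked up by the caller.
    """
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ym - b * xm
    # Standard error of the residuals; two degrees of freedom are lost to a and b.
    s_e = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
    rows = []
    for xi in x:
        y_est = a + b * xi
        half = t_value * s_e * math.sqrt(1.0 / n + (xi - xm) ** 2 / sxx)
        rows.append((y_est, y_est + half, y_est - half))
    return a, b, s_e, rows
```

For the default 95% interval, the caller would pass the t quantile at probability 0.975 for n-2 degrees of freedom, taken from a table or a statistics library.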
Example
The table below presents 17 years of annual rainfall (column 2) and runoff (column 3) data of a basin. Regression analysis will be applied to validate the runoff series. No changes took place in the drainage characteristics of the basin.
From the HYMOS functions list, select the Regression function. Add the two series to the spreadsheet and give the rainfall series code 1 and the runoff (discharge) series code 3. Select the linear regression function and press <Execute>.
Table: Example computation of confidence limits for regression analysis
| Year | X=Rainfall | Y=Runoff | (X-X~m~)^2^ | (Y-Y~m~)^2^ | (X-X~m~)(Y-Y~m~) | Yest | UC1 | LC1 |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 1961 | 1130 | 740 | 44100 | 36100 | 39900 | 737 | 801 | 673 |
| 1962 | 1280 | 1040 | 3600 | 12100 | -6600 | 875 | 922 | 827 |
| 1963 | 1270 | 960 | 4900 | 900 | -2100 | 866 | 914 | 818 |
| 1964 | 1040 | 610 | 90000 | 102400 | 96000 | 654 | 732 | 576 |
| 1965 | 1080 | 590 | 67600 | 115600 | 88400 | 691 | 762 | 619 |
| 1966 | 1150 | 820 | 36100 | 12100 | 20900 | 755 | 816 | 694 |
| 1967 | 1670 | 1090 | 108900 | 25600 | 52800 | 1234 | 1317 | 1150 |
| 1968 | 1540 | 1020 | 40000 | 8100 | 18000 | 1114 | 1176 | 1052 |
| 1969 | 990 | 570 | 122500 | 129600 | 126000 | 608 | 695 | 521 |
| 1970 | 1190 | 780 | 22500 | 22500 | 22500 | 792 | 848 | 736 |
| 1971 | 1520 | 1090 | 32400 | 25600 | 28800 | 1096 | 1155 | 1036 |
| 1972 | 1370 | 960 | 900 | 900 | 900 | 958 | 1004 | 911 |
| 1973 | 1650 | 1240 | 96100 | 96100 | 96100 | 1215 | 1295 | 1135 |
| 1974 | 1510 | 1030 | 28900 | 10000 | 17000 | 1086 | 1145 | 1028 |
| 1975 | 1600 | 1340 | 67600 | 168100 | 106600 | 1169 | 1241 | 1098 |
| 1976 | 1300 | 870 | 1600 | 3600 | 2400 | 893 | 940 | 847 |
| 1977 | 1490 | 1060 | 22500 | 16900 | 19500 | 1068 | 1124 | 1012 |
| X~m~, Y~m~ / S~xx~, S~yy~, S~xy~ | 1340 | 930 | 790200 | 786200 | 727100 | | | |
From this table the values for S~xx~, S~yy~ and S~xy~ can be computed (last value of columns 4, 5 and 6). From these values, together with the Student's t value, the error variance and the confidence limits (columns 8 and 9) can be computed. Note that t~n-2,1-α/2~ = 2.131 and s~e~ = 88.3 mm. HYMOS will show the following graph.
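Before turning to the graph, the quoted figures can be checked by hand for the 1961 row. The working below assumes the standard simple linear regression expressions (slope b = S~xy~/S~xx~, intercept a = y~m~ - b*x~m~, and a half-width of t * s~e~ * sqrt(1/n + (x~i~ - x~m~)^2^/S~xx~)) and uses only the sums and means from the table, with n = 17:

$$
\begin{aligned}
b &= \frac{S_{xy}}{S_{xx}} = \frac{727100}{790200} \approx 0.92015 \\
a &= y_m - b\,x_m = 930 - 0.92015 \times 1340 \approx -303.0 \\
s_e &= \sqrt{\frac{S_{yy} - S_{xy}^2/S_{xx}}{n-2}} = \sqrt{\frac{786200 - 669039}{15}} \approx 88.4 \\
Y_{est}(1130) &= -303.0 + 0.92015 \times 1130 \approx 737 \\
\text{half-width} &= 2.131 \times 88.4 \times \sqrt{\frac{1}{17} + \frac{44100}{790200}} \approx 63.8
\end{aligned}
$$

This gives limits of about 737 + 63.8 ≈ 801 and 737 - 63.8 ≈ 673, matching columns 8 and 9 of the first row. The value of s~e~ obtained from the rounded table sums (about 88.4) agrees with the quoted 88.3 mm to within rounding.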