Regression analysis
General
The regression analysis option in HYMOS includes:
- computation of the correlation matrix, and
- fitting of the following types of functions: polynomial, simple linear, exponential, power, logarithmic, hyperbolic, multiple linear and stepwise.
The multiple linear functions can be fitted by means of multiple or stepwise regression techniques. The algebraic forms of the functions are presented under Regression equations below.
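The correlation matrix mentioned above can be illustrated with a short sketch. It assumes the Pearson product-moment correlation coefficient, which the text does not state explicitly; the function name `correlation_matrix` and the dictionary layout are illustrative only and are not part of HYMOS.

```python
import math


def correlation_matrix(series):
    """Return Pearson correlation coefficients for every pair of series.

    series: dict mapping a series name to a list of values; all lists
    are assumed to have the same length and the same time interval.
    """
    names = list(series)
    n = len(series[names[0]])
    means = {name: sum(values) / n for name, values in series.items()}

    def cross_sum(p, q):
        # Sum of products of deviations from the mean for series p and q
        return sum((a - means[p]) * (b - means[q])
                   for a, b in zip(series[p], series[q]))

    return {p: {q: cross_sum(p, q) / math.sqrt(cross_sum(p, p) * cross_sum(q, q))
                for q in names}
            for p in names}
```

A series that is constant over the period has zero variance and would cause a division by zero here; a production implementation would have to handle that case.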
Series codes
Data of series with the same time interval, read from the database, can enter the regression analysis. To select a series, highlight it in the series list box and press the '>>' button. To deselect a series, select it in the spreadsheet and press '<<'. For regression analysis between series, a minimum of two series must be selected. When only one series is selected, a regression against time will be calculated. When a series is selected, its minimum and maximum values for the defined time period are calculated. Each series must be given a code.
Coding of variables
0: free variable available for regression
1: forced variable, which will enter regression
2: variable not entering regression
3: dependent variable (only one per selection)
Minimum value / Maximum value
The lower and upper boundaries for each series must be set; data outside these boundaries are eliminated. By default, HYMOS shows the minimum and maximum values of the series.
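The effect of the boundaries can be sketched as a simple filter. The sketch assumes that a time step is dropped as a whole as soon as one of the series falls outside its boundaries, since a regression needs complete sets of values; the text does not say how HYMOS handles this internally. The names `within_bounds`, `records` and `bounds` are hypothetical.

```python
def within_bounds(records, bounds):
    """Keep only the time steps at which every series lies inside its range.

    records: list of dicts, one per time step, mapping series name to value.
    bounds:  dict mapping series name to a (minimum, maximum) tuple.
    """
    return [record for record in records
            if all(low <= record[name] <= high
                   for name, (low, high) in bounds.items())]


# Usage: the second time step is removed because the runoff exceeds 1200.
records = [
    {"rainfall": 1130, "runoff": 740},
    {"rainfall": 1650, "runoff": 1240},
    {"rainfall": 1300, "runoff": 870},
]
kept = within_bounds(records, {"rainfall": (900, 1700), "runoff": (500, 1200)})
```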
Regression function
Choose a regression option from the list box by clicking the function. If you select <Polynomial>, the degree of the polynomial has to be entered as well.
If you select <Stepwise>, the following data have to be entered:
- maximum number of steps in the stepwise regression analysis;
- whether or not the results of each step in the stepwise analysis have to be printed; if you enter <N>, only the results of the last step will be presented;
- F-values to enter and delete variables from the regression analysis, which apply only to the free variables.
Save Relation
Save the calculated relation for the time period defined in the Relation Validity Period. The start and end time of this period can be changed by double-clicking the time label. The coefficients of the regression equation will be stored in the database. There are, however, some limitations to the number of coefficients that can be stored:
- in case of polynomial regression, the degree of the polynomial should be ≤ 4;
- in case of multiple/stepwise regression, the number of independent variables should be ≤ 4.
Regression equations
The following types of regression equations are available, with Y the dependent variable and the $X_j$ the independent variables:
| Type | Equation | with |
| Polynomial | $Y_i = \sum_{j=0}^{n} C_j X_i^{j}$ | n = degree of polynomial (n ≤ 4), $C_j$ = coefficients |
| Simple linear | $Y_i = A + B X_i$ | A, B = coefficients |
| Exponential 1 | $Y_i = A \exp(B X_i)$ | A, B = coefficients |
| Exponential 2 | $Y_i = A \exp(B / X_i)$ | A, B = coefficients |
| Power | $Y_i = A X_i^{B}$ | A, B = coefficients |
| Logarithmic | $Y_i = A + B \ln(X_i)$ | A, B = coefficients |
| Hyperbolic | $Y_i = A + B / X_i$ | A, B = coefficients |
| Multiple linear | $Y_i = C_0 + \sum_{j=1}^{n} C_j X_{j,i}$ | n = number of series of independent variables (n ≤ 4), $C_j$ = coefficients |
Computation procedure
All coefficients are determined by the least squares method.
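As an illustration of least squares fitting for the two-coefficient equation types, the sketch below reduces each of them to a straight-line fit by transforming X and/or Y. This linearisation is an assumption made for the example: the text does not state whether HYMOS fits the transformed or the untransformed equation, and the two approaches generally give slightly different coefficients. The names `fit_line`, `fit_curve` and `TRANSFORMS` are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least squares estimates (a, b) of y = a + b*x."""
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ym - b * xm, b


def identity(v):
    return v


def reciprocal(v):
    return 1.0 / v


# For each type: (transform of X, transform of Y, back-transform of intercept).
# The logarithms require strictly positive values.
TRANSFORMS = {
    "simple linear": (identity, identity, identity),    # Y = A + B*X
    "exponential 1": (identity, math.log, math.exp),    # ln Y = ln A + B*X
    "exponential 2": (reciprocal, math.log, math.exp),  # ln Y = ln A + B/X
    "power": (math.log, math.log, math.exp),            # ln Y = ln A + B*ln X
    "logarithmic": (math.log, identity, identity),      # Y = A + B*ln X
    "hyperbolic": (reciprocal, identity, identity),     # Y = A + B/X
}


def fit_curve(kind, xs, ys):
    """Return (A, B) for the requested equation type."""
    fx, fy, back = TRANSFORMS[kind]
    a, b = fit_line([fx(x) for x in xs], [fy(y) for y in ys])
    return back(a), b
```

For example, `fit_curve("power", xs, ys)` returns the coefficients A and B of Y = A X^B, estimated on the log-transformed data.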
The Multiple Linear equation may be established by multiple or by stepwise regression:
- In multiple regression, all independent variables enter the regression equation at once.
- In stepwise regression, the independent variables enter the regression equation one by one. The order of entry may be:
  - free: the variables to which this option applies are called free variables; their entry is determined by statistical properties; the variable added is the one which makes the greatest improvement in goodness of fit. Among the free variables, only the significant variables are included in the final regression equation.
  - predetermined: the variables to which this option applies are called forced variables; their entry is not based on correlation but is solely requested by the user.
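The entry of free and forced variables can be sketched as a forward selection. The sketch rests on several assumptions not confirmed by the text: forced variables are entered before any free variable, "greatest improvement in goodness of fit" is taken as the largest reduction in the residual sum of squares, and significance is judged with a partial F-value compared against the F-to-enter threshold. Deletion of variables (F-to-delete) is left out. All function and parameter names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_sum_of_squares(columns, y):
    """RSS of the least squares fit y = C0 + sum_j Cj * columns[j]."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, yi in zip(design, y))


def stepwise(series, codes, y, f_enter, max_steps):
    """Return the names of the variables in the final equation, in entry order.

    series: dict name -> list of values (independent variables only)
    codes:  dict name -> 0 (free) or 1 (forced), as in 'Coding of variables'
    """
    selected = [name for name in series if codes[name] == 1]  # forced first
    free = [name for name in series if codes[name] == 0]
    n = len(y)
    for _ in range(max_steps):
        if not free:
            break
        current = residual_sum_of_squares([series[s] for s in selected], y)
        # Free variable giving the smallest RSS, i.e. the greatest improvement
        best = min(free, key=lambda name: residual_sum_of_squares(
            [series[s] for s in selected + [name]], y))
        new = residual_sum_of_squares([series[s] for s in selected + [best]], y)
        dof = n - (len(selected) + 1) - 1  # residual degrees of freedom
        if dof <= 0:
            break
        f_value = float("inf") if new == 0 else (current - new) / (new / dof)
        if f_value < f_enter:
            break  # no remaining free variable is significant
        selected.append(best)
        free.remove(best)
    return selected
```

Solving the normal equations directly keeps the sketch dependency-free; for ill-conditioned data a QR-based least squares routine from a numerical library would be the safer choice.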
Confidence limits
For the linear regression methods, confidence intervals can be computed. Based on the sampling distributions of the regression parameters, the following estimates and confidence limits hold (see e.g. Kottegoda and Rosso, 1998).
If the linear regression function is as follows:
$$Y = a + bX + e$$
where:
Y = dependent variable, also called response variable,
X = independent variable or explanatory variable,
a, b = regression coefficients,
e = residual because of imperfect match of the regression function through the measurement points.
An unbiased estimate of the error variance is given by:
$$s_e^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$
where:
$$S_{xx} = \sum_{k=1}^{n}(x_k - x_m)^2, \qquad S_{yy} = \sum_{k=1}^{n}(y_k - y_m)^2, \qquad S_{xy} = \sum_{k=1}^{n}(x_k - x_m)(y_k - y_m)$$
The parameter n is the total number of values used for making the regression function, and $x_m$, $y_m$ are the mean values of the two series. Note that n-2 appears in the denominator to reflect the fact that two degrees of freedom have been lost in estimating (a, b).
A 100(1-α) percent confidence interval for the mean response to some input value $x_i$ of X is given by:
$$a + b\,x_i \pm t_{n-2,\,1-\alpha/2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_i - x_m)^2}{S_{xx}}}$$
where:
$t_{n-2,\,1-\alpha/2}$ = value of Student's t distribution with n-2 degrees of freedom.
Note that the farther away $x_i$ is from its mean, the wider the confidence interval will be, because the last term under the root sign grows with that distance.
HYMOS uses by default a 95% confidence interval of the regression line.
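The computation of the confidence limits of the regression line can be sketched as follows. The sketch assumes the standard least squares estimates b = Sxy/Sxx and a = ym - b·xm, the error variance se² = (Syy - Sxy²/Sxx)/(n-2), and limits of the form Yest ± t·se·sqrt(1/n + (x - xm)²/Sxx). The Student's t value is passed in by the caller, because the Python standard library has no t quantile function. The name `fit_with_limits` is illustrative and not part of HYMOS.

```python
import math


def fit_with_limits(xs, ys, t_value):
    """Fit y = a + b*x and return (a, b, se, limits).

    limits(x0) returns (y_est, lower, upper): the estimate and the
    confidence limits for the mean response at x0.
    t_value is Student's t for n-2 degrees of freedom at the chosen
    confidence level, for example 2.131 for n = 17 and 95%.
    """
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    syy = sum((y - ym) ** 2 for y in ys)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ym - b * xm
    # Standard error of the residuals; two degrees of freedom are lost to a, b
    se = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))

    def limits(x0):
        y_est = a + b * x0
        half_width = t_value * se * math.sqrt(1.0 / n + (x0 - xm) ** 2 / sxx)
        return y_est, y_est - half_width, y_est + half_width

    return a, b, se, limits
```

Returning `limits` as a function makes it easy to evaluate the band at every observed X, as in columns 7 to 9 of the example table, or on a regular grid for plotting.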
Example
The table below presents 17 years of annual rainfall (column 2) and runoff (column 3) data of a basin. Regression analysis will be applied to validate the runoff series. No changes took place in the drainage characteristics of the basin.
From the HYMOS functions list, select the Regression function. Add the two series to the spreadsheet and give the rainfall series code 1 and the runoff series code 3. Select the linear regression function and press <Execute>.
Table: Example computation of confidence limits for regression analysis
| Year | X = Rainfall | Y = Runoff | (X-Xm)^2 | (Y-Ym)^2 | (X-Xm)(Y-Ym) | Yest | UC1 | LC1 |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 1961 | 1130 | 740 | 44100 | 36100 | 39900 | 737 | 801 | 673 |
| 1962 | 1280 | 1040 | 3600 | 12100 | -6600 | 875 | 922 | 827 |
| 1963 | 1270 | 960 | 4900 | 900 | -2100 | 866 | 914 | 818 |
| 1964 | 1040 | 610 | 90000 | 102400 | 96000 | 654 | 732 | 576 |
| 1965 | 1080 | 590 | 67600 | 115600 | 88400 | 691 | 762 | 619 |
| 1966 | 1150 | 820 | 36100 | 12100 | 20900 | 755 | 816 | 694 |
| 1967 | 1670 | 1090 | 108900 | 25600 | 52800 | 1234 | 1317 | 1150 |
| 1968 | 1540 | 1020 | 40000 | 8100 | 18000 | 1114 | 1176 | 1052 |
| 1969 | 990 | 570 | 122500 | 129600 | 126000 | 608 | 695 | 521 |
| 1970 | 1190 | 780 | 22500 | 22500 | 22500 | 792 | 848 | 736 |
| 1971 | 1520 | 1090 | 32400 | 25600 | 28800 | 1096 | 1155 | 1036 |
| 1972 | 1370 | 960 | 900 | 900 | 900 | 958 | 1004 | 911 |
| 1973 | 1650 | 1240 | 96100 | 96100 | 96100 | 1215 | 1295 | 1135 |
| 1974 | 1510 | 1030 | 28900 | 10000 | 17000 | 1086 | 1145 | 1028 |
| 1975 | 1600 | 1340 | 67600 | 168100 | 106600 | 1169 | 1241 | 1098 |
| 1976 | 1300 | 870 | 1600 | 3600 | 2400 | 893 | 940 | 847 |
| 1977 | 1490 | 1060 | 22500 | 16900 | 19500 | 1068 | 1124 | 1012 |
| xm, ym (columns 2-3); Sxx, Syy, Sxy (columns 4-6) | 1340 | 930 | 790200 | 786200 | 727100 | | | |
From this table the values of Sxx, Syy and Sxy can be computed (the sums in the last row of columns 4, 5 and 6). From these values, together with the Student's t value, the error variance and the confidence limits (columns 8 and 9) can be computed. Note that $t_{n-2,\,1-\alpha/2}$ = 2.131 and $s_e$ = 88.3 mm. HYMOS will show the following graph.
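As a check on the table, the 1961 row (X = 1130) can be worked through by hand from the sums in the last row. The estimators b = Sxy/Sxx and a = ym - b·xm and the form of the confidence limits used below are the standard least squares expressions and are assumed here; n = 17, so there are n - 2 = 15 degrees of freedom.

```latex
\begin{align*}
b &= \frac{S_{xy}}{S_{xx}} = \frac{727100}{790200} \approx 0.920 \\
a &= y_m - b\,x_m = 930 - 0.920 \times 1340 \approx -303 \\
s_e^2 &= \frac{1}{15}\left(786200 - \frac{727100^2}{790200}\right)
       \approx \frac{117161}{15} \approx 7811,
       \qquad s_e \approx 88.4 \\
Y_{est} &= a + b \times 1130 \approx 737 \\
t\,s_e\sqrt{\frac{1}{n} + \frac{(X - X_m)^2}{S_{xx}}}
  &= 2.131 \times 88.4 \times \sqrt{\frac{1}{17} + \frac{44100}{790200}}
   \approx 188.4 \times 0.3386 \approx 64 \\
UC1 &\approx 737 + 64 = 801, \qquad LC1 \approx 737 - 64 = 673
\end{align*}
```

The results reproduce columns 7, 8 and 9 of the 1961 row, and the standard error agrees with the 88.3 mm quoted above to within rounding.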