Regression analysis


General

The regression analysis option in HYMOS includes:

  • computation of correlation matrix, and
  • fitting of the following types of functions: polynomial, simple linear, exponential, power, logarithmic, hyperbolic, multiple linear and stepwise.
    The multiple linear functions can be fitted by means of multiple or stepwise regression techniques. The algebraic forms of the functions are presented under Regression equations.

    Series codes

    Series with the same time interval, read from the database, can enter the regression analysis. To select a series, highlight it in the series list box and press the '>>' button. To deselect a series, select it in the spreadsheet and press '<<'. For regression analysis between series, a minimum of two series must be selected. When only one series is selected, a regression against time is calculated. When a series is selected, its minimum and maximum values for the defined time period are calculated. Each series must be given a code.

    Coding of variables

    0: free variable available for regression
    1: forced variable, which will enter regression
    2: variable not entering regression
    3: dependent variable (only one per selection)

    Minimum value / Maximum value

    The lower and upper boundaries for each series must be set; data outside these boundaries are eliminated. By default, HYMOS shows the minimum and maximum values of the series.
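The elimination of out-of-range data can be pictured with the minimal sketch below. It assumes that a time step is dropped when either value of the pair falls outside its boundaries and that the boundaries themselves are inclusive; the source does not state either point, and the function name `filter_pairs` is hypothetical.

```python
def filter_pairs(x, y, x_bounds, y_bounds):
    """Keep only the time steps where both values lie within their bounds.

    Assumption: bounds are inclusive and a pair is dropped as a whole
    when either value is out of range.
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_bounds, y_bounds
    kept = [
        (xi, yi)
        for xi, yi in zip(x, y)
        if x_lo <= xi <= x_hi and y_lo <= yi <= y_hi
    ]
    return [p[0] for p in kept], [p[1] for p in kept]


if __name__ == "__main__":
    # The third pair is dropped because its y value exceeds the upper bound.
    print(filter_pairs([1, 5, 10], [2, 4, 100], (0, 10), (0, 50)))
```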

    Regression function

    Choose a regression option from the list box by clicking the function. If you select <Polynomial>, the degree of the polynomial has to be entered as well.
    If you select <Stepwise>, the following data have to be entered:
  • the maximum number of steps in the stepwise regression analysis;
  • whether or not the results of each step in the stepwise analysis have to be printed; if you enter <N>, only the results of the last step will be presented;
  • the F-values to enter and delete variables from the regression analysis, which apply only to the free variables.

    Save Relation

    Save the calculated relation for the time period defined in the Relation Validity Period. The start and end time of this period can be changed by double-clicking the time label. The coefficients of the regression equation will be stored in the database. There are, however, some limitations to the number of coefficients that can be stored:
  • in case of polynomial regression, the degree of the polynomial should be ≤ 4;
  • in case of multiple/stepwise regression, the number of independent variables should be ≤ 4.

    Regression equations

    The following types of regression equations are available, with Y the dependent variable and the X~j~ the independent variables:

    ||Type||Equation||
    |Polynomial|Y~i~ = C~0~ + C~1~ X~i~ + C~2~ X~i~^2^ + ... + C~n~ X~i~^n^, with: n = degree of polynomial, n ≤ 4; C~j~ = coefficients|
    |Simple linear|Y~i~ = A + B X~i~, with: A, B = coefficients|
    |Exponential 1|Y~i~ = A exp(B X~i~), with: A, B = coefficients|
    |Exponential 2|Y~i~ = A exp(B/X~i~), with: A, B = coefficients|
    |Power|Y~i~ = A X~i~^B^, with: A, B = coefficients|
    |Logarithmic|Y~i~ = A + B ln(X~i~), with: A, B = coefficients|
    |Hyperbolic|Y~i~ = A + B/X~i~, with: A, B = coefficients|
    |Multiple linear|Y~i~ = C~0~ + C~1~ X~1,i~ + C~2~ X~2,i~ + ... + C~n~ X~n,i~, with: n = number of independent variable series, n ≤ 4; C~j~ = coefficients|

    Computation procedure

    All coefficients are determined by the least squares method.
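As an illustration of the least squares method, the sketch below fits the simple linear form in closed form and then reuses it for the power form after a logarithmic transformation. The transformation is an assumption made for this example: the source does not say whether HYMOS fits the non-linear forms on transformed data or directly. The function names are hypothetical.

```python
import math


def fit_linear(x, y):
    """Least squares fit of Y = A + B*X; returns (A, B)."""
    n = len(x)
    x_m = sum(x) / n
    y_m = sum(y) / n
    s_xx = sum((xi - x_m) ** 2 for xi in x)
    s_xy = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_m - b * x_m
    return a, b


def fit_power(x, y):
    """Fit Y = A * X**B by linear least squares on ln Y = ln A + B ln X.

    Assumption: all x and y are strictly positive. Minimising squared
    errors in log space is not identical to minimising them in the
    original units.
    """
    ln_a, b = fit_linear([math.log(v) for v in x], [math.log(v) for v in y])
    return math.exp(ln_a), b


if __name__ == "__main__":
    print(fit_linear([0, 1, 2], [1, 3, 5]))
    print(fit_power([1, 2, 4], [3, 12, 48]))
```

The same transformation idea carries over to the exponential, logarithmic and hyperbolic forms, each of which is linear in A (or ln A) and B after substituting a transformed X or Y.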
    The Multiple Linear equation may be established by multiple or by stepwise regression:
  • In multiple regression, all independent variables enter the regression equation at once.
  • In stepwise regression, the independent variables enter the regression equation one by one. The order of entry may be:
      • free: the variables to which this option applies are called free variables; their entry is determined by statistical properties, and the variable added at each step is the one that gives the greatest improvement in goodness of fit. Among the free variables, only the significant ones are included in the final regression equation.
      • predetermined: the variables to which this option applies are called forced variables; their entry is not based on correlation but is solely requested by the user.
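The entry order described above can be sketched as a forward selection loop. This is a simplified illustration, not the HYMOS algorithm: it assumes that forced variables (code 1) enter first, that free variables (code 0) enter one at a time by largest reduction in the residual sum of squares, and that entry stops when a partial F statistic falls below a threshold. The deletion step governed by the F-value to delete is omitted, and all names and default values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coef = solve(xtx, xty)  # normal equations
    return sum(
        (yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y)
    )


def forward_stepwise(series, codes, y, f_enter=4.0, max_steps=10):
    """Return variable names in order of entry (forced first, then free)."""
    selected = [name for name in series if codes[name] == 1]
    free = [name for name in series if codes[name] == 0]
    n = len(y)
    current = rss([series[s] for s in selected], y)
    for _ in range(max_steps):
        if not free:
            break
        trials = {
            name: rss([series[s] for s in selected + [name]], y) for name in free
        }
        best = min(trials, key=trials.get)
        dof = n - (len(selected) + 1) - 1  # residual d.o.f. with 'best' added
        if dof <= 0:
            break
        if trials[best] > 0:
            f_stat = (current - trials[best]) / (trials[best] / dof)
        else:
            f_stat = float("inf")  # perfect fit
        if f_stat < f_enter:
            break  # no remaining free variable is significant
        selected.append(best)
        free.remove(best)
        current = trials[best]
    return selected
```

For example, `forward_stepwise({"x1": [...], "x2": [...]}, {"x1": 0, "x2": 1}, y)` always lists `x2` first because it is forced, while `x1` is added only if its F statistic reaches `f_enter`.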

    Confidence limits

    For the linear regression methods, confidence intervals can be computed. Based on the sampling distributions of the regression parameters, the following estimates and confidence limits hold (see e.g. Kottegoda and Rosso, 1998).
    If the linear regression function is as follows:
    Y = a + b X + e
    where:
    Y = dependent variable, also called response variable,
    X = independent variable or explanatory variable,
    a, b = regression coefficients,
    e = residual due to the imperfect fit of the regression function to the measurement points.
    An unbiased estimate of the error variance is given by:
    s~e~^2^ = (1/(n-2)) Σ (y~i~ - a - b x~i~)^2^ = (1/(n-2)) (S~yy~ - S~xy~^2^/S~xx~)
    where:
    S~xx~ = Σ (x~i~ - x~m~)^2^, S~yy~ = Σ (y~i~ - y~m~)^2^, S~xy~ = Σ (x~i~ - x~m~)(y~i~ - y~m~)
    The parameter n is the number of data pairs used to fit the regression function, and x~m~ and y~m~ are the mean values of the two series. Note that n-2 appears in the denominator because two degrees of freedom are lost in estimating (a, b).



    A 100(1-α) percent confidence interval for the mean response to some input value x~i~ of X is given by:
    a + b x~i~ ± t~n-2,1-α/2~ s~e~ √(1/n + (x~i~ - x~m~)^2^/S~xx~)
    where:
    t~n-2,1-α/2~ = the (1-α/2) quantile of Student's t distribution with n-2 degrees of freedom.
    Note that the farther x~i~ lies from the mean x~m~, the wider the confidence interval becomes, because the last term under the root sign grows.
    HYMOS uses by default a 95% confidence interval of the regression line.
    Example
    The table below presents 17 years of annual rainfall (column 2) and runoff (column 3) data of a basin. Regression analysis will be applied to validate the runoff series. No changes took place in the drainage characteristics of the basin.
    From the HYMOS functions list select the Regression function. Add the two series to the spreadsheet and give the rainfall series code 1 and the runoff series code 3. Select the linear regression function and press <Execute>.
    Table: Example computation of confidence limits for regression analysis

    ||Year||X = Rainfall||Y = Runoff||(X-X~m~)^2^||(Y-Y~m~)^2^||(X-X~m~)(Y-Y~m~)||Y~est~||UC1||LC1||
    |1|2|3|4|5|6|7|8|9|
    |1961|1130|740|44100|36100|39900|737|801|673|
    |1962|1280|1040|3600|12100|-6600|875|922|827|
    |1963|1270|960|4900|900|-2100|866|914|818|
    |1964|1040|610|90000|102400|96000|654|732|576|
    |1965|1080|590|67600|115600|88400|691|762|619|
    |1966|1150|820|36100|12100|20900|755|816|694|
    |1967|1670|1090|108900|25600|52800|1234|1317|1150|
    |1968|1540|1020|40000|8100|18000|1114|1176|1052|
    |1969|990|570|122500|129600|126000|608|695|521|
    |1970|1190|780|22500|22500|22500|792|848|736|
    |1971|1520|1090|32400|25600|28800|1096|1155|1036|
    |1972|1370|960|900|900|900|958|1004|911|
    |1973|1650|1240|96100|96100|96100|1215|1295|1135|
    |1974|1510|1030|28900|10000|17000|1086|1145|1028|
    |1975|1600|1340|67600|168100|106600|1169|1241|1098|
    |1976|1300|870|1600|3600|2400|893|940|847|
    |1977|1490|1060|22500|16900|19500|1068|1124|1012|
    |Mean (columns 2-3), sum (columns 4-6)|1340|930|790200|786200|727100| | | |





    From this table the values for S~xx~, S~yy~ and S~xy~ can be computed (last value of columns 4, 5 and 6). From these values the error variance and, together with the Student's t value, the confidence limits (columns 8 and 9) can be computed. Note that t~n-2,1-α/2~ = 2.131 and s~e~ = 88.3 mm. HYMOS will show the following graph.
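The arithmetic of this example can be retraced with the short sketch below. It uses the rainfall and runoff values from the table and the t value of 2.131 quoted in the text; the least squares estimates b = S~xy~/S~xx~ and a = y~m~ - b x~m~ are the standard ones, and the function name `confidence_limits` is hypothetical. The results are expected to agree with columns 7 to 9 up to rounding.

```python
import math

# Annual rainfall (X) and runoff (Y), 1961-1977, copied from the table above.
RAIN = [1130, 1280, 1270, 1040, 1080, 1150, 1670, 1540, 990,
        1190, 1520, 1370, 1650, 1510, 1600, 1300, 1490]
RUNOFF = [740, 1040, 960, 610, 590, 820, 1090, 1020, 570,
          780, 1090, 960, 1240, 1030, 1340, 870, 1060]
T_VALUE = 2.131  # Student's t for n-2 = 15 degrees of freedom, as quoted in the text


def confidence_limits(x, y, t_value):
    """Return (a, b, s_e, rows); each row is (y_est, upper, lower)."""
    n = len(x)
    x_m = sum(x) / n
    y_m = sum(y) / n
    s_xx = sum((xi - x_m) ** 2 for xi in x)
    s_yy = sum((yi - y_m) ** 2 for yi in y)
    s_xy = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
    b = s_xy / s_xx          # slope
    a = y_m - b * x_m        # intercept
    # Unbiased error variance with n-2 degrees of freedom.
    s_e = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))
    rows = []
    for xi in x:
        y_est = a + b * xi
        half = t_value * s_e * math.sqrt(1.0 / n + (xi - x_m) ** 2 / s_xx)
        rows.append((y_est, y_est + half, y_est - half))
    return a, b, s_e, rows


if __name__ == "__main__":
    a, b, s_e, rows = confidence_limits(RAIN, RUNOFF, T_VALUE)
    print(f"a = {a:.1f}, b = {b:.4f}, s_e = {s_e:.1f}")
    for year, (y_est, upper, lower) in zip(range(1961, 1978), rows):
        print(year, round(y_est), round(upper), round(lower))
```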