Blog from January, 2010

Use of protected virtual methods - AVOID IT!

See the following discussion on StackOverflow: http://stackoverflow.com/questions/319343/can-you-ever-have-too-many-protected-virtual-methods

Your class contains data members; methods that perform basic internal operations on those data members, where the functionality should never change, should always be private. In other words, methods that do basic work on your data members, such as initialization and allocation, should be private. Otherwise you run the risk of "second order" derived classes ending up with an incomplete set of behaviors, because first-level derived classes can redefine the behavior of the class.

That said, be very careful when declaring methods as "protected virtual": doing so not only declares the possibility of overriding the functionality, but in some ways defines an expectation that it will be overridden. That sounds like an underdefined set of behaviors to override; I would rather have a well-defined set of behaviors to override. If you really want a very large set of overridable behaviors, look into Aspect Oriented Programming instead, which allows for that sort of thing in a much more structured way.
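
For illustration, here is a minimal C# sketch of the risk (the class and method names are made up, not the actual ModelBase code):

    using System.Collections.Generic;

    // Hypothetical example: a protected virtual hook that derived classes are invited to override.
    public class BaseModelExample
    {
        private readonly List<double> values = new List<double>();

        public void Load()
        {
            Allocate();        // the base class relies on Allocate() leaving 'values' empty
            values.Add(1.0);
        }

        // "protected virtual" not only allows overriding, it suggests overriding is expected.
        protected virtual void Allocate()
        {
            values.Clear();
        }
    }

    public class DerivedModelExample : BaseModelExample
    {
        // An override that forgets to call base.Allocate() silently breaks the base class
        // invariant, and every "second order" derivative inherits the broken behavior.
        protected override void Allocate()
        {
            // base.Allocate() is not called here, so 'values' is never cleared.
        }
    }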

This post relates to the current state of the ModelBase class. It looks like it is time to refactor it (actions: Martijn, Gena).

Dynamic regular grid coverages in DeltaShell

Dynamic (time-related) regular grid coverages and static regular grid coverages need a better presentation in the DeltaShell UI:

  • The icon of a dynamic grid in the project explorer will differ from the icon for a static grid.
  • The visible properties for a dynamic regular grid in the DeltaShell property grid will be extended with:
    • time dependent yes/no (for all regular grids)
    • time step
    • start time
    • end time
    • equidistant yes/no
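
A minimal sketch of the extra properties such a coverage could expose to the property grid (the class and property names are assumptions for illustration, not the actual DeltaShell API):

    using System;

    // Hypothetical property object shown in the DeltaShell property grid.
    public class RegularGridCoverageProperties
    {
        public bool TimeDependent { get; set; }      // shown for all regular grids

        // Only meaningful when TimeDependent is true:
        public TimeSpan TimeStep { get; set; }
        public DateTime StartTime { get; set; }
        public DateTime EndTime { get; set; }
        public bool Equidistant { get; set; }
    }
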
Aggregation Model for Habitat Plugin

In order to support aggregation for dynamic models a sixth model type will be added to the Habitat DeltaShell plugin: AggregationModel (TOOLS-1324).

An AggregationModel is a collection of rules of the form

Output = AggregationFunction(Input)

Where

Output is a static regular grid coverage
Input is a dynamic regular grid coverage

AggregationFunction is one of:
Max, Min, Sum, Average, Mean, Percentile
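
For illustration, a single rule could be represented like the sketch below (IRegularGridCoverage is the existing coverage interface; the other names are assumptions, not the actual plugin code):

    // Hypothetical shape of one aggregation rule.
    public enum AggregationFunctionType { Max, Min, Sum, Average, Mean, Percentile }

    public class AggregationRule
    {
        public IRegularGridCoverage Input { get; set; }      // dynamic (time dependent) grid
        public IRegularGridCoverage Output { get; set; }     // static grid, gets a generated default name
        public AggregationFunctionType Function { get; set; }
        public double Percentile { get; set; }               // extra parameter, only used for Percentile
    }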

To simplify the usage of Habitat, the UI will show the aggregation rules in one view.

The view for an AggregationModel will show a single grid view with the following columns:

  • input variable
  • AggregationFunction: the user can select the aggregation function via an ellipsis (...) button followed by a popup (for Percentile an extra parameter must be entered)
  • output variable: the system will generate a default name; the user can change this name.

An extra edit row will be added to the end of the grid to support adding new data; cf. the input variables in Table Reclassification (multiple). The input variable column will offer a combo box that shows the input variables already in use in the model.

  • The user can drag an output variable from another model as input to the AggregationModel.

Time dependent models review

Composite model execute logic

Given the scenario above, when there are no dependencies between the models, the following will occur when the composite model is run:

CompositeModel.Initialize()
    Model1.Initialize()
    Model2.Initialize()
    Model3.Initialize()
CompositeModel.Execute()
    Model1.Execute()
    Model2.Execute()
    Model3.Execute()
    Model2.Execute()
CompositeModel.Execute()
    Model1.Execute()
    Model2.Execute()
    Model2.Execute()

Basically the rule is:

  While (there is a model with currenttime < compositemodel.currenttime + compositemodel.timestep)
     Run all models with currenttime < compositemodel.currenttime + compositemodel.timestep in order

After which the composite model's currenttime is increased and all child models are run against the new time.

If model3 depends on model2 (model3->input(t) == model2->output(t)) it cannot run and should wait for model2 to deliver the data. This could be specified on the model by defining that it should not fail when input data is missing; instead it should just report OK but not increase its time step. This could result in a model never finishing if that situation is not detected.
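
A rough sketch of one composite model step following this rule (the Models, CurrentTime and TimeStep members are illustrative; the real model interface may differ):

    // Hypothetical composite model step: run children until they catch up, then advance.
    public void ExecuteStep()
    {
        var targetTime = CurrentTime + TimeStep;

        while (Models.Any(m => m.CurrentTime < targetTime))
        {
            foreach (var model in Models.Where(m => m.CurrentTime < targetTime))
            {
                // A model waiting for input from another model reports OK here but does
                // not advance its current time; without a deadlock check this loop would
                // then never terminate.
                model.Execute();
            }
        }

        CurrentTime = targetTime;
    }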

The start and end time of the composite model overrule the start and end time of the child models.
The start and end time of the composite model cannot be 'larger' than (extend beyond) the smallest child model window.
NB: the second rule rules out periodic extrapolation of model outputs.
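
A hedged sketch of that constraint as a check (childModels and the time properties are assumed names):

    // Hypothetical validation: the composite window must fit inside every child model window.
    if (compositeModel.StartTime < childModels.Max(m => m.StartTime) ||
        compositeModel.EndTime > childModels.Min(m => m.EndTime))
    {
        throw new InvalidOperationException(
            "The composite model start/end time must lie within the time window of every child model.");
    }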

Time notation for formula-based models

Model 
  |-In
  |   |-A(t)
  |   |-B(t)
  |-Out
    |-C(t)

A formula could be C = A + B or C = A + A(t-1). If no t is specified, the current time is assumed.
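
Purely as an illustration of that convention, a sketch of evaluating C = A + A(t-1) on a per-time-step series (the dictionary storage and the fixed time step are assumptions, not the real formula engine):

    // Hypothetical evaluation of C = A + A(t-1).
    static IDictionary<DateTime, double> EvaluateC(IDictionary<DateTime, double> a, TimeSpan timeStep)
    {
        var c = new Dictionary<DateTime, double>();
        foreach (var t in a.Keys)
        {
            // A without an explicit time argument means A(t), the current time.
            double previous;
            if (a.TryGetValue(t - timeStep, out previous))   // A(t-1): one time step back
            {
                c[t] = a[t] + previous;
            }
        }
        return c;
    }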

Test coverage has to be increased

But when writing tests:

  • Make sure the quality of the tests is also high; otherwise they are nothing more than a useless mess.
  • Keep them as clean and as small as possible. Especially for unit tests, there is no reason for a test to be one page long.
  • Choose a proper category where necessary:
    • Unit Test - no category specified. In this case the test must be:
      • small
      • very fast
      • using no external resources except the class being tested; otherwise mock them, and if mocking takes too much effort - refactor both the class and the test!
    • [Category("Integration")] - when you need to test many classes working together, or even the whole system.
    • [Category("DataAccess")] - the same as Integration but involving any kind of reading/writing, like import, saving the project file, etc.
    • [Category("Performance")] - a test which checks that some code does not run slower than expected. Keep the code between stopwatch.Start() and stopwatch.Stop() as small as possible; no timing of log.Debug() or any other crap.
    • [Category("Windows.Forms")] - use this category if you use WindowsFormsTestHelper.Show(Control control) to visualize any form. This is actually another type of integration test; it mainly checks that controls and windows show correctly. If possible, also perform some operations via the API and add asserts there as well.

Put [Ignore("message")] on tests which are not ready yet (we have to clean them up; there are too many already, and they should have a very limited lifetime).
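
For example, category and ignore attributes on an NUnit test look like this (the test names and bodies are just placeholders):

    [Test]
    [Category("Integration")]
    public void ImportAndSaveProject()
    {
        // several classes working together, possibly the whole system
    }

    [Test]
    [Ignore("not finished yet")]
    public void SomeTestThatIsNotReadyYet()
    {
        // ...
    }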

Some rules from Uncle Bob:

  • You are not allowed to write any production code unless it is to make a failing unit test pass.
  • You are not allowed to write any more of a unit test than is sufficient to fail; and compilation failures are failures.
  • You are not allowed to write any more production code than is sufficient to pass the one failing unit test.

This is how our test coverage looks today:

Wrong way to fix tests

This does not seem to be a good way to fix tests:

Also, when writing performance tests, make sure to time the right thing. For example, including log.Write in the measurement may give widely varying durations; measure as small a block as possible. So instead of:

            ...
            DateTime startTime = DateTime.Now;
            regularGridCoverage.Resize(10000, 10000, 1, 1);
            log.DebugFormat("Writing 100000000 values took {0} ms", (DateTime.Now - startTime).TotalMilliseconds);
            Assert.Less((DateTime.Now - startTime).TotalMilliseconds, 25000);
            ...

Write:

            ...
            var startTime = DateTime.Now;
            regularGridCoverage.Resize(10000, 10000, 1, 1);
            var duration = (DateTime.Now - startTime).TotalMilliseconds;

            log.DebugFormat("Writing 100000000 values took {0} ms", duration);
            Assert.Less(duration, 25000);
            ...

Or:

            ...

            var stopwatch = new Stopwatch();
            stopwatch.Start();
            regularGridCoverage.Resize(10000, 10000, 1, 1);
            stopwatch.Stop();

            log.DebugFormat("Writing 100000000 values took {0} ms", stopwatch.Elapsed.TotalMilliseconds);
            Assert.Less(stopwatch.Elapsed.TotalMilliseconds, 25000);
            ...

Also make sure that a message is written showing how long it took to do something; otherwise it is hard to check how long it takes on the build server.

(plus) Reviewed the performance tests and decreased the required time to run them; also, 30% of the tests are wrong! Either time is not measured, or it is not a performance test at all. Keep the quality of the tests higher.

... for example, this test is useless as a performance test:

        [Test]
        [Category("Performance")]
        public void ReadRegularGridPerformance()
        {
            //todo: get a bigger grid file :)
            var store = new NetCdfFunctionStore();
            store.FunctionBuilders.Add(new RegularGridCoverageBuilder());
            store.Open("../../../../data/NCEP-NAM-CONUS_80km_best.ncd.nc");
            var grid = (IRegularGridCoverage) store.Functions.First(f => f is IRegularGridCoverage);
            log.DebugFormat("Reading {0} slices of {1} values", grid.Time.Values.Count, grid.SizeX*grid.SizeY);
            DateTime start = DateTime.Now;
            foreach (DateTime dt in grid.Time.Values)
            {
                Assert.AreEqual(6045, grid.GetValues(new VariableValueFilter<DateTime>(grid.Time, dt)).Count);
            }
            log.DebugFormat("Took {0} ms", (DateTime.Now - start).TotalMilliseconds);
        }
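
The duration is only logged and never asserted, so the test can never fail when it becomes slow. A hedged sketch of how the same reading loop could be turned into a real performance test, using the Stopwatch pattern above (the 25 second budget is only an example threshold):

            var stopwatch = new Stopwatch();
            stopwatch.Start();
            foreach (DateTime dt in grid.Time.Values)
            {
                grid.GetValues(new VariableValueFilter<DateTime>(grid.Time, dt));
            }
            stopwatch.Stop();

            log.DebugFormat("Reading all slices took {0} ms", stopwatch.Elapsed.TotalMilliseconds);
            Assert.Less(stopwatch.Elapsed.TotalMilliseconds, 25000); // example budget only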