In releases prior to Delft-FEWS 2024.01 it was recommended to schedule a full harvest task to run at least once a day. This task scanned the entire content of the archive data folder and added any data sets that had not yet been harvested to the catalogue.

In addition the full harvest task would check if there were data sets in the catalogue which were no longer present in the archive. After a full harvest run the catalogue would always be up-to-date.

The main downside of this approach was that this task would take a long time to complete.


As of release Delft-FEWS 2024.01 the harvest process has been improved in the following ways:

1) All the data exports from FEWS to the archive now inform the archive about which data sets were newly added to the archive. The archive will immediately harvest these newly added data sets so that they become directly available in the catalogue.

2) Archive tasks which add data to the archive or remove data from the archive, such as the archive amalgamate task or the data removal task, also update the catalogue immediately after they have finished.


It is therefore no longer necessary to run a full harvest task on a daily basis. You need to run the full harvest task after you have upgraded to a new Delft-FEWS version (run the clear catalogue task first!).

It is also recommended to run the full harvest task when you have deleted data from the archive manually or by using your own scripts.
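
As an illustration of the upgrade procedure described above, the two manual tasks involved could be configured as in the fragment below. This is only a sketch that reuses the task names from the configuration example on this page. It is an assumption that the order of the elements in the file has no effect on execution, so the tasks still have to be started by hand in the right order: clear the catalogue first, then run the full harvest.

```xml
<!-- Fragment of archiveTasksSchedule: manual tasks used after an upgrade -->
<!-- Step 1 (run first): clear the catalogue -->
<manualArchiveTask>
	<predefinedArchiveTask>clear internal catalogue</predefinedArchiveTask>
	<description>Clear internal catalogue (run this first after an upgrade)</description>
</manualArchiveTask>
<!-- Step 2 (run second): rebuild the catalogue with a full harvest -->
<manualArchiveTask>
	<predefinedArchiveTask>harvester internal catalogue</predefinedArchiveTask>
	<description>Full harvest task (this task may take a long time to complete!)</description>
</manualArchiveTask>
```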


The following harvest tasks are recommended for your configuration:

1) Full harvest task. Make sure that this task is not scheduled. The task should only be run manually.

2) Delete obsolete data from catalogue task. This task checks whether there is data in the catalogue which has already expired.

3) Immediate harvest task. This task is run when new data is exported to the archive or when data is removed from the archive.

4) Incremental harvest task. This harvest task should be scheduled every 10 minutes. By default, this task harvests the last 7 days in the archive, but it is possible to configure a different period, e.g. <harvesterTimeSpan unit="day" multiplier="10"/>. Normally new data should be harvested automatically by the immediate harvester. However, if the immediate harvester was not available for any reason, the incremental harvester will make sure that the data sets are still harvested.


Below is a configuration example:

<archiveTasksSchedule xsi:schemaLocation="http://www.wldelft.nl/fews/archive http://fews.wldelft.nl/schemas//version1.0/archive-schemas/archiveTasksSchedule.xsd" xmlns="http://www.wldelft.nl/fews/archive" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<scheduledArchiveTask>
		<predefinedArchiveTask>incremental harvester internal catalogue</predefinedArchiveTask>
		<description>Incremental harvester (harvest the last 7 days)</description>
		<startTime>00:00:00</startTime>
		<endTime>23:59:00</endTime>
		<runInterval>1</runInterval>
		<active>true</active>
	</scheduledArchiveTask>
	<scheduledArchiveTask>
		<predefinedArchiveTask>immediate harvester task</predefinedArchiveTask>
		<description>Immediate harvester (harvest the newly added or deleted data sets immediately)</description>
		<startTime>00:00:00</startTime>
		<endTime>23:59:00</endTime>
		<runIntervalInSeconds>60</runIntervalInSeconds>
		<active>true</active>
	</scheduledArchiveTask>
	<scheduledArchiveTask>
		<predefinedArchiveTask>file sweeper</predefinedArchiveTask>
		<description>File sweeper</description>
		<startTime>00:00:00</startTime>
		<endTime>23:59:00</endTime>
		<runInterval>1</runInterval>
		<active>true</active>
	</scheduledArchiveTask>
	<manualArchiveTask>
		<predefinedArchiveTask>harvester internal catalogue</predefinedArchiveTask>
		<description>Full harvest task (this task may take a long time to complete!)</description>
	</manualArchiveTask>
	<manualArchiveTask>
		<predefinedArchiveTask>clear internal catalogue</predefinedArchiveTask>
		<description>Clear internal catalogue</description>
	</manualArchiveTask>
	<manualArchiveTask>
		<predefinedArchiveTask>remove obsolete data from catalogue</predefinedArchiveTask>
		<description>Remove obsolete data from catalogue</description>
	</manualArchiveTask>
</archiveTasksSchedule>




  

