DataManagementTool.xml

The archive admin console provides tools to manage the data in the archive. To keep the archive from growing indefinitely, several tools are available to remove expired data sets from it. This section explains how to configure and use these tools. The first step is to define the rules for expiring data sets in the file DataManagementTool.xml.

An example of this file is given below.

<arc:dataManagementTool xmlns:arc="http://www.wldelft.nl/fews/archive" xsi:schemaLocation="http://www.wldelft.nl/fews/archive http://fews.wldelft.nl/schemas//version1.0/archive-schemas/dataManagementTool.xsd" xmlns="http://www.wldelft.nl/fews/archive" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<outputFile>d:\fews\output\managementreport.csv</outputFile>
	<backupFolder>d:\fews\data_to_tape</backupFolder>	<!-- required when using the predefined remove-data-from-archive task-->
	<lifeTimeRules>
		<defaultLifeTime unit="year" multiplier="10"/>
		<defaultAction>default</defaultAction>	<!-- optional, only relevant when using custom tasks that require such action field-->
		<lifeTimeSimulatedDataSets>
			<lifeTime unit="year" multiplier="15"/>
			<action>remove</action>   <!-- optional, only relevant when using custom tasks that require such action field-->
		</lifeTimeSimulatedDataSets>
		<lifeTimeObservedDataSets>
			<lifeTime unit="year" multiplier="15"/>
			<action>remove</action>   <!-- optional, only relevant when using custom tasks that require such action field-->
		</lifeTimeObservedDataSets>
	</lifeTimeRules>
</arc:dataManagementTool>

This example contains all the mandatory elements of this file. The output file defines where the data management tool should write its output. The output is the list of data sets which are expired according to the defined rules. The action element is optional and only needed if a customTask requires this field in the resulting managementreport.csv. The predefined remove-data-from-archive task ignores the action element.

LifeTimeRules

The default life time defines how long a data set stays in the archive when no more specific rule is defined. In the example above no rule is defined for external data sets, so the default life time applies to them. For simulated data sets a specific rule is defined in the tag lifeTimeSimulatedDataSets, so that rule is used instead of the default life time. If needed, specific rules can be defined for the other types of data sets in the archive, such as external forecasts, rating curves and configuration. An example is given below.

<arc:dataManagementTool xmlns:arc="http://www.wldelft.nl/fews/archive" xsi:schemaLocation="http://www.wldelft.nl/fews/archive http://fews.wldelft.nl/schemas//version1.0/archive-schemas/dataManagementTool.xsd" xmlns="http://www.wldelft.nl/fews/archive" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<outputFile>d:\fews\output\managementreport.csv</outputFile>
	<lifeTimeRules>
		<defaultLifeTime unit="year" multiplier="10"/>
		<defaultAction>default</defaultAction>
		<lifeTimeSimulatedDataSets>
			<lifeTime unit="year" multiplier="15"/>
			<action>remove</action>
		</lifeTimeSimulatedDataSets>
		<lifeTimeObservedDataSets>
			<lifeTime unit="year" multiplier="15"/>
			<action>remove</action>
		</lifeTimeObservedDataSets>
		<lifeTimeExternalForecastDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeExternalForecastDataSets>
		<lifeTimeMessagesDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeMessagesDataSets>
		<lifeTimeConfigurationDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeConfigurationDataSets>
		<lifeTimeRatingCurveDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeRatingCurveDataSets>
	</lifeTimeRules>
</arc:dataManagementTool>
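
The fallback from a specific rule to the default life time can be sketched in a few lines of Python. This is only an illustration of the behaviour described in this section, not the code the archive server runs. The function name resolve_life_time and the trimmed CONFIG string are assumptions made for the example, and rules restricted by a sourceId are not considered.

```python
import xml.etree.ElementTree as ET

# Namespace used by DataManagementTool.xml (taken from the examples above).
NS = {"arc": "http://www.wldelft.nl/fews/archive"}

# Trimmed configuration for illustration only; it is not schema-validated.
CONFIG = """<arc:dataManagementTool xmlns:arc="http://www.wldelft.nl/fews/archive" xmlns="http://www.wldelft.nl/fews/archive">
  <outputFile>managementreport.csv</outputFile>
  <lifeTimeRules>
    <defaultLifeTime unit="year" multiplier="10"/>
    <lifeTimeSimulatedDataSets>
      <lifeTime unit="year" multiplier="15"/>
    </lifeTimeSimulatedDataSets>
  </lifeTimeRules>
</arc:dataManagementTool>"""


def resolve_life_time(config_xml, rule_tag):
    """Return (unit, multiplier) for rule_tag, falling back to defaultLifeTime.

    Hypothetical helper: it mirrors the documented fallback, nothing more.
    """
    rules = ET.fromstring(config_xml).find("arc:lifeTimeRules", NS)
    specific = rules.find(f"arc:{rule_tag}/arc:lifeTime", NS)
    chosen = specific if specific is not None else rules.find("arc:defaultLifeTime", NS)
    return chosen.get("unit"), int(chosen.get("multiplier"))


if __name__ == "__main__":
    # A specific rule exists for simulated data sets.
    print(resolve_life_time(CONFIG, "lifeTimeSimulatedDataSets"))
    # No rule for external forecasts in CONFIG, so the default applies.
    print(resolve_life_time(CONFIG, "lifeTimeExternalForecastDataSets"))
```

With the trimmed configuration, simulated data sets resolve to the specific 15 year rule, while any type without its own rule resolves to the 10 year default.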

EventRules

In the FEWS OC or SA it is possible to define events. You can configure a separate life time for data sets which belong to a certain event. Data sets which belong to an event are usually more important than other data sets in the same period, and a longer life time keeps them in the archive longer.
Note: eventRules are applied before the standard lifeTimeRules. If the life time configured for an event type is shorter than the life time that would otherwise apply to the data labeled for that event type, that data will be kept for a shorter time than the rest of the data.

An example is given below.

<arc:dataManagementTool xmlns:arc="http://www.wldelft.nl/fews/archive" xsi:schemaLocation="http://www.wldelft.nl/fews/archive http://fews.wldelft.nl/schemas//version1.0/archive-schemas/dataManagementTool.xsd" xmlns="http://www.wldelft.nl/fews/archive" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<outputFile>d:\fews\output\managementreport.csv</outputFile>
	<backupFolder>d:\fews\backup</backupFolder>
	<lifeTimeRules>
		<defaultLifeTime unit="year" multiplier="10"/>
		<defaultAction>default</defaultAction>
		<lifeTimeSimulatedDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeSimulatedDataSets>
		<lifeTimeSimulatedDataSets>
			<sourceId>sourceId</sourceId>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeSimulatedDataSets>
		<lifeTimeObservedDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeObservedDataSets>
		<lifeTimeExternalForecastDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeExternalForecastDataSets>
		<lifeTimeMessagesDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeMessagesDataSets>
		<lifeTimeConfigurationDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeConfigurationDataSets>
		<lifeTimeRatingCurveDataSets>
			<lifeTime unit="year" multiplier="10"/>
			<action>remove</action>
		</lifeTimeRatingCurveDataSets>
		<eventRule>
			<eventTypeId>historic event</eventTypeId>
			<lifeTime unit="year" multiplier="40"/>
		</eventRule>
		<eventRule>
			<eventTypeId>calibration event</eventTypeId>
			<lifeTime unit="year" multiplier="40"/>
		</eventRule>
		<eventRule>
			<eventTypeId>watercoach event</eventTypeId>
			<lifeTime unit="year" multiplier="40"/>
		</eventRule>
		<eventRule>
			<eventTypeId>review event</eventTypeId>
			<lifeTime unit="year" multiplier="40"/>
		</eventRule>
		<eventRule>
			<eventTypeId>flood watch event</eventTypeId>
			<lifeTime unit="year" multiplier="40"/>
		</eventRule>
	</lifeTimeRules>
</arc:dataManagementTool>
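
The order in which the rules are applied can be sketched as follows. This is a simplified model of the ordering described in the note above, not the tool's actual implementation. The function name, the dictionaries and the event type "short event" are hypothetical, and the sketch assumes that a data set without a matching event rule falls back to the standard lifeTimeRules.

```python
def effective_life_time_years(event_type_id, data_set_rule, event_rules,
                              type_rules, default_years):
    """Event rules are consulted first, then the per-type rule, then the default."""
    if event_type_id is not None and event_type_id in event_rules:
        return event_rules[event_type_id]
    return type_rules.get(data_set_rule, default_years)


# Life times in years. "short event" is a hypothetical event type that shows
# how an event rule can also shorten the life time.
EVENT_RULES = {"historic event": 40, "short event": 5}
TYPE_RULES = {"lifeTimeSimulatedDataSets": 10}

if __name__ == "__main__":
    for event in ("historic event", "short event", None):
        years = effective_life_time_years(
            event, "lifeTimeSimulatedDataSets", EVENT_RULES, TYPE_RULES, 10)
        print(event, years)
```

In this model a simulated data set labeled "historic event" is kept for 40 years, while one labeled with the hypothetical "short event" is kept for only 5 years, even though the rule for simulated data sets says 10.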

Deployment of DataManagementTool.xml

Once the configuration of the DataManagementTool.xml is finished, it should be added to the configuration of the archive server. You can do this by uploading it to the archive. Select the tab manage configuration and, in the row DataManagementTool.xml, use the browse button to select the new file and the upload button to upload it. It is also possible to put this config file directly in the config folder of the archive, but then you have to restart the Tomcat instance of the archive server to make it aware that a new file is available.

ArchiveTaskSchedule.xml

When the DataManagementTool.xml is deployed, it becomes possible to run the data management tool to search for expired data sets in the archive. This task should be available as one of the tasks in the archive tasks tab. If it is not available there, you should add it to your configuration. Go to the manage configuration tab and download the ArchiveTaskSchedule.xml file. Verify whether the predefinedArchiveTask data management tool is present, and add it if it is missing. After adding this task, upload the changed ArchiveTaskSchedule.xml to the archive server in the manage configuration tab.

An example is given below.

	<manualArchiveTask>
		<predefinedArchiveTask>data management tool</predefinedArchiveTask>
		<description>data management tool</description>
	</manualArchiveTask>

The task should now be available after selecting the tab archive tasks.
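
Before uploading, the check on the downloaded ArchiveTaskSchedule.xml can also be done with a small script. The sketch below is an assumption-laden illustration: the function name has_predefined_task is hypothetical, and the root element archiveTaskSchedulePlaceholder in the sample is a placeholder because the real root element is not shown in this section. The search ignores namespaces so that it does not depend on them.

```python
import xml.etree.ElementTree as ET


def has_predefined_task(schedule_xml, task_name):
    """Return True if any predefinedArchiveTask element contains task_name."""
    root = ET.fromstring(schedule_xml)
    for element in root.iter():
        # Strip a possible "{namespace}" prefix from the tag name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "predefinedArchiveTask" and (element.text or "").strip() == task_name:
            return True
    return False


# Placeholder root element; only the manualArchiveTask part comes from the example above.
SAMPLE = """<archiveTaskSchedulePlaceholder>
  <manualArchiveTask>
    <predefinedArchiveTask>data management tool</predefinedArchiveTask>
    <description>data management tool</description>
  </manualArchiveTask>
</archiveTaskSchedulePlaceholder>"""

if __name__ == "__main__":
    print(has_predefined_task(SAMPLE, "data management tool"))
    print(has_predefined_task(SAMPLE, "remove data from archive"))
```

The same helper can be used later to check for the task remove data from archive.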

Start DataManagementTool

You can start this tool by pressing the start button. When the task has finished, the output is available in the output file configured in the DataManagementTool.xml. You can download this file in the tab manage configuration.

As the screenshot above shows, the file managementreport.csv can be downloaded by pressing the download button, so that you can review which data sets are expired. If needed, you can manually edit this file with a text editor or Excel. After you have reviewed and/or changed this file, you can upload the changed file to the archive by using the upload button.

There are several characteristics of this tool that need to be taken into account:

  1. The tool does not make use of the catalog: it scans all the metadata.xml files in the archive.
  2. The tool assesses the lifetime of archive data based on the attribute "creationTime" in the metadata.xml. If a metadata.xml describes multiple files, the tool takes the most recent creationTime to assess whether the data is expired. As a result, files with data of a specific source that has been given a shorter lifetime might not be removed, because the same metadata.xml describes other files (for other sources) with a longer lifetime.
  3. The managementreport.csv lists files grouped per "source" only; it does not list individual files.
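
The second characteristic can be illustrated with a short sketch. It assumes that a data set counts as expired once its most recent creationTime plus the life time lies in the past. The function names and the way years are added are assumptions made for the example; the tool's own date arithmetic may differ.

```python
from datetime import datetime


def add_years(moment, years):
    """Add whole years to a datetime, mapping 29 February to 28 February if needed."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return moment.replace(year=moment.year + years, day=28)


def is_expired(creation_times, life_time_years, now):
    """Assess one metadata.xml: only the most recent creationTime counts."""
    newest = max(creation_times)
    return add_years(newest, life_time_years) <= now


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    # One recent file keeps the whole data set in the archive.
    print(is_expired([datetime(2005, 6, 1), datetime(2020, 6, 1)], 10, now))
    # All files are older than the life time, so the data set is expired.
    print(is_expired([datetime(2005, 6, 1), datetime(2010, 6, 1)], 10, now))
```

In the first call the file from 2005 is well past a 10 year life time, but the data set is not reported as expired because the file from 2020 in the same metadata.xml is more recent.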

Remove data from archive

The task remove data from archive moves all expired data sets listed in the output file of the data management tool to the backup folder configured in the DataManagementTool.xml file. First verify that this task is configured in the ArchiveTaskSchedule.xml file: a predefinedArchiveTask remove data from archive should be available. If it is not, download the ArchiveTaskSchedule.xml file (you can do this in the Manage Configuration tab) and add the task to that file, for example as a manualArchiveTask (see example below). After uploading this file the task should be available in the Archive Tasks tab. Use the start button to run this task.

Since FEWS version 2022.01 it is also possible to run this task on a schedule. To configure this, insert the predefinedArchiveTask from the ArchiveTaskSchedule.xml example below as a scheduledArchiveTask instead of a manualArchiveTask.

<manualArchiveTask>
	<predefinedArchiveTask>remove data from archive</predefinedArchiveTask>
	<description>Remove data from the archive</description>
</manualArchiveTask>	

Note that because the managementreport.csv groups files per source, the tool can in certain circumstances report that files labeled for removal could not be found. This happens when those files were already removed.
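
The move to the backup folder, including the "could not be found" situation, can be sketched as follows. This is an illustration under assumptions and not the archive server's code. The function name move_to_backup is hypothetical, and the list of relative paths stands in for the entries of managementreport.csv, whose column layout is not described in this section.

```python
import shutil
from pathlib import Path


def move_to_backup(archive_root, relative_paths, backup_folder):
    """Move each listed path to the backup folder and report what was missing.

    Returns (moved, missing). A path that is listed twice, or that was moved
    earlier, ends up in missing because it is no longer in the archive.
    """
    moved, missing = [], []
    for relative in relative_paths:
        source = Path(archive_root) / relative
        if not source.exists():
            missing.append(relative)
            continue
        target = Path(backup_folder) / relative
        # Recreate the folder structure of the archive below the backup folder.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
        moved.append(relative)
    return moved, missing
```

Calling the function with the same path listed twice returns that path once in moved and once in missing, which corresponds to the situation described in the note above.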
