Open Archive with multiple storages

The data of the archive is always located in a single directory: the data directory of the archive installation.

It is, however, still possible to use multiple storages for the archive. For example, recent data can be kept on a fast (more expensive) storage and older data on a slower (cheaper) storage.

In this case the data folder of the archive should not contain the data directly. Instead it should contain symbolic links that point to the actual storages.

This is possible with all of the supported open archive versions.

Since 2024.02, additional support for this configuration has been added to the harvester process:

A dedicated harvester can be configured for each storage. Harvesters for different storages can run simultaneously.
The catalogue can be cleared for a specific storage.
The incremental harvester can be configured to run for a specific storage, so that it can also be used in a configuration with multiple storages.

These new features will be explained with an example. In our example the archive consists of two storages: the warm storage and the cold storage.

The warm storage contains the recent data. After a configured period data is moved from the warm storage to the cold storage.

The data directory contains two subdirectories, warm and cold. Both are symbolic links to the actual storages.
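
The layout described above can be sketched as follows. This is only an illustration, assuming a file system that supports symbolic links. The helper `link_storages` and all paths are hypothetical and not part of the archive; the demonstration runs in a temporary directory.

```python
import os
import tempfile
from pathlib import Path


def link_storages(data_dir: Path, storages: dict[str, Path]) -> None:
    """Create one symbolic link per storage inside the archive data directory."""
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, target in storages.items():
        link = data_dir / name
        if link.is_symlink():
            # Replace an existing link so it points at the requested target.
            link.unlink()
        elif link.exists():
            raise FileExistsError(f"{link} exists and is not a symbolic link")
        link.symlink_to(target, target_is_directory=True)


# Demonstration in a temporary directory; all locations are hypothetical.
root = Path(tempfile.mkdtemp())
fast = root / "fast_disk"  # stands in for the warm (fast) storage
slow = root / "slow_disk"  # stands in for the cold (slow) storage
fast.mkdir()
slow.mkdir()

data_dir = root / "archive" / "data"
link_storages(data_dir, {"warm": fast, "cold": slow})

# List the links in the data directory and where they point.
for entry in sorted(data_dir.iterdir()):
    print(entry.name, "->", os.readlink(entry))
```

In a real installation the targets would be the mount points of the actual storages, and the links would normally be created once by an administrator.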

To configure a dedicated harvester for each storage, the following configuration can be used.

	<manualArchiveTask>
		<internalHarvesterTask id="harvester warm storage">
			<storage>warm</storage>
		</internalHarvesterTask>
		<description>harvester warm storage</description>
	</manualArchiveTask>
	<manualArchiveTask>
		<internalHarvesterTask id="harvester cold storage">
			<storage>cold</storage>
		</internalHarvesterTask>
		<description>harvester cold storage</description>
	</manualArchiveTask>

The configuration above is quite similar to the configuration of a regular harvester. The main difference is that a storage is configured. The storage is the name of a subdirectory of the data directory.

In our case we have two subdirectories, warm and cold. For each directory we configured a dedicated harvester.
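
Because each configured storage has to match a subdirectory of the data directory, a small check can catch a misspelled storage name early. The sketch below is not part of the archive. It assumes the task fragment can be parsed without an XML namespace (a real configuration file may declare one, in which case the tag lookup needs adjusting), and the `<tasks>` wrapper and both function names are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Fragment taken from the harvester example. The <tasks> wrapper is a
# hypothetical root element, added only to make the fragment parseable.
CONFIG = """
<tasks>
    <manualArchiveTask>
        <internalHarvesterTask id="harvester warm storage">
            <storage>warm</storage>
        </internalHarvesterTask>
        <description>harvester warm storage</description>
    </manualArchiveTask>
    <manualArchiveTask>
        <internalHarvesterTask id="harvester cold storage">
            <storage>cold</storage>
        </internalHarvesterTask>
        <description>harvester cold storage</description>
    </manualArchiveTask>
</tasks>
"""


def configured_storages(xml_text: str) -> list[str]:
    """Return the distinct <storage> values in order of first appearance."""
    root = ET.fromstring(xml_text)
    names: list[str] = []
    for element in root.iter("storage"):
        name = (element.text or "").strip()
        if name and name not in names:
            names.append(name)
    return names


def missing_storages(xml_text: str, data_dir: Path) -> list[str]:
    """Return configured storages that are not a directory under data_dir."""
    # Path.is_dir() follows symbolic links, so linked storages count as present.
    return [n for n in configured_storages(xml_text) if not (data_dir / n).is_dir()]


# Demonstration: a temporary data directory that only contains "warm".
data_dir = Path(tempfile.mkdtemp())
(data_dir / "warm").mkdir()
print(configured_storages(CONFIG))         # ['warm', 'cold']
print(missing_storages(CONFIG, data_dir))  # ['cold']
```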

Note that in the example above the harvester tasks are configured as manual tasks, because since the 202401 release it is advised to configure full harvest tasks as manual tasks.

The described features are, however, also available in older client-specific branches. If you use one of these branches, it is advised to schedule the harvest tasks. If an incremental harvest task is also configured, the full harvest task for the storage with the most recent data should preferably run only once a day.

In addition, it is useful for the incremental harvester (the harvester that quickly harvests only the most recent days) to know which storage contains the most recent data.

A configuration example is given below.

	<scheduledArchiveTask>
		<incrementalHarvesterTask id="incremental harvester">
			<harvesterTimeSpan unit="day" multiplier="10"/>
			<storage>warm</storage>
		</incrementalHarvesterTask>
		<description>incremental harvester</description>
		<startTime>00:00:00</startTime>
		<endTime>23:59:00</endTime>
		<runIntervalInSeconds>300</runIntervalInSeconds>
		<active>false</active>
	</scheduledArchiveTask>
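
To illustrate the time span in the example above, the sketch below computes the start of the period that such a task would cover. It assumes that `harvesterTimeSpan` is counted back from the current time, which the text suggests but does not state explicitly. The function `harvest_window_start` is hypothetical, and only the `day` unit from the example is handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Task element taken from the incremental harvester example.
TASK = """
<incrementalHarvesterTask id="incremental harvester">
    <harvesterTimeSpan unit="day" multiplier="10"/>
    <storage>warm</storage>
</incrementalHarvesterTask>
"""


def harvest_window_start(task_xml: str, now: datetime) -> datetime:
    """Start of the harvested period, assuming it is counted back from `now`."""
    span = ET.fromstring(task_xml).find("harvesterTimeSpan")
    if span is None:
        raise ValueError("no harvesterTimeSpan element found")
    unit = span.get("unit")
    if unit != "day":
        # Only the unit used in the example is covered by this sketch.
        raise ValueError(f"unit {unit!r} is not covered by this sketch")
    return now - timedelta(days=int(span.get("multiplier", "1")))


now = datetime(2024, 3, 15, 12, 0, 0)
print(harvest_window_start(TASK, now))  # 2024-03-05 12:00:00
```

With a period of ten days, data that is moved to the cold storage only after a longer period would always still be in the warm storage when the incremental harvester looks for it.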

In addition to clearing the entire catalogue, it is useful to be able to remove only the data of a certain storage from the catalogue.

A configuration example is given below.

	<manualArchiveTask>
		<removeObsoleteDataFromCatalogue id="remove data from cold storage">
			<storage>cold</storage>
		</removeObsoleteDataFromCatalogue>
		<description>remove data from cold storage</description>
	</manualArchiveTask>
	<manualArchiveTask>
		<removeObsoleteDataFromCatalogue id="remove data from warm storage">
			<storage>warm</storage>
		</removeObsoleteDataFromCatalogue>
		<description>remove data from warm storage</description>
	</manualArchiveTask>
