The data of the archive is always located in a single directory: the data directory of the archive installation.
It is nevertheless possible to use multiple storages for the archive. For example, recent data can be kept on fast (more expensive) storage and older data on slower (cheaper) storage.
In that case the data directory of the archive should not contain the data directly but should point to the actual storages with symbolic links.
This is possible with all of the supported open archive versions.
Since 2024.02 the harvester process offers additional support for this configuration:
- A dedicated harvester can be configured for each storage. Harvesters for different storages can run simultaneously.
- The catalogue can be cleared for a specific storage.
- The incremental harvester can be configured to run for a specific storage, so that it can also be used in a configuration with multiple storages.
These new features will be explained with an example. In our example the archive consists of two storages, the warm storage and the cold storage.
The warm storage contains the recent data. After a configured period data is moved from the warm storage to the cold storage.
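How the data is moved from the warm to the cold storage is not prescribed here. As a rough illustration only, the sketch below moves files older than a given number of days while keeping their relative paths. The function name move_old_files, the use of the file modification time as the age criterion, and the file names in the demo are all assumptions made for this example; they are not part of the archive software.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List


def move_old_files(warm: Path, cold: Path, max_age_days: float) -> List[Path]:
    """Move files older than max_age_days from warm to cold.

    Assumption: the age of a file is taken from its modification time.
    The relative path below the storage root is preserved.
    """
    cutoff = time.time() - max_age_days * 86400
    moved = []
    # Collect the candidates first so the tree is not changed while walking it.
    candidates = sorted(p for p in warm.rglob("*") if p.is_file())
    for src in candidates:
        if src.stat().st_mtime < cutoff:
            dst = cold / src.relative_to(warm)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        warm, cold = Path(tmp) / "warm", Path(tmp) / "cold"
        (warm / "2024").mkdir(parents=True)
        cold.mkdir()
        old_file = warm / "2024" / "old.nc"      # hypothetical file names
        new_file = warm / "2024" / "recent.nc"
        old_file.write_text("old")
        new_file.write_text("recent")
        # Pretend the first file was written 40 days ago.
        stamp = time.time() - 40 * 86400
        os.utime(old_file, (stamp, stamp))
        for path in move_old_files(warm, cold, max_age_days=30):
            print("moved to", path)
```

After such a move the catalogue still refers to the old location of the files, which is where the storage-specific harvester and catalogue tasks come in.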
The data directory contains two sub directories, warm and cold. Both are symbolic links to the actual storages.
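The layout described above can be sketched as follows. This is a minimal illustration in a temporary directory; the directory names mnt_fast and mnt_slow and the location archive/data are hypothetical stand-ins for the real storages and the real archive installation.

```python
import tempfile
from pathlib import Path


def build_layout(root: Path) -> Path:
    """Create two storages and a data directory that links to them."""
    # Hypothetical mount points standing in for the fast and the slow storage.
    fast = root / "mnt_fast"
    slow = root / "mnt_slow"
    fast.mkdir(parents=True)
    slow.mkdir(parents=True)

    # Hypothetical location of the data directory of the archive.
    data = root / "archive" / "data"
    data.mkdir(parents=True)

    # The data directory holds only symbolic links, not the data itself.
    (data / "warm").symlink_to(fast, target_is_directory=True)
    (data / "cold").symlink_to(slow, target_is_directory=True)
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = build_layout(Path(tmp))
        for link in sorted(data_dir.iterdir()):
            print(link.name, "->", link.resolve())
```

The names of the links (warm and cold) are the values used in the storage element of the task configurations.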
To configure a dedicated harvester for each storage the following configuration can be used.
<manualArchiveTask>
    <internalHarvesterTask id="harvester warm storage">
        <storage>warm</storage>
    </internalHarvesterTask>
    <description/>
</manualArchiveTask>
<manualArchiveTask>
    <internalHarvesterTask id="harvester cold storage">
        <storage>cold</storage>
    </internalHarvesterTask>
    <description/>
</manualArchiveTask>
The configuration above is quite similar to the configuration of a regular harvester. The main difference is that a storage is configured. The storage is the name of the sub directory of the data directory that points to the actual storage.
In our case we have two sub directories, warm and cold. For each of them we configured a dedicated harvester.
In addition it is useful that the incremental harvester, which quickly harvests only the most recent days, knows which storage contains the most recent data.
A configuration example is given below.
<scheduledArchiveTask>
    <incrementalHarvesterTask id="incremental harvester">
        <harvesterTimeSpan unit="day" multiplier="10"/>
        <storage>warm</storage>
    </incrementalHarvesterTask>
    <description>incremental harvester</description>
    <startTime>00:00:00</startTime>
    <endTime>23:59:00</endTime>
    <runIntervalInSeconds>300</runIntervalInSeconds>
    <active>false</active>
</scheduledArchiveTask>
In addition to clearing the entire catalogue, it is useful to be able to remove only the data of a certain storage from the catalogue.
A configuration example is given below.
<manualArchiveTask>
    <removeObsoleteDataFromCatalogue id="remove data from cold storage">
        <storage>cold</storage>
    </removeObsoleteDataFromCatalogue>
    <description>remove data from cold storage</description>
</manualArchiveTask>
<manualArchiveTask>
    <removeObsoleteDataFromCatalogue id="remove data from warm storage">
        <storage>warm</storage>
    </removeObsoleteDataFromCatalogue>
    <description>remove data from warm storage</description>
</manualArchiveTask>