...
It is, however, still possible to use multiple storages for the archive. This is useful, for example, to store recent data in a fast (more expensive) storage and older data in a slower (cheaper) storage.
In this case, the data folder of the archive should not contain the data directly, but symbolic links that point to the actual storages.
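A minimal Python sketch of that layout, assuming a POSIX file system: the data folder contains one symbolic link per storage, and each link points to the real storage location. The function name `link_storages` and any mount points you pass in are hypothetical and only illustrate the idea; they are not part of the archive software.

```python
from pathlib import Path


def link_storages(data_dir: Path, storages: dict[str, Path]) -> None:
    """Create one symbolic link per storage inside the archive data folder.

    Each key of `storages` becomes the name of a subdirectory of the data
    folder (e.g. "warm", "cold"); each value is the actual storage location.
    Hypothetical helper, for illustration only.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, target in storages.items():
        link = data_dir / name
        # is_symlink() also catches broken links, which exists() reports as False.
        if link.is_symlink() or link.exists():
            raise FileExistsError(f"{link} already exists")
        link.symlink_to(target, target_is_directory=True)


# Example call with hypothetical paths (not executed here):
# link_storages(Path("/archive/data"),
#               {"warm": Path("/mnt/fast-disk/archive"),
#                "cold": Path("/mnt/slow-disk/archive")})
```

Refusing to overwrite an existing entry is a deliberate choice in this sketch: replacing a real data directory with a link by accident would hide the data stored in it.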
...
- A dedicated harvester can be configured for each storage; harvesters for different storages can run simultaneously
- The catalogue can be cleared for a specific storage
- The incremental harvester can be configured to run for a specific storage, so that it can also be used in a configuration with multiple storages
These new features are explained with an example. In our example, the archive consists of two storages: the warm storage and the cold storage.
...
```xml
<manualArchiveTask>
    <internalHarvesterTask id="harvester warm storage">
        <storage>warm</storage>
    </internalHarvesterTask>
    <description>harvester warm storage</description>
</manualArchiveTask>
<manualArchiveTask>
    <internalHarvesterTask id="harvester cold storage">
        <storage>cold</storage>
    </internalHarvesterTask>
    <description>harvester cold storage</description>
</manualArchiveTask>
```
The configuration above is quite similar to that of a regular harvester. The main difference is that a storage is configured: the value of the `storage` element is the name of the subdirectory of the data directory that contains the actual storage.
In our case there are two subdirectories, warm and cold, and a dedicated harvester is configured for each of them.
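Because the harvester configuration only differs per storage in the storage name, it can be generated from a list of storage subdirectories. The sketch below is a hypothetical helper (`harvester_tasks` is not part of the archive software) that uses Python's standard `xml.etree.ElementTree` and only the element names from the example configuration; it assumes the `id` and description follow the "harvester <storage> storage" naming used there. The surrounding structure of the configuration file is not shown in this page and is therefore omitted.

```python
import xml.etree.ElementTree as ET


def harvester_tasks(storages: list[str]) -> str:
    """Build one manual full-harvest task per storage subdirectory.

    Hypothetical helper: mirrors the element names of the example
    configuration (manualArchiveTask, internalHarvesterTask, storage,
    description) and returns the tasks as XML text, one per line.
    """
    tasks = []
    for storage in storages:
        name = f"harvester {storage} storage"
        task = ET.Element("manualArchiveTask")
        # One dedicated internal harvester per storage, identified by its name.
        harvester = ET.SubElement(task, "internalHarvesterTask", id=name)
        ET.SubElement(harvester, "storage").text = storage
        ET.SubElement(task, "description").text = name
        tasks.append(ET.tostring(task, encoding="unicode"))
    return "\n".join(tasks)


print(harvester_tasks(["warm", "cold"]))
```

Adding a third storage then only means adding its subdirectory name to the list.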
Note that in the example above the harvester tasks are configured as manual tasks, because since release 202401 it is advised to configure full harvest tasks as manual tasks.
However, the described features are also available in some older client-specific branches. If you use one of these branches, it is advised to schedule the harvest tasks. If an incremental harvest task is also configured, the full harvest task for the storage with the most recent data should preferably run only once a day.
In addition, it is useful for the incremental harvester, which quickly harvests only the most recent days, to know which storage contains the most recent data.
...