Archive Admin Console User Guide

Status archive

The admin console provides an overview of the current status of the archive in the display "status archive". This is the default display that the user sees when the console is opened in a web browser. The top section "Status archive" shows how many records are available in GeoNetwork. If it is not possible to log in to GeoNetwork, the message "not able to login to Geonetwork" is shown instead.

In addition, the total disk space and the free disk space are shown.

The lower section shows a health check of the archive. If a problem is detected with the archive, it is shown in this lower section. A health check that has failed is shown in red.

Archive tasks

From version 2014.02 onwards it is possible to schedule the archive tasks from the archive web application. The menu item "archive tasks" has two sub-items: "schedule tasks" and "history task runs".

The item "schedule tasks" shows the archive tasks and how they are scheduled. The schedule of the archive tasks is configured in the config file ArchiveTasksSchedule.xml, which is located in the config directory of the archive. The menu option "manage configuration" can be used to change this file on the server.

Note, however, that not all tasks shown in the display can be scheduled. The task "clear archive" can only be started manually. This task removes all metadata records from the archive.

It is also possible to run archive tasks manually from this display by using the start button. Running tasks can be stopped with the stop button, which is only available while the task is running. The logfile button can be used to download the log file of the selected task.

In addition to scheduling standard tasks, it is also possible to define custom tasks in ArchiveTasksSchedule.xml. If you want a custom task that is not scheduled and can only be run manually, then you should deactivate its schedule.

An example is shown below:

<arc:scheduledArchiveTask>
    <customArchiveTask>
        <archiveTaskId>example</archiveTaskId>
        <executableFile>c:/bin/example.bat</executableFile>
        <logFile>c:/bin/log/example.log</logFile>
    </customArchiveTask>
    <arc:description>schedule for the historical events exporter</arc:description>
    <arc:startTime>05:00:00</arc:startTime>
    <arc:endTime>08:00:00</arc:endTime>
    <arc:runInterval>1</arc:runInterval>
    <arc:active>true</arc:active>
</arc:scheduledArchiveTask>

[Screenshot: the display "schedule tasks"]

The menu item "history task runs" shows when tasks were started and stopped. For each of the standard tasks (file sweeper, harvester, clear archive and the historical events exporter) there is a separate tab. Custom tasks are shown in the tab "custom".

Archive tasks

The sections above explained how to start archive tasks from the user interface. This section will explain what the archive tasks do.

Clear catalogue

The clear catalogue task removes all metadata from the catalogue. In addition, it removes all files named xxxx.recordid from the disk, where xxxx is the record id of the dataset in the catalogue. If the clear archive task has run successfully, the catalogue (GeoNetwork) should be empty and all xxxx.recordid files should have been removed from the disk.

Why is this useful? In some cases the catalogue and the datasets on disk get out of sync, for example because data was removed from disk in an incorrect way or because GeoNetwork crashed. To get them in sync again, run the clear archive task and then run the harvester. The harvester task rebuilds the entire catalogue: beside each dataset in the archive there is a metadata.xml file that contains the metadata for that dataset, and the harvester rebuilds the catalogue by reading these files.

Also note that on later runs the harvester detects which datasets on disk it has already processed by the presence of an xxxx.recordid file in the directory of the dataset.
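The expected result of a successful run, that no xxxx.recordid files remain on disk, can be checked with a small script. The following is only a sketch: the function name find_recordid_files and the archive root path are hypothetical and not part of the archive software, and it assumes the script can read the archive directory directly.

```python
from pathlib import Path


def find_recordid_files(archive_root):
    """Return all *.recordid marker files below archive_root, sorted by path."""
    return sorted(Path(archive_root).rglob("*.recordid"))


if __name__ == "__main__":
    # Hypothetical location; replace with the real archive root directory.
    leftovers = find_recordid_files("/path/to/archive")
    if leftovers:
        print("Marker files still present:")
        for marker in leftovers:
            print("  ", marker)
    else:
        print("No .recordid files found.")
```

An empty result after running the task is consistent with a clean catalogue; any file that is listed points at a dataset directory that the harvester would still treat as already processed.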

Harvester

The harvester makes sure that the datasets on disk and the catalogue stay in sync. If, for example, a new dataset is added to the archive, the harvester detects this and creates a metadata record in GeoNetwork for this dataset. The record id of the dataset in the catalogue is stored with the dataset on disk by creating an empty file named xxxx.recordid, where xxxx is the record id in the catalogue.
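The marker-file convention described above can be illustrated with a short script that reports, per dataset, which record id is stored on disk. This is a sketch under assumptions, not the harvester's actual implementation: it treats every directory that contains a metadata.xml file as a dataset directory, and the function name harvest_status is hypothetical.

```python
from pathlib import Path


def harvest_status(archive_root):
    """Map each dataset directory to its record id, or None if not yet harvested.

    Assumption: a dataset directory is any directory containing metadata.xml.
    The record id is taken from the name of the *.recordid marker file.
    """
    status = {}
    for metadata_file in Path(archive_root).rglob("metadata.xml"):
        dataset_dir = metadata_file.parent
        markers = sorted(dataset_dir.glob("*.recordid"))
        # The stem of "1234.recordid" is "1234", i.e. the catalogue record id.
        status[dataset_dir] = markers[0].stem if markers else None
    return status
```

Directories that map to None have no marker file, so under the convention above they are the ones the harvester would pick up as new on its next run.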

File sweeper

Manage configuration

The display "manage configuration" allows the configurator to download configuration files, change them manually on their own PC and upload the changed configuration files to the archive server.

When an invalid file is uploaded to the archive server, the system detects this and the file is not used.
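Before uploading a changed file, it can help to check locally that it is at least well-formed XML. The sketch below uses only the Python standard library; the function name is_well_formed_xml is hypothetical. It does not validate the file against any schema, so the checks done by the archive server may reject a file that passes here.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(path):
    """Return (True, None) if the file parses as XML, else (False, error message)."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


if __name__ == "__main__":
    # Assumes the downloaded file is in the current working directory.
    ok, message = is_well_formed_xml("ArchiveTasksSchedule.xml")
    print("well-formed" if ok else "not well-formed: " + message)
```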
