...

The Health Checker tab shows an overview of all the built-in checks that are available for the Archive.


Running a health check is always a two-step process.

...

The page Archive Tasks can be used to start and stop archive tasks. It is also possible to see which tasks are scheduled and what their schedule is.

 

The archive tasks can be configured by adjusting the ArchiveTaskSchedule.xml. Detailed information about the configuration of this file can be found here: Configuration of the Delft-FEWS Archive Server


Task History

The Task History page shows when archive tasks were started and when they finished. The page also indicates whether each task finished successfully.


The menu item History Task Runs shows when tasks are started and stopped by the scheduler. For the standard tasks, such as the file sweeper, harvester, clear archive and the historical events exporter, there is a tab available for each task. Custom-defined tasks are shown in the Custom tab.

 

Archive tasks

The sections above explained how to start archive tasks from the user interface. This section explains what each archive task does.

Clear catalogue

The clear catalogue task removes all metadata from the catalogue. In addition, it removes all files named xxxx.recordid from disk. The xxxx part of the file name is the record id of the dataset in the catalogue. If the clear catalogue task has run successfully, the catalogue (GeoNetwork) should be empty and all xxxx.recordid files should be removed from disk.

Why is this useful? After running this task you can start a harvester task, which will rebuild the entire catalogue. Beside each dataset in the archive there is a metadata.xml file, containing the metadata for the dataset it belongs to. The harvester rebuilds the catalogue by reading these files. In some cases the catalogue and the datasets on disk get out of sync, for example after removing data from disk in an incorrect way or a crash of GeoNetwork. To get the catalogue and the datasets on disk in sync again, run the clear catalogue task followed by the harvester. Also note that on its next run the harvester detects which datasets on disk it has already processed by the presence of a xxxx.recordid file in the directory of the dataset.

Harvester

The harvester makes sure that the datasets on disk and the catalogue stay in sync. If, for example, a new dataset is added to the archive, the harvester detects this and creates a metadata record in GeoNetwork for the dataset. The record id of the dataset in the catalogue is stored in the dataset on disk by creating an empty file named xxxxx.recordid, where the xxxxx part of the file name defines the record id in the catalogue. If a dataset has been removed by the administrator, the recordid file must be left in place. The harvester then identifies that the dataset has been removed and takes the record out of the catalogue. The harvester also removes the recordid file from disk and cleans up the empty directory.

If the catalogue has been cleared and the harvester builds up a new catalogue, GeoNetwork may get overloaded by the number of inserts to be done. To prevent this overload, the harvester only prepares a queue of at most 300 records for insertion. In addition, it monitors insertion progress: if no progress has been made in the last minute, the harvester goes into sleep mode for 30 minutes. After waking up, the insertion process continues until the entire catalogue is rebuilt. At this moment the admin console shows harvesting progress, but it does not (yet) indicate when the harvester is in sleep mode.
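The batching-and-backoff behaviour can be sketched as follows. The function and parameter names are hypothetical; the batch size and sleep time mirror the documented 300-record queue and 30-minute pause, and insert_batch stands in for the actual catalogue insertion:

```python
import time

def throttled_insert(records, insert_batch, batch_size=300,
                     sleep_seconds=1800, sleep=time.sleep):
    """Insert catalogue records in limited batches, backing off on stalls.

    Hypothetical sketch: insert_batch(batch) returns the number of records
    actually inserted. A return of 0 means no progress was made, so the
    loop sleeps (30 minutes by default) before trying the same batch again.
    """
    pending = list(records)
    while pending:
        batch = pending[:batch_size]   # queue of at most batch_size records
        done = insert_batch(batch)
        if done == 0:
            sleep(sleep_seconds)       # no progress: go into sleep mode
            continue
        pending = pending[done:]       # drop the inserted records, carry on
```

The sleep callable is injectable here only to keep the sketch testable; the point is that insertion always resumes after the pause until the whole catalogue is rebuilt.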

File sweeper

In some cases data files are exported to the archive twice: the system feeding the archive may have new data that overrides the previous data, so existing data files are overwritten by new files. If a file that should be overwritten is temporarily locked by the archive for reading, the system feeding the archive should store the new file beside the existing file, with the same name but with the extension .new. The file sweeper detects these files and replaces the original data file with the new one as soon as the file sweeper runs and the lock on the data file has been released.

Historical events exporter

The historical events exporter exports Delft-FEWS historical events to a pre-defined directory. A historical event is defined by an xml file in the events directory. The xml file and the data that is part of the historical event are exported to the pre-defined directory, where a Delft-FEWS workflow will pick up the data.

 


Data management tool

...