...
Within Delft-FEWS, datasets are exported to the archive in a workflow using the ExportArchiveModule. This workflow needs to be scheduled at a regular interval to archive all relevant data. Simulated datasets are preferably exported to the archive from the workflow that created the simulation. This is especially the case when modifiers are used in the simulation. When a simulation is not exported to the archive directly, some modifiers that were used in the simulation may be deleted or changed in the meantime. This can then result in archived simulation data without the modifiers that were used to produce them.
To prevent dependencies on other processes, the ExportArchiveModule is envisioned to write directly into the archive file storage. The FSS therefore needs write access to those disks.
For each kind of dataset, the ExportArchiveModule checks the database for changes over a (configured) relative period. It exports all data that meets the export instructions and that has changed within this period. Datasets are archived in a predefined directory structure, which is based on areaId, date and dataset name.
The schema of the associated configuration file (Figure 4.1) is defined at:
http://fews.wldelft.nl/schemas/version1.0/exportArchiveModule.xsd.
Figure 4.1 Top level of Delft-FEWS exportArchiveModule.xsd
...
Archiving to File Storage
Archiving by using time series sets or workflow selection
The FEWS archive export module has several options for exporting data. In all those exports, time series can be selected either by using time series sets or by specifying a workflow in combination with a time series filter. When specifying a workflow, all time series that comply with the defined filter will be exported. This filter can be used to select the desired data type, parameter, location, qualifier, etc. for exporting. By default the filter is applied to the time series of the current workflow. If a workflowId is defined, the data from the last or current run of that workflow is used. One can also define a period, whereby all simulations or external forecasts for that period are exported. When a time series filter is used, it is also possible to define a fileNamePrefix. The generated file name is then prefixed with the defined prefix, which can make the predefined file name more descriptive.
Generated file name
The paths of the folders to which the data will be written are the same for both options, following the default folder structure for the respective data type.
When exports are set up by specifying time series sets, one must configure the file name of the exported file. If the workflow selection option is used to select which time series will be archived, it is not necessary to define the file names of the netcdf files in which the data will be archived. These names are generated based on the time series that are stored in the file. The generated file name is determined as follows:
<parameterid>_<qualifierIds>_<timestep>_<valuetype>.nc
A file containing scalar data for parameter Hobs with qualifier Q and time step SETS60 will be stored in file Hobs_Q_SETS60_scalar.nc. It is also possible to configure a fixed prefix for each generated file name.
If time series differ only by module instance id, the file name is extended with the module instance id.
Example
Below is an example of an export for a workflow with time series selection through a filter. In the workflowTimeSeriesSelection section, one must specify at least one filter; specifying a workflowId is optional.
Code Block | ||
---|---|---|
| ||
<exportArchiveModule xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/exportArchiveModule.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.wldelft.nl/fews">
<exportExternalForecast>
<general>
<archiveFolder>$ARCHIVE_DIR$</archiveFolder>
<idMapId>IdExportArchiveExternalForecasts</idMapId>
</general>
<activities>
<netcdfExportActivities>
<netcdfExportActivity>
<fileNamePrefix>Deltares</fileNamePrefix>
<areaId>test</areaId>
<sourceId>test</sourceId>
<includeComments>false</includeComments>
<includeFlags>false</includeFlags>
<useModuleInstanceIdAsSourceId>true</useModuleInstanceIdAsSourceId>
<workflowTimeSeriesSelection>
<timeSeriesFilter>
<parameterId>Q</parameterId>
</timeSeriesFilter>
<workflowId>some_sub_workflow</workflowId>
</workflowTimeSeriesSelection>
</netcdfExportActivity>
</netcdfExportActivities>
</activities>
</exportExternalForecast>
</exportArchiveModule> |
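Since the workflowId is optional, the selection can presumably also be reduced to a filter alone, in which case the filter applies to the time series of the current workflow. The fragment below is a minimal sketch of that case, using only elements that appear in the example above; it is not a complete configuration and should be validated against exportArchiveModule.xsd.

```xml
<!-- Sketch only: fragment of a netcdfExportActivity, not a complete file. -->
<workflowTimeSeriesSelection>
  <!-- No workflowId: the filter is applied to the current workflow. -->
  <timeSeriesFilter>
    <parameterId>Q</parameterId>
  </timeSeriesFilter>
</workflowTimeSeriesSelection>
```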
...
Element | Format | Description |
---|---|---|
general ComplexType | ||
dataSetPeriod | string | By default, daily data sets are created. It is also possible to create monthly data sets. For operational purposes it is recommended to use daily data sets. For migrating data from other systems to the FEWS archive it might be convenient to export the data in monthly data sets. |
archiveFolder | string | Export destination folder; assumes that the account running the FEWS (FSS) application has write access. |
relativePeriod | | Exports the entire dataset by day. |
idMap | string | idMap applied to translate internal FEWS identifiers to identifiers that meet NetCDF-CF criteria. E.g. netcdf does not allow a full stop ('.') in the variable name. |
unitConversionsId | string | Id of UnitConversions to be used for unit mapping. Available since 2022.02, optional. |
omitMissingValues | bool | Available since 2022.02. If set to true, missing values won't be written to the export file. Optional, default is false. |
ignoreNonExistingLocationSets | bool | If this option is set, FEWS will not log an error when a location set is not configured. |
verifyExportedTimeSeries | bool | Deprecated since 2024.01 because this check takes much time and is no longer needed. In previous versions, if this option is set to true, FEWS verifies whether the data in the exported netcdf file is the same as in the FEWS database. |
netcdfObservedExportActivities ComplexType | ||
fileName | string | Should include the nc extension, otherwise files will not be read. Preferably no spaces. |
areaId | string | Area to which the dataset belongs. |
sourceId | string | Defines the source of the data set. |
exportLocationAttributeAsNetCDFVariable | complex type | Since 2021.01, for archiving observed data. Adds a variable to the NetCDF file. The name of the variable is the value of ncVariable; the value is the configured location attribute of attributeId. |
ncMetaData | string elem. | Optional metadata tags within the NetCDF file following the CF convention. Supported by the internal catalogue of the THREDDS Data Server. |
includeFlags | bool | default=TRUE; if TRUE, a list of flags is stored, each value pointing to the associated flag. |
includeTimeSeriesProperties | bool | default=TRUE; if TRUE, the properties of the time series will also be stored in the NetCDF file. |
includeComments | bool | default=TRUE; if TRUE, a list of comments is stored, each value pointing to the associated comment. |
thresholdGroupId | string | Identifies the FEWS ThresholdGroup which is used to detect threshold crossings to be highlighted in the metaData.xml. |
timeSeriesSets | FEWS timeseries sets | |
workflowTimeSeriesSelection | | In this tag, time series can be selected by using a time series filter, from either the current workflow or one or more other workflows. |
*When an existing file is locked while it needs to be overwritten, the export function writes a new temporary file. The FileSweeper, a scheduled process, renames this file when the lock is removed from the original file.
Figure 4.2 Delft-FEWS export configuration for archiving observations
Note: Not all information from observed (external historical) time series is archived. The data flags can be archived by setting the element "includeFlags" to true. The flag sources and the users that made edits to the data are currently not archived.
Amalgamate observations
With daily exports of observed data per area, the number of small datasets on the file system may increase quickly. This has several disadvantages: the harvesting process takes longer when processing many small datasets, and the performance of seamless integration (and web services) also drops. Therefore, an amalgamate process has been developed that merges the daily observation files into one observation file per month. Currently, this amalgamate process is executed from the FEWS application side by an FSS, using the following configuration. Specify the relativePeriod in such a way that the observed data files are stable and no longer updated, e.g. use a delay of one or two months. There is no use in specifying a startOverrulable: when this workflow is started manually, it only processes the configured relativePeriod.
Code Block | ||||
---|---|---|---|---|
| ||||
<exportArchiveModule xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/exportArchiveModule.xsd">
<amalgamateObserved>
<general>
<archiveFolder>$ARCHIVE_FOLDER$</archiveFolder>
<relativePeriod unit="day" start="-62" end="-31"></relativePeriod>
</general>
<activities>
<amalgamateObservedData>
<areaId>$BASIN$</areaId>
</amalgamateObservedData>
</activities>
</amalgamateObserved>
</exportArchiveModule> |
...
It is also possible to archive a simulation from a scheduled workflow. In this case a relative period (relativePeriod in the schema) should be configured to indicate which time period should be archived. In addition, the id of the workflow whose simulations should be archived (workflowId in the schema) should be configured.
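As a rough sketch of such a scheduled export, the general section might look like the fragment below. The two-day window and the workflow id some_forecast_workflow are hypothetical, the relativePeriod attribute form is borrowed from the amalgamate example elsewhere on this page, and the element order plus any further required elements (such as areaId or the activities) are assumptions to be checked against exportArchiveModule.xsd.

```xml
<!-- Sketch only: fragment, not a complete exportArchiveModule file. -->
<exportSimulated>
  <general>
    <archiveFolder>$ARCHIVE_DIR$</archiveFolder>
    <!-- Assumed window: forecasts from the last two days. -->
    <relativePeriod unit="day" start="-2" end="0"></relativePeriod>
    <!-- Hypothetical id of the workflow whose forecasts are archived. -->
    <workflowId>some_forecast_workflow</workflowId>
  </general>
  <!-- activities omitted -->
</exportSimulated>
```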
Exporting the current forecast in a workflow that did not compute the forecast
It is also possible to export the current forecast to the archive from a workflow other than the one that computed the forecast.
If the export is done without a relative period, but not in the workflow that created the forecast, the current forecast will be exported. If modifiers were used in the workflow that created the forecast, the relatedWorkflowId should also be configured to indicate which workflow was run to compute the forecast. This ensures that the modifiers used in that workflow are also exported to the archive.
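A minimal sketch of this case is given below, assuming that relatedWorkflowId is placed in the general section as the element table suggests. The workflow id some_forecast_workflow is hypothetical, and the element order and any other required elements should be verified against exportArchiveModule.xsd.

```xml
<!-- Sketch only: fragment, not a complete exportArchiveModule file. -->
<exportSimulated>
  <general>
    <archiveFolder>$ARCHIVE_DIR$</archiveFolder>
    <!-- No relativePeriod or workflowId: the current forecast is exported. -->
    <!-- Hypothetical id of the workflow that computed the forecast, -->
    <!-- so that its modifiers are archived as well. -->
    <relatedWorkflowId>some_forecast_workflow</relatedWorkflowId>
  </general>
  <!-- activities omitted -->
</exportSimulated>
```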
...
The associated root directory structure of the Delft-FEWS export for this type of dataset is as follows:
<archiveRoot>/<yyyy>/<MM>/<areaId>/<dd>/simulated/<workflowId><TimeZero><DispatchTime>
Where <dd> refers to the date of the forecast time T0.
This directory holds the metaData.xml file as well as the runInfo.xml file with the FEWS taskrun properties (see Figure 3.8). Within this directory the following sub-folders may exist:
/timeseries, /reports, /modifiers, /states
The exportArchiveModule.xsd has a dedicated exportSimulated section to configure the messages that need to be archived (see Figure 3.7). The associated specification is given in Table 4.4.
Table 4.4 Delft-FEWS export configuration for archiving simulations
Store time series in the netcdf storage
It is possible to store a simulation partly in the Open Archive (meta data, states, modifiers and reports) and partly (time series) in the netcdf storage. This can be useful when you want to have a 'clean' time-series-only storage. This can be configured by adding the element netcdfStorageExport to the general section. An example is given below.
Code Block | ||
---|---|---|
| ||
<?xml version="1.0" encoding="UTF-8"?>
<exportArchiveModule xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/exportArchiveModule.xsd">
<exportSimulated>
<general>
<archiveFolder>folder</archiveFolder>
<areaId>area</areaId>
<netcdfStorageExport>
<matchingAttributeId>attribute1</matchingAttributeId>
<matchingAttributeId>attribute2</matchingAttributeId>
</netcdfStorageExport>
<unitConversionsId>UnitConversionArchive</unitConversionsId>
<omitMissingValues>true</omitMissingValues>
<coldStartDataExportRelativeStartTime value="-1" unit="day"></coldStartDataExportRelativeStartTime>
</general>
<activities>
<netcdfExportActivities>
<netcdfExportActivity>
<fileName>test.nc</fileName>
<timeSeriesSet>
<moduleInstanceId>test</moduleInstanceId>
<valueType>scalar</valueType>
<parameterId>par</parameterId>
<locationId>loc</locationId>
<timeSeriesType>simulated forecasting</timeSeriesType>
<timeStep unit="hour"></timeStep>
<readWriteMode>add originals</readWriteMode>
</timeSeriesSet>
</netcdfExportActivity>
</netcdfExportActivities>
</activities>
</exportSimulated>
</exportArchiveModule> |
Because the time series and the meta data for the simulation are now stored separately, it is important to know to which simulation a set of netcdf files belongs. The meta data and the netcdf files are matched by using the task run id. In addition, they are matched by the configured matchingAttributeId elements. The meta data of a simulation stores the values of these attributes, and they need to match the values in the actual netcdf files. In the netcdf storage you need to configure where the time series are stored, so that they can be harvested by the harvester of the external netcdf storage. A configuration example is shown below.
Code Block | ||
---|---|---|
| ||
<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSpy v2014 rel. 2 sp1 (http://www.altova.com) by Afdeling ICT (Stichting Deltares) -->
<externalStorage xsi:schemaLocation="http://www.wldelft.nl/fews/archive http://fews.wldelft.nl/schemas//version1.0/archive-schemas/externalStorage.xsd" xmlns="http://www.wldelft.nl/fews/archive" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<simulatedForecastingNetcdfStorage id="rwsos.noordzee.matroos.maps">
<dataFolder>c:\FEWS\archive\matroos\maps</dataFolder>
<matchingAttributeId>source</matchingAttributeId>
<matchingAttributeId>issue_time</matchingAttributeId>
</simulatedForecastingNetcdfStorage>
</externalStorage> |
Element | Format | Description |
---|---|---|
GeneralExportForecastSection ComplexType | ||
archiveFolder | string | Export destination folder; assumes that the account running the FEWS (FSS) application has write access. |
relativePeriod | | Exports entire simulated time series by workflow. |
workflowId | string | This option should be used in combination with the relativePeriod. All forecasts of the workflowId that exist in the relative period will be exported. |
idMap | string | idMap applied to translate internal FEWS identifiers to identifiers that meet NetCDF-CF criteria. E.g. netcdf does not allow a full stop ('.') in the variable name. |
unitConversionsId | string | Available since 2022.02. Optional. Id of UnitConversions to be used for unit mapping. |
omitMissingValues | boolean | Available since 2022.02. Optional. If set to true, time steps with missing values won't be written to the export file. |
coldStartDataExportRelativeStartTime | | Available since 2022.02. Optional. If this is specified, the period for which data is exported will be truncated for simulated data that was created in a run that started with a cold state. In other words, for this type of data, any data before the specified start time will not be exported. This start time is relative to the time zero of the run in which the data was created. This can be used to avoid exporting data that was created during the warm-up period of a model run after a cold start. Note: this option only works when the default state selection was configured to be cold state for the run in which the data was created. |
relatedWorkflowId | string | If the export is done without the relativePeriod and workflowId options, but not in the workflow that created the forecasts, the workflow id of the forecast being exported should be configured here to make sure that the modifiers used in this forecast are exported. |
ForecastActivities ComplexType | ||
NetcdfForecastExportActivities ComplexType | ||
NetcdfForecastExportActivity ComplexType* | ||
fileName | string | without nc extension, preferably no spaces |
areaId | string | area to which the dataset belongs |
ncMetaData | string elem. | Optional metadata tags within the NetCDF file following the CF convention. Supported by the internal catalogue of the THREDDS Data Server. |
includeFlags | bool | only applied for scalar values. |
includeComments | bool | only applied for scalar values. |
timeSeriesSets | FEWS timeseries sets | |
ReportsExportActivity ComplexType* | ||
subfolder | string | Subdirectory within the 'reports' directory |
moduleInstanceId | string | moduleInstanceId which created the report |
ModuleStatesExportActivity ComplexType* | ||
moduleInstanceId | string | moduleInstanceId which created the state |
*When an existing file is locked while it needs to be overwritten, the export function writes a new temporary file. The FileSweeper, a scheduled process, renames this file when the lock is removed from the original file.
Figure 4.5 Delft-FEWS export configuration for archiving simulations
...
Delft-FEWS can export the current configuration to the archive via the ExportArchiveModule (exportConfig activity). The configuration is thus exported as part of the workflow. Before exporting, FEWS checks whether the config of the current revisionId was already written to the file system. If the files have been moved from the export directory, FEWS will therefore export them again later, using the same file and folder names, so that in a regular data transfer setup the files will be overwritten by each new export.
The associated root directory structure of the Delft-FEWS export for this type of dataset is as follows:
<archiveRoot>/data/config/<areaId>/<yyyymmdd>/<mastercontroller_id>_<revisionId>
The date refers to the revision date of the configuration. The revisionId is used in the subfolder name (see path above) and in the file name of the .zip file, i.e. <mastercontroller_id>_<revisionId>.zip. The .zip file is accompanied by a matching metaData.xml file.
The exportArchiveModule.xsd has a dedicated exportConfig section to set up the export of the configuration (see Figure 4.6). This is due to be revised, as the same relativePeriod-based export mechanism should be adopted for checking what to export (see Table 4.5).
Figure 4.6 Delft-FEWS export configuration for archiving configurations
...
Element | Format | Description |
---|---|---|
GeneralExportConfig ComplexType | ||
archiveFolder | string | Export destination folder; assumes that the account running the FEWS (FSS) application has write access. |
relativePeriod (TO DO) | | Exports the entire configuration when a database change (config revision) has been detected within the relativePeriod (relative to T0). Existing files are overwritten* |
ExportConfigActivities ComplexType | ||
ExportConfigActivity ComplexType* | ||
areaId | string | area to which the dataset belongs |
...
Code Block | ||
---|---|---|
| ||
<exportArchiveModule xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/exportArchiveModule.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.wldelft.nl/fews">
<exportSnapShot>
<general>
<archiveFolder>$ARCHIVE_DIR$</archiveFolder>
</general>
<activities>
<exportSnapShot>
<areaId>area</areaId>
<filter id="only time series">
<xmlConfig enabled="false" name="Default xml config" synchLevel="11"/>
<coldStates enabled="false" name="Default cold states" synchLevel="11"/>
<moduleDataSets enabled="false" name="Default module data sets" synchLevel="11"/>
<mapLayers enabled="false" name="Default map layers" synchLevel="11"/>
<icons enabled="false" name="Default icons" synchLevel="11"/>
<reportTemplates enabled="false" name="Default report templates" synchLevel="11"/>
<reportImages enabled="false" name="Default report images" synchLevel="11"/>
<continuousTimeSeries enabled="true" name="Simulated" synchLevel="0" maxAge="1000" unit="week"/>
<continuousTimeSeries enabled="true" name="Telemetry" synchLevel="1" maxAge="1000" unit="week"/>
<continuousTimeSeries enabled="true" name="Manual" synchLevel="5" maxAge="1000" unit="week"/>
<continuousTimeSeries enabled="true" name="Astronomical and climatological" synchLevel="4" maxAge="1000" unit="week"/>
<continuousTimeSeries enabled="true" name="Small external forecast grids" synchLevel="6" maxAge="1000" unit="week"/>
<continuousTimeSeries enabled="true" name="Large external forecast grids" synchLevel="16" maxAge="10000" unit="week"/>
<warmStates enabled="false" name="Warm states" maxAge="10" unit="week"/>
<logEntries enabled="false" name="Log Entries" maxAge="1" unit="week"/>
<thresholdEvents enabled="false" name="Threshold Events" maxAge="1" unit="week"/>
</filter>
</exportSnapShot>
</activities>
</exportSnapShot>
</exportArchiveModule> |
The base builds and non-default rootConfig files are always excluded from the snapshot.
Archiving Delft-FEWS products
...
Code Block | ||
---|---|---|
| ||
<exportArchiveModule xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/exportArchiveModule.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.wldelft.nl/fews">
<exportProducts>
<general>
<archiveFolder>$ARCHIVE_DIR$</archiveFolder>
</general>
<activities>
<exportProduct>
<areaId>areaTest</areaId>
<sourceId>sourceTest</sourceId>
<importFolder>$IMPORT_DIR$</importFolder>
<fileNameProductDateTimePattern>yyyyMMdd'.nc'</fileNameProductDateTimePattern>
</exportProduct>
</activities>
</exportProducts>
</exportArchiveModule> |
Archiving to Archive Database
Archiving observed data to the archive database
...