
This documentation gives a general overview of how to set up a Condor grid in Delft-FEWS. For more detailed information please contact FEWS product management.

Introduction

The calculation of probabilistic outputs in operational flood forecasting often requires that models are run multiple times with varied boundary conditions or model structures. To reduce the computation time, ensemble members can be run in parallel.

Grid computing techniques can be used to perform these operations. Condor is a specialized workload management system for computationally intensive jobs. It provides the necessary tools, such as job queuing, scheduling, priority management, resource monitoring, and resource management, to enable multiple model runs to be made on multiple Delft-FEWS forecasting shell machines. Serial or parallel jobs submitted to Condor are placed in a queue and run according to how Condor is configured.
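As background, and independent of the Delft-FEWS configuration described below, jobs are handed to Condor by describing them in a submit file and submitting that file from the command line. The commands below are standard Condor commands; the submit file name sobek.sub is purely illustrative:

rem Submit the job description to the Condor queue (sobek.sub is a hypothetical example file)
condor_submit sobek.sub
rem Inspect the job queue and the state of the execute nodes
condor_q
condor_status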

With this type of grid computing there is normally an overhead (running 50 ensemble members in parallel does not mean the run will be 50 times quicker). The computational overhead depends on the amount of static and dynamic data that must be transferred between the forecasting shell and the node. It can be minimized by preloading the nodes with static data and by using compressed or efficient formats for the dynamic data.

Setting up Condor in the general adapter

In the general section of the generalAdapterRun the number of ensemble members should be specified:

<ensembleMemberCount>17</ensembleMemberCount>

The tag %ENSEMBLE_MEMBER_ID% can then be used to loop through the ensemble member directories, as in the complete example below.

<?xml version="1.0" encoding="UTF-8"?>
<!--Delft FEWS PO-->
<generalAdapterRun xmlns="http://www.wldelft.nl/fews" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.wldelft.nl/fews http://fews.wldelft.nl/schemas/version1.0/generalAdapterRun.xsd">
	<!-- General information for General Adapter run -->
	<general>
		<description>Sobek Model run for the Como Lake</description>
		<rootDir>%REGION_HOME%/Modules/SbkParallel/como</rootDir>
		<workDir>%ROOT_DIR%</workDir>
		<exportDir>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/fews_in</exportDir>
		<exportIdMap>IdSobekExportForecast</exportIdMap>
		<importDir>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/fews_out</importDir>
		<importIdMap>IdSobekImportForecast</importIdMap>
		<dumpFileDir>$GA_DUMPFILEDIR$</dumpFileDir>
		<dumpDir>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%</dumpDir>
		<diagnosticFile>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/fews_out/diagnostic.xml</diagnosticFile>
		<convertDatum>true</convertDatum>
		<ensembleMemberCount>17</ensembleMemberCount>
	</general>
	<activities>
		<startUpActivities>
			<purgeActivity>
				<filter>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/fews_in/*.*</filter>
			</purgeActivity>
			<purgeActivity>
				<filter>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/fews_out/*.*</filter>
			</purgeActivity>
			<purgeActivity>
				<filter>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/model/*.*</filter>
			</purgeActivity>
			<purgeActivity>
				<filter>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/log/*.*.*</filter>
			</purgeActivity>
		</startUpActivities>
		<!-- Export activities -->
		<exportActivities>
			<!-- Export state (warm state)-->
			<exportStateActivity>
				<moduleInstanceId>Sobek_Po_Como_Historical</moduleInstanceId>
				<stateExportDir>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/model</stateExportDir>
				<stateConfigFile>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/fews_in/states.xml</stateConfigFile>
				<stateLocations type="file">
					<stateLocation>
						<readLocation>sobek.rda</readLocation>
						<writeLocation>sobek.nda</writeLocation>
					</stateLocation>
					<stateLocation>
						<readLocation>sobek.rdf</readLocation>
						<writeLocation>sobek.ndf</writeLocation>
					</stateLocation>
				</stateLocations>
				<stateSelection>
					<warmState>
						<stateSearchPeriod unit="hour" start="-48" end="0"/>
					</warmState>
				</stateSelection>
			</exportStateActivity>
			<!-- Export time series -->
			<exportTimeSeriesActivity>
				<description>Export inflows for Sobek Como Lake model</description>
				<exportFile>Input_Como_Pi.xml</exportFile>
				<timeSeriesSets>
					<timeSeriesSet>
						<moduleInstanceId>Sobek_MergeInput_Forecast_COSMO</moduleInstanceId>
						<valueType>scalar</valueType>
						<parameterId>Q.simulated.forecast</parameterId>
						<locationId>Adda23635</locationId>
						<timeSeriesType>simulated forecasting</timeSeriesType>
						<timeStep unit="hour" multiplier="1"/>
						<relativeViewPeriod unit="hour" endOverrulable="true" end="72"/>
						<readWriteMode>read only</readWriteMode>
						<ensembleId>CosmoLeps</ensembleId>
					</timeSeriesSet>
					<timeSeriesSet>
						<moduleInstanceId>Interpolation_Sobek_Forecast</moduleInstanceId>
						<valueType>scalar</valueType>
						<parameterId>H.forecast.external</parameterId>
						<locationId>I-203000</locationId>
						<timeSeriesType>external forecasting</timeSeriesType>
						<timeStep unit="hour" multiplier="1"/>
						<relativeViewPeriod unit="hour" endOverrulable="true" end="72"/>
						<readWriteMode>read only</readWriteMode>
					</timeSeriesSet>
				</timeSeriesSets>
			</exportTimeSeriesActivity>
		</exportActivities>
		<!-- Export activities -->
		<!-- Execute activities:Run SOBEK Adapter, Batch tool -->
		<executeActivities>
			<executeActivity>
				<command>
					<className>nl.wldelft.fews.adapter.sobek.PreSobekModelAdapter</className>
				</command>
				<arguments>
					<argument>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%</argument>
					<argument>../Config/sobekConfig.xml</argument>
				</arguments>
				<timeOut>500000</timeOut>
				<overrulingDiagnosticFile>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/diagnostics/presobekmodeladapter.xml</overrulingDiagnosticFile>
			</executeActivity>
			<!-- Run Condor -->
			<executeActivity>
				<description>Condor Batch script by Frederik</description>
				<command>
					<executable>%ROOT_DIR%\run_condor_sobek.bat</executable>
				</command>
				<arguments>
					<argument>-o</argument>
					<argument>\\$CONDOR_REMOTE_DIR$\SbkParallel</argument>
					<argument>-n</argument>
					<argument>17</argument>
					<argument>-t</argument>
					<argument>18000000</argument>
					<argument>-d</argument>
					<argument>\\$CONDOR_REMOTE_DIR$\SbkParallel</argument>
				</arguments>
				<!-- timeout in milliseconds -->
				<timeOut>19000000</timeOut>
				<ignoreDiagnostics>true</ignoreDiagnostics>
			</executeActivity>
			<executeActivity>
				<command>
					<className>nl.wldelft.fews.adapter.sobek.PostSobekModelAdapter</className>
				</command>
				<arguments>
					<argument>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%</argument>
					<argument>../Config/sobekConfig.xml</argument>
				</arguments>
				<timeOut>500000</timeOut>
				<overrulingDiagnosticFile>%ROOT_DIR%/%ENSEMBLE_MEMBER_ID%/diagnostics/postsobekmodeladapter.xml</overrulingDiagnosticFile>
			</executeActivity>
		</executeActivities>
		<importActivities>
			<!-- Import Sobek results-->
			<importTimeSeriesActivity>
				<description>Import XML file</description>
				<importFile>reachseg.xml</importFile>
				<timeSeriesSets>
					<timeSeriesSet>
						<moduleInstanceId>Sobek_Po_Como_Forecast_COSMO_Parallel</moduleInstanceId>
						<valueType>scalar</valueType>
						<parameterId>Q.simulated.forecast</parameterId>
						<locationId>Serbatoio_Como235</locationId>
						<timeSeriesType>simulated forecasting</timeSeriesType>
						<timeStep unit="hour" multiplier="1"/>
						<readWriteMode>add originals</readWriteMode>
						<expiryTime unit="day" multiplier="2"/>
						<ensembleId>CosmoLeps</ensembleId>
					</timeSeriesSet>
				</timeSeriesSets>
			</importTimeSeriesActivity>
		</importActivities>
	</activities>
</generalAdapterRun>

The ensemble member directories must be created in the Modules directory so that each member receives the correct data for the ensemble run. In this case data is exported to the following directories (see the <exportDir> element above):

Modules\SbkParallel\Como\0\fews_in\
Modules\SbkParallel\Como\1\fews_in\
Modules\SbkParallel\Como\2\fews_in\
...
Modules\SbkParallel\Como\17\fews_in\
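
These directories are not created by Delft-FEWS itself and must be prepared beforehand. A minimal sketch of a Windows batch file that creates them is shown below; the base path and the member range (0 to 17, matching the listing above) are assumptions that should be adjusted to the actual configuration:

rem Sketch only: create the per-member directories used by the general adapter run.
rem Adjust the base path and the member range to the actual configuration.
set MODULE_DIR=Modules\SbkParallel\Como
for /L %%i in (0,1,17) do (
	mkdir "%MODULE_DIR%\%%i\fews_in"
	mkdir "%MODULE_DIR%\%%i\fews_out"
	mkdir "%MODULE_DIR%\%%i\model"
	mkdir "%MODULE_DIR%\%%i\log"
	mkdir "%MODULE_DIR%\%%i\diagnostics"
)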

Execution of the Condor batch script

From the example above we see that the general adapter executes a batch script with a number of arguments:

			<!-- Run Condor -->
			<executeActivity>
				<description>Condor Batch script by Frederik</description>
				<command>
					<executable>%ROOT_DIR%\run_condor_sobek.bat</executable>
				</command>
				<arguments>
					<argument>-o</argument>
					<argument>\\$CONDOR_REMOTE_DIR$\SbkParallel</argument>
					<argument>-n</argument>
					<argument>17</argument>
					<argument>-t</argument>
					<argument>18000000</argument>
					<argument>-d</argument>
					<argument>\\$CONDOR_REMOTE_DIR$\SbkParallel</argument>
				</arguments>
				<!-- timeout in milliseconds -->
				<timeOut>19000000</timeOut>
				<ignoreDiagnostics>true</ignoreDiagnostics>
			</executeActivity>

The batch script then executes a shell script; examples of both scripts can be found attached to this page.
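The attached scripts remain the reference. As an indication only, the sketch below shows what such a submission script might look like. It assumes that -n gives the number of ensemble members and -o/-d the shared directory visible to the execute nodes, and that the variables NUM_MEMBERS and REMOTE_DIR have already been parsed from those arguments; the file names sobek.sub, run_member.bat and condor.log are hypothetical:

rem Hypothetical sketch of a Condor submission script; see the attached scripts for the real implementation.
rem NUM_MEMBERS and REMOTE_DIR are assumed to have been parsed from the -n and -d arguments.
echo universe = vanilla > sobek.sub
echo executable = run_member.bat >> sobek.sub
echo arguments = $(Process) >> sobek.sub
echo initialdir = %REMOTE_DIR%\$(Process) >> sobek.sub
echo log = condor.log >> sobek.sub
echo should_transfer_files = YES >> sobek.sub
echo when_to_transfer_output = ON_EXIT >> sobek.sub
echo queue %NUM_MEMBERS% >> sobek.sub

rem Submit all jobs and block until they have finished (Condor writes progress to condor.log)
condor_submit sobek.sub
condor_wait condor.log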
