Since Delft-FEWS 2024.01, FSS Groups can be configured to support hibernating the Azure Virtual Machines that run Forecasting Shell Servers (FSS). This option can be used when all Delft-FEWS components run in the cloud, or in a hybrid mode where only the Forecasting Shell Servers run in Azure and all other components run on premise.

In the hybrid scenario, a Delft-FEWS system runs on premise. From this on-premise system, one or more Azure FSS VMs are brought up from their hibernated state.

In the Azure cloud environment, one or more virtual machines have been created and pre-configured as Forecasting Shell Servers (combined in an FSS Group) and have been shut down into a suspended (hibernated) state, in which only storage costs apply.

Scenario architecture visualization

The architecture for this hybrid scenario is visualized below.

Scenario architecture in more detail

Model input data for the Forecasting Shell Servers that cannot be provided from the database (using an autoExportModuleDataSet) is synchronized from on premise to the cloud, to avoid excessive data transfer costs.

Model results from the Azure FSS are written through a Delft-FEWS Database Access Proxy that is hosted on premise. This is only recommended when the models do not produce large amounts of data, since the cost of outgoing (egress) data from Azure can be considerable. A typical use case is a workflow that runs for more than 15 minutes and produces little data (less than 20 MB).

In the on-premise Delft-FEWS configuration, one or more FSS Groups are configured so that their Forecasting Shell Servers can be started and stopped. These FSS Groups are marked as Azure FSS Groups in the Master Controller configuration, using a dedicated azure element with the following settings:

fssGroup: id of the FSS Group that the settings apply to.
tenantId: Microsoft Entra ID tenant id.
subscriptionId: id of the Azure subscription.
resourceGroup: resource group where the FSS Virtual Machines are deployed.
clientId: client id of the service principal that is allowed to access the Azure Management API to start and stop virtual machines.
clientSecret: secret of the service principal.

When a workflow is mapped to an Azure FSS Group, the process will be as follows:

The Master Controller authenticates against Microsoft Entra ID using a service principal and retrieves an access token. The service principal needs permission to read VMs at the resource group level, as well as the required virtual machine permissions (start, deallocate, read instance view) for each VM. A custom role can be defined for this at the subscription level.
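The authentication step corresponds to the standard OAuth 2.0 client credentials flow of the Microsoft identity platform. The Python sketch below only builds that token request; the function name build_token_request is hypothetical, and this is an illustration of the flow, not the Master Controller's actual implementation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Token endpoint of the Microsoft identity platform (OAuth 2.0 client credentials flow).
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
# Requesting ".default" for the Management API returns the app's assigned permissions.
MANAGEMENT_SCOPE = "https://management.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) the token request for a service principal.

    Hypothetical helper: tenant_id, client_id and client_secret correspond to the
    tenantId, clientId and clientSecret settings of an azure element.
    """
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": MANAGEMENT_SCOPE,
        }
    ).encode("ascii")
    return Request(
        TOKEN_URL.format(tenant_id=tenant_id),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen and reading the access_token field of the JSON response yields the bearer token used for subsequent Azure Management API calls.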

MC custom role definition for the Azure Management API

{
    "properties": {
        "roleName": "Delft-FEWS FSS Scaling",
        "description": "Role that will allow the Delft-FEWS Master Controller to Scale Forecasting Shell Servers VMs in Azure.",
        "assignableScopes": [
            "/subscriptions/xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx/resourceGroups/fews-on-prem-fss-scaling"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.Compute/virtualMachines/start/action",
                    "Microsoft.Compute/virtualMachines/deallocate/action",
                    "Microsoft.Compute/virtualMachines/read",
                    "Microsoft.Compute/virtualMachines/instanceView/read"
                ],
                "notActions": [],
                "dataActions": [],
                "notDataActions": []
            }
        ]
    }
}

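A custom role with the permissions above can be created in the subscription. It allows the Master Controller to list the VMs and read their state (read, instanceView/read) and to start and deallocate them (start/action, deallocate/action).

This custom role can then be assigned to the app registration (service principal) that is used by the Master Controller.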
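Before assigning the role, it can be useful to check that a role definition actually grants the four actions listed above. The sketch below is a hypothetical helper, not part of Delft-FEWS; it assumes the portal-style JSON shape shown above (properties.permissions) and applies Azure RBAC's rule that the effective permissions are actions minus notActions, with case-insensitive matching and `*` wildcards.

```python
import json
from fnmatch import fnmatchcase

# The actions from the role definition above that the Master Controller relies on.
REQUIRED_ACTIONS = (
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/deallocate/action",
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/instanceView/read",
)


def _matches(action: str, patterns: list[str]) -> bool:
    # RBAC action strings are case-insensitive and may contain '*' wildcards.
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns)


def missing_actions(role_definition_json: str) -> list[str]:
    """Return the required actions that the role definition does not grant."""
    role = json.loads(role_definition_json)
    permissions = role.get("properties", {}).get("permissions", [])
    missing = []
    for action in REQUIRED_ACTIONS:
        # Granted if some permission block allows it and does not exclude it.
        granted = any(
            _matches(action, block.get("actions", []))
            and not _matches(action, block.get("notActions", []))
            for block in permissions
        )
        if not granted:
            missing.append(action)
    return missing
```

An empty result means the role covers everything the sketch checks for; any listed action would have to be added before the Master Controller can scale the group.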

Visualization of deploy and workflow sequence

A high-level overview of the different steps is shown in the following sequence diagram.

Master Controller Configuration

To use hibernatable Forecasting Shell Servers, the FSS Group needs to be configured accordingly. In the Master Controller configuration, an azure element can be added for each FSS Group; its fssGroup element identifies the group that the configuration applies to.

<?xml version="1.0" encoding="UTF-8"?>
<mc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.wldelft.nl/fews" xsi:schemaLocation="http://www.wldelft.nl/fews https://fewsdocs.deltares.nl/schemas/version1.0/mc.xsd">
    <mcId>fssscalingmc00</mcId>
    <databaseIntId>0</databaseIntId>
    <adminInterface>
        <title>FSS Scaling On Premise</title>
    </adminInterface>
    <azure>
        <fssGroup>linux_on_prem_scalable</fssGroup>
        <tenantId>XXXXXX</tenantId>
        <subscriptionId>XXXXX</subscriptionId>
        <resourceGroup>fews-on-prem-fss-scaling</resourceGroup>
        <clientId>%AZURE_DPC_FSS_SCALING_CLIENT_ID%</clientId>
        <clientSecret>%AZURE_DPC_FSS_SCALING_CLIENT_SECRET%</clientSecret>
    </azure>
    <azure>
        <fssGroup>windows_on_prem_scalable</fssGroup>
        <tenantId>XXXXXXXX</tenantId>
        <subscriptionId>XXXXXX</subscriptionId>
        <resourceGroup>fews-on-prem-fss-scaling</resourceGroup>
        <clientId>%AZURE_DPC_FSS_SCALING_CLIENT_ID%</clientId>
        <clientSecret>%AZURE_DPC_FSS_SCALING_CLIENT_SECRET%</clientSecret>
    </azure>
    <fssGroups>
        <fssGroup id="linux" name="linux" mcId="fssscalingmc00">
            <description></description>
            <allowUnmapped>true</allowUnmapped>
            <minAwakeCount>1</minAwakeCount>
            <releaseSlotsMillis unit="hour" multiplier="0"/>
            <gotoSleepMillis unit="minute" multiplier="5"/>
        </fssGroup>
        <fssGroup id="linux_on_prem_scalable" name="linux_on_prem_scalable" mcId="fssscalingmc00">
            <description></description>
            <allowUnmapped>false</allowUnmapped>
            <minAwakeCount>0</minAwakeCount>
            <releaseSlotsMillis unit="hour" multiplier="0"/>
            <gotoSleepMillis unit="minute" multiplier="5"/>
        </fssGroup>
        <fssGroup id="windows_on_prem_scalable" name="windows_on_prem_scalable" mcId="fssscalingmc00">
            <description></description>
            <allowUnmapped>false</allowUnmapped>
            <minAwakeCount>0</minAwakeCount>
            <releaseSlotsMillis unit="hour" multiplier="0"/>
            <gotoSleepMillis unit="minute" multiplier="5"/>
        </fssGroup>
    </fssGroups>
    <workflowMappings>
        <workflowMapping workflowId="Import_EarthObservation" fssGroupId="windows_on_prem_scalable" mcId="fssscalingmc00"/>
        <workflowMapping workflowId="Import_Forecasts" fssGroupId="linux_on_prem_scalable" mcId="fssscalingmc00"/>
    </workflowMappings>
</mc>

This allows a different scaling configuration per FSS Group. An architecture with different Azure tenants or subscriptions can be supported in this way as well.
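To illustrate how the azure elements relate to FSS Groups, the Python sketch below reads them from an mc.xml document into a lookup keyed by fssGroup id. The dataclass and function names are hypothetical; the element names and the namespace come from the configuration above. Placeholders such as %AZURE_DPC_FSS_SCALING_CLIENT_ID% are kept verbatim, since how Delft-FEWS resolves them is not covered here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

FEWS_NS = {"fews": "http://www.wldelft.nl/fews"}


@dataclass(frozen=True)
class AzureFssGroupSettings:
    """Hypothetical container for the settings of one azure element."""
    fss_group: str
    tenant_id: str
    subscription_id: str
    resource_group: str
    client_id: str
    client_secret: str


def _child_text(element: ET.Element, tag: str) -> str:
    # Returns "" when the child element is missing or empty.
    return element.findtext(f"fews:{tag}", default="", namespaces=FEWS_NS).strip()


def read_azure_settings(mc_xml: str) -> dict[str, AzureFssGroupSettings]:
    """Map each fssGroup id to the azure settings configured for it."""
    root = ET.fromstring(mc_xml)
    settings: dict[str, AzureFssGroupSettings] = {}
    for azure in root.findall("fews:azure", FEWS_NS):
        group = AzureFssGroupSettings(
            fss_group=_child_text(azure, "fssGroup"),
            tenant_id=_child_text(azure, "tenantId"),
            subscription_id=_child_text(azure, "subscriptionId"),
            resource_group=_child_text(azure, "resourceGroup"),
            # Placeholders are not resolved by this sketch.
            client_id=_child_text(azure, "clientId"),
            client_secret=_child_text(azure, "clientSecret"),
        )
        settings[group.fss_group] = group
    return settings
```

With the example configuration, the lookup contains linux_on_prem_scalable and windows_on_prem_scalable but not linux, which therefore behaves as a regular, non-Azure FSS Group.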

Master Controller Scalable FSS Implementation

The Master Controller requests an access token from Azure to access the Azure Management API, and uses that API to get a list of all Virtual Machines in the configured resource group.
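The Management API calls involved can be sketched with the public Azure REST endpoints for virtual machines: a GET on the virtualMachines collection of a resource group (paged through nextLink), and a POST to the start or deallocate operation of a single VM. The helper names and the api-version value below are assumptions for illustration; they are not taken from the Delft-FEWS implementation.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

MANAGEMENT_ENDPOINT = "https://management.azure.com"
# Assumption: any Microsoft.Compute API version supporting these operations will do.
COMPUTE_API_VERSION = "2023-03-01"


def _vms_collection(subscription_id: str, resource_group: str) -> str:
    return (
        f"{MANAGEMENT_ENDPOINT}/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        "/providers/Microsoft.Compute/virtualMachines"
    )


def list_vms_url(subscription_id: str, resource_group: str) -> str:
    """GET target listing all VMs in one resource group."""
    return f"{_vms_collection(subscription_id, resource_group)}?api-version={COMPUTE_API_VERSION}"


def vm_action_url(subscription_id: str, resource_group: str, vm_name: str, action: str) -> str:
    """POST target for the 'start' or 'deallocate' operation of a single VM."""
    if action not in ("start", "deallocate"):
        raise ValueError(f"unsupported action: {action}")
    base = _vms_collection(subscription_id, resource_group)
    return f"{base}/{quote(vm_name)}/{action}?api-version={COMPUTE_API_VERSION}"


def vm_names(page: dict) -> list[str]:
    """Extract the VM names from one page of the list response."""
    return [vm["name"] for vm in page.get("value", [])]


def list_vm_names(subscription_id: str, resource_group: str, access_token: str) -> list[str]:
    """Collect all VM names, following nextLink when the result is paged."""
    url = list_vms_url(subscription_id, resource_group)
    names: list[str] = []
    while url:
        request = Request(url, headers={"Authorization": f"Bearer {access_token}"})
        with urlopen(request) as response:
            page = json.load(response)
        names.extend(vm_names(page))
        url = page.get("nextLink")
    return names
```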

For each Azure FSS Group, the list retrieved from the Management API is matched against the FSSs registered in the FEWS database. Every VM whose name matches the name of a registered FSS is added to the list of VMs available for scaling. The MC also updates the heartbeat of the corresponding FSS entry, and the Admin Interface displays that FSS as hibernated.
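The name-matching rule can be sketched in a few lines of Python. RegisteredFss is a hypothetical stand-in for an FSS entry in the FEWS database, and the sketch assumes an exact, case-sensitive name comparison; the heartbeat update is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredFss:
    """Hypothetical stand-in for an FSS entry registered in the FEWS database."""
    name: str
    fss_group: str


def scalable_fss_names(registered: list[RegisteredFss], vm_names: list[str], fss_group: str) -> list[str]:
    """Names of the FSSs in fss_group that have a VM with the same name."""
    available_vms = set(vm_names)
    return [
        fss.name
        for fss in registered
        if fss.fss_group == fss_group and fss.name in available_vms
    ]
```

VMs in the resource group that do not correspond to a registered FSS are simply ignored, as are registered FSSs without a matching VM.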

When the Master Controller needs to schedule a task to a scalable FSS Group, it starts an available FSS VM and assigns the task to that FSS. When the task has completed, the MC deallocates the FSS VM again if no pending workflow is available for that FSS.
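A simplified version of this scheduling decision is sketched below. The names plan_actions, ScalableFss and VmState are hypothetical, and the sketch deliberately ignores settings such as minAwakeCount and gotoSleepMillis: idle running VMs receive pending tasks first, hibernated VMs are started for the remainder, and idle VMs without pending work are deallocated.

```python
from dataclasses import dataclass
from enum import Enum


class VmState(Enum):
    HIBERNATED = "hibernated"
    RUNNING = "running"


@dataclass
class ScalableFss:
    """Hypothetical view of one FSS VM in a scalable FSS Group."""
    name: str
    state: VmState = VmState.HIBERNATED
    busy: bool = False


def plan_actions(fss_list: list[ScalableFss], pending_tasks: int) -> list[tuple[str, str]]:
    """Return (action, fss_name) pairs where action is 'assign', 'start' or 'deallocate'."""
    actions: list[tuple[str, str]] = []
    remaining = pending_tasks
    idle_running = [f for f in fss_list if f.state is VmState.RUNNING and not f.busy]
    hibernated = [f for f in fss_list if f.state is VmState.HIBERNATED]
    # Reuse VMs that are already running; deallocate those without pending work.
    for fss in idle_running:
        if remaining > 0:
            actions.append(("assign", fss.name))
            remaining -= 1
        else:
            actions.append(("deallocate", fss.name))
    # Wake hibernated VMs for any tasks that are still unassigned.
    for fss in hibernated:
        if remaining == 0:
            break
        actions.append(("start", fss.name))
        actions.append(("assign", fss.name))
        remaining -= 1
    return actions
```

Busy VMs are left untouched; they only become candidates for deallocation once their task has completed.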

This process is illustrated in the following sequence diagram:

FSS Group maintenance mode

Because the Delft-FEWS Master Controller automatically shuts down a VM when it is not used, this may conflict with the patching schedules of the virtual machines, since a VM that is deallocated cannot be patched. For this reason, functionality was added to the Admin Interface to enable maintenance mode for an FSS Group. When maintenance mode is enabled, no tasks are scheduled to any Forecasting Shell Server in the selected FSS Group, and its Virtual Machines are not hibernated.
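The effect of maintenance mode on the two decisions the Master Controller makes can be sketched as follows. FssGroupPolicy is a hypothetical name, and the idle period before a VM is put to sleep is an assumption modelled loosely on the gotoSleepMillis setting (5 minutes in the example configuration).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FssGroupPolicy:
    """Hypothetical per-group policy combining maintenance mode and idle timeout."""
    group_id: str
    maintenance: bool = False
    # Assumption: idle time before an unused VM is hibernated (cf. gotoSleepMillis).
    idle_before_sleep: timedelta = timedelta(minutes=5)

    def may_schedule(self) -> bool:
        # In maintenance mode no tasks are scheduled to any FSS of the group.
        return not self.maintenance

    def should_hibernate(self, idle_since: datetime, now: datetime) -> bool:
        # In maintenance mode the VMs are kept running so that they can be patched.
        if self.maintenance:
            return False
        return now - idle_since >= self.idle_before_sleep
```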
