...

Component | Cloud readiness status | Room for improvements
Database | Both db docker containers and managed instances are already possible. Managed instances require minor adjustments of the db scripts. | Support one set of database scripts for all db flavors, managed and unmanaged.
Master Controller | Yes | Enable service replication.
Admin Interface | Yes |
Operator Client / SA | Use the Database Proxy (Azure: Azure Virtual Desktop or the Database Proxy) |
Config Manager | See Operator Client; in addition, the Admin Interface API can be used. |
Forecasting Shell Server | Yes | Facilitate auto scaling.
WebServices | Yes |
DatabaseProxy | Yes |
OpenArchive | Yes |

...


Autoscaling

Deltares will improve the Delft-FEWS components for use in containers and provide guidance on the installation. This guidance will focus on Kubernetes. We intend to implement autoscaling directly using the Kubernetes API because, in our view, Kubernetes is the most commonly accepted and best-supported cloud computing solution.
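The scaling logic itself is not specified here. As an illustration only, the sketch below reproduces the replica calculation of the Kubernetes HorizontalPodAutoscaler: desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within the default tolerance of 0.1, then clamped to the allowed range. The metric (queued tasks per Forecasting Shell) and the function name `desired_replicas` are assumptions, not existing Delft-FEWS features.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count following the Kubernetes HorizontalPodAutoscaler rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    left unchanged when the ratio is within the tolerance, then clamped
    to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical example: 2 Forecasting Shells with on average 6 queued tasks
# each, and a target of 2 queued tasks per shell.
print(desired_replicas(2, current_metric=6, target_metric=2))
```

The clamp to a maximum matters for Delft-FEWS in particular, because each extra Forecasting Shell also adds load on the central database.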

...

A dual Master Controller setup provides redundancy at the cost of additional compute resources. This concept is also recommended in the cloud when high availability is required.

Reference Architecture

The following diagram shows a typical architecture used in the Azure cloud when deploying a single-MC system on traditional virtual machines.


Installation of Operator Clients

Option (non-exhaustive list) | Remarks
Database HTTP proxy using SSL |
Azure: Azure Virtual Desktop | Only in Azure. See also: Azure Virtual Desktop for the Operator Client
SSH + MobaXterm |
Citrix | Can be integrated with most cloud providers
Apache Guacamole |

...

  1. For file-based imports, use Network File System (NFS) or Windows shares (on Azure, Azure Files is typically used).
  2. For server imports of public data, FTP/HTTP can be used (encryption would add unnecessary overhead); other services that require passwords should use a secure connection such as HTTPS.

...

An estimate of the cost of a basic/medium-sized Delft-FEWS system in the cloud is around 12k€ to 15k€, but the actual cost differs per Delft-FEWS system. Many cloud providers offer a "cloud calculator" to calculate the expected cost upfront, e.g. the Azure calculator.

...

Egress, i.e. data traffic transferred out of the cloud environment, is not free. Egress costs depend on the type of connection and on the amount of egress data. Ingress, i.e. data traffic into the cloud environment, is often free of cost. Egress can partly be controlled through the Delft-FEWS configuration. The location of the Operator Client (in the cloud or on-premise) also affects the egress. See How to calculate the right Azure outbound capacity and choose the best egress option
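Because egress is billed by volume, a first estimate is simple arithmetic. The sketch below assumes a flat price per GB and an optional free monthly allowance; real price sheets can be tiered, and every number in the example is a hypothetical placeholder, so take actual prices from the cloud calculator and volumes from your own system. The function name is illustrative.

```python
def monthly_egress_cost(clients: int,
                        gb_per_client_per_day: float,
                        price_per_gb: float,
                        free_gb_per_month: float = 0.0,
                        days: int = 30) -> float:
    """Rough monthly egress cost, assuming a flat price per GB.

    gb_per_client_per_day depends strongly on where the Operator Client
    runs (on-premise vs. in the cloud) and on the Delft-FEWS configuration.
    """
    total_gb = clients * gb_per_client_per_day * days
    billable_gb = max(0.0, total_gb - free_gb_per_month)
    return billable_gb * price_per_gb


# Hypothetical numbers: 10 on-premise clients, 2 GB/day each,
# 0.08 per GB, 100 GB free per month: (600 - 100) GB * 0.08 per GB.
print(monthly_egress_cost(10, 2.0, 0.08, free_gb_per_month=100))
```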

Type and number of machines

...

Data can be stored in different cloud solutions, each with its own price and functionality. Depending on your requirements, you can add managed disks to your machine or use a dedicated storage solution such as Azure Files or Azure Blob Storage.

Managed versus unmanaged

...

  • Scale horizontally
    • Scale out: add more virtual machines or containers of the same component
    • Scale in: remove virtual machines or containers of the same component
  • Scale vertically
    • Scale up: increase the resources of a virtual machine or container like CPU or RAM
    • Scale down: decrease the resources of a virtual machine or container like CPU or RAM

Azure components can be scaled manually. Some possibilities are:

  1. Create a new virtual machine by restoring an existing backup.
  2. Use an Azure Automation Runbook in combination with Desired State Configuration (DSC) to install the Delft-FEWS software.
  3. Use a DevOps solution for creating a new virtual machine by deploying ARM templates of the components.
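Option 3 can be scripted. A minimal sketch, assuming the Azure CLI (`az`) is installed and logged in: it builds an `az deployment group create` call for an ARM template and runs it only when `dry_run` is off. The resource group, template file name (`fss-vm.json`) and template parameters (`vmName`, `fssGroup`) are hypothetical; use the names defined in your own ARM templates.

```python
import shlex
import subprocess
from typing import Dict, List, Optional


def az_deploy_command(resource_group: str,
                      template_file: str,
                      parameters: Dict[str, str],
                      deployment_name: Optional[str] = None) -> List[str]:
    """Build an `az deployment group create` command for an ARM template."""
    cmd = ["az", "deployment", "group", "create",
           "--resource-group", resource_group,
           "--template-file", template_file]
    if deployment_name:
        cmd += ["--name", deployment_name]
    if parameters:
        # Inline template parameters are passed as KEY=VALUE pairs.
        cmd += ["--parameters"] + [f"{k}={v}" for k, v in parameters.items()]
    return cmd


def deploy(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        # Requires the Azure CLI and an authenticated session (az login).
        subprocess.run(cmd, check=True)


# Hypothetical names: adjust to your own resource group and ARM template.
cmd = az_deploy_command("rg-fews", "fss-vm.json",
                        {"vmName": "fss-03", "fssGroup": "default"},
                        deployment_name="add-fss-03")
deploy(cmd)  # dry run: prints the command only
```

The same command can be placed in a DevOps pipeline step, so that adding a Forecasting Shell Server becomes a reviewed, repeatable deployment rather than a manual installation.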

The scaling capabilities of each Delft-FEWS component are discussed below. Delft-FEWS does not auto-scale components itself; if required, autoscaling must be managed by the Azure environment. Azure supports autoscaling through features such as Virtual Machine Scale Sets (which require a custom VM image), App Service and Kubernetes.

Availability Set

In Azure, virtual machines can be deployed as part of an Availability Set. VMs in the same Availability Set are spread over separate update and fault domains, so they are not affected at the same time by Azure platform maintenance or hardware failures.
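The sketch below models the placement Azure documents for availability sets: with the default of five update domains, VMs are assigned round-robin, so the sixth VM shares an update domain with the first. It is a model for reasoning about which Delft-FEWS VMs could be restarted together during planned maintenance; the platform performs the real assignment, fault domains are not modelled, and the VM names are hypothetical.

```python
from typing import Dict, List


def update_domain_assignment(vm_names: List[str],
                             update_domains: int = 5) -> Dict[str, int]:
    """Model of round-robin placement of VMs over update domains.

    VMs are assumed to be listed in creation order. Five update domains
    is Azure's default for an availability set.
    """
    if not 1 <= update_domains <= 20:
        raise ValueError("an availability set has 1 to 20 update domains")
    return {name: i % update_domains for i, name in enumerate(vm_names)}


def restarted_together(assignment: Dict[str, int], a: str, b: str) -> bool:
    """True if maintenance of a single update domain affects both VMs."""
    return assignment[a] == assignment[b]


vms = ["mc-1", "mc-2", "fss-1", "fss-2", "fss-3", "fss-4"]  # hypothetical
ud = update_domain_assignment(vms)
print(ud)
print(restarted_together(ud, "mc-1", "mc-2"))   # different update domains
print(restarted_together(ud, "mc-1", "fss-4"))  # sixth VM wraps to domain 0
```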

Master Controller

Only one Master Controller can be running at a time, which makes it a single point of failure. This can only be avoided by deploying a dual MC system. Running multiple Master Controllers on one database will probably become possible starting with Delft-FEWS 2022.02. Because only one Master Controller can be active, the Master Controller cannot be scaled horizontally. As it is a single point of failure, it is important to monitor the health of the Master Controller. The Master Controller can be scaled vertically by redeploying it with more resources.
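A minimal liveness check, assuming the Master Controller host accepts TCP connections on a known port: the sketch only verifies that the port answers, which is far less than the component status shown in the Admin Interface. The function names, host and port are hypothetical placeholders.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out or name not resolvable
        return False


def master_controller_status(host: str, port: int) -> str:
    """Hypothetical probe result, e.g. to feed into an alerting system."""
    if port_is_open(host, port):
        return "OK"
    return f"ALERT: Master Controller {host}:{port} not reachable"
```

Such a probe is best run on a schedule from a machine other than the Master Controller VM, so that the check still reports when that VM itself is down.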

Forecasting Shell Servers

Forecasting Shell Servers can be scaled both horizontally and vertically. Deploying multiple Forecasting Shell Servers will avoid a single point of failure.

  • Scale horizontally: deploying a new Forecasting Shell Server virtual machine for an FSS Group that needs more forecasting shells.
  • Scale vertically: update an existing Forecasting Shell Server Virtual Machine with more resources. To allow a Forecasting Shell to use more memory, the Delft-FEWS FSS client config file can be customized. It is also possible to run multiple Forecasting Shells on the same Virtual machine.

Admin Interface

The Admin Interface can be scaled both horizontally and vertically. Deploying multiple Admin Interfaces will avoid a single point of failure.

  • Scale horizontally: deploying a new Admin Interface virtual machine. A load balancer or application gateway is used in front of the Admin Interface, and new VMs must be registered with the load balancer.
  • Scale vertically: update an existing Admin Interface Virtual Machine with more resources.

Web Services

The Web Services can be scaled both horizontally and vertically. Deploying multiple Web Services will avoid a single point of failure.

  • Scale horizontally: deploying a new Web Services ARM template. A load balancer or application gateway is used in front of the Web Services, and new VMs must be registered with the load balancer.
  • Scale vertically: redeploy the Web Services virtual machine with more resources.

Archive Server

The Archive Server can be scaled both horizontally and vertically, but more manual actions are required. Deploying multiple Archive Servers will avoid a single point of failure.

  • Scale horizontally: deploying a new Archive Server virtual machine. A load balancer or application gateway is used in front of the Archive Servers, and new VMs must be registered with the load balancer.
    The scheduling of the harvester must be duplicated to the new VM. Because the harvester builds the index, it takes some time before a new Archive Server is in sync. The load balancer should use sticky sessions so that the same user consistently gets the same results.
  • Scale vertically: redeploy the Archive Server virtual machine with more resources.
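To illustrate why sticky sessions matter for the Archive Server, the sketch below routes each client to a backend by hashing its IP address, similar in spirit to the client-IP session persistence of Azure Load Balancer (Application Gateway offers cookie-based affinity instead). In practice the load balancer does this itself; the function and server names are hypothetical.

```python
import hashlib
from typing import List


def pick_backend(client_ip: str, backends: List[str]) -> str:
    """Route a client to a backend by hashing its IP (client-IP affinity)."""
    if not backends:
        raise ValueError("no backends registered")
    # hashlib gives the same digest in every process, unlike Python's hash().
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


archive_servers = ["archive-1", "archive-2"]  # hypothetical VM names
first = pick_backend("10.0.1.25", archive_servers)
# The same client always lands on the same Archive Server.
print(all(pick_backend("10.0.1.25", archive_servers) == first for _ in range(5)))
```

With plain modulo hashing, registering an extra Archive Server moves some clients to a different backend. Since a new server's harvester index needs time to catch up, it may be sensible to add it to the load balancer pool only after it is in sync.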

...

High available scenarios

At a high level, the scenarios for running Delft-FEWS in a highly available manner when the primary Azure region fails are listed below, ordered from most to least available.

...

Scenario

...

Description

...

Disaster recovery if primary region fails

...

Manual interventions

...

Recovery Time/Data Loss for Delft-FEWS components

...

Dual MC in 2 Azure regions

...

Two separate databases in two regions, synchronized by Delft-FEWS. Use geo-redundant storage for file shares.

...

Automatically by Delft-FEWS.

...

Operator clients need to reconnect to secondary MC. This has been pre-configured.

...

No Recovery Time. (Hot Stand By)

Data loss is minimal and can be restored by rerunning workflows.

...

Single MC with Database in 2 Azure regions

...

Azure SQL Managed Instance in an auto-failover group, with Azure Site Recovery for the VMs. Use geo-redundant storage for file shares.

...

Use Azure Site Recovery to restore the VMs in the secondary region.

Synchronized database will be made primary by Azure.

 

...

Azure Site Recovery has to be enabled.

...

Recovery Time depends on the time it takes for Microsoft to enable ASR.

Data loss is minimal and can be restored by rerunning workflows.

...

Single MC with locally redundant Database

...

Single database with Delft-FEWS components

...

Database backups have to be restored to a new database in a secondary region. VM backups have to be restored to a secondary region.

...

Database configurations on the restored VMs have to be adjusted to the new database.

The Delft-FEWS configuration has to be adjusted to use the new database.

...

Recovery time is a few days.

Data loss depends on the backup schedules.
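For this scenario the worst-case data loss follows from the backup schedule. Assuming backups complete at fixed times each day, and ignoring backup duration and any log backups in between, the worst case is the longest gap between two consecutive backups, including the gap across midnight. The helper below is a hypothetical sketch of that arithmetic.

```python
from typing import List


def worst_case_data_loss_hours(backup_hours: List[float]) -> float:
    """Longest interval (hours) between consecutive daily backups.

    backup_hours are the times of day (0 <= h < 24) at which a backup
    completes. A failure just before a backup loses everything written
    since the previous one, so the longest gap is the worst case.
    """
    if not backup_hours:
        raise ValueError("at least one backup per day is required")
    if any(not 0 <= h < 24 for h in backup_hours):
        raise ValueError("backup times must be within [0, 24)")
    times = sorted(backup_hours)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(times[0] + 24 - times[-1])  # gap wrapping around midnight
    return max(gaps)


print(worst_case_data_loss_hours([2]))         # one nightly backup: 24
print(worst_case_data_loss_hours([2, 8, 20]))  # 08:00 to 20:00: 12
```

Part of this loss can often be recovered by rerunning workflows, as noted for the other scenarios.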

Disaster Recovery

This section describes possible Disaster Recovery solutions for the different parts of a virtual-machine-based Azure system:

  • Azure Managed Database
  • Virtual machines
  • Azure Storage Accounts

Alternatively, Azure Site Recovery, the disaster recovery solution offered by Azure, can be used. Depending on the chosen database solution, post-recovery configuration may still be required.

Azure Managed Database Disaster Recovery

...


Virtual machines disaster recovery

Azure Backup can be used to restore virtual machines that are backed up to a vault with geo-redundant storage (GRS).

Single VM recovery

If a single VM needs to be recovered, the VM is restored from the latest functional backup.

...

  • Master Controller: no actions
  • Admin Interface: no actions
  • Forecasting Shell Server: no actions
  • Web Services: no actions
  • Archive Server:
    • Archive configuration files that were uploaded after the backup need to be uploaded again.
    • The archive harvester should be run to make sure the indexes are up to date. This is done in the Archive Server web application.

VM recovery with new database

If, after a disaster, the database and VMs have to be moved to a new data center or region, the VMs can be recovered from backup. After the virtual machines have been restored to the new location, the database-specific configuration has to be adjusted on them. This is a manual process; the specific requirements can be found in the Delft-FEWS System Administration Guide. At a high level, the following changes are needed for the Delft-FEWS components:

  • Master Controller: update the ENV variables with the changed database connection.
  • Admin Interface: update the ENV variables with the changed database connection.
  • Forecasting Shell Server:
    • Update the FSS ENV variables with the changed database connection, username and password.
    • Update the global.properties of the FSS in the Delft-FEWS configuration and upload them with the Config Manager. Typical changes are: URL to the archive server, location of the Azure File Shares for imports and the archive.
  • Web Services: update the ENV variables with the changed database connection.
  • Archive Server: update the location of the Azure File Share in the archive configuration file.
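The actual variable names and file locations differ per installation and are described in the Delft-FEWS System Administration Guide. Where these settings are kept in a KEY=VALUE text file (global.properties uses this layout), the edit can be scripted. The sketch below is a hypothetical helper: the keys `DB_URL` and `DB_USER` and the JDBC URLs are placeholders, not Delft-FEWS names, only the `=` separator is handled, and settings stored as Windows system environment variables must be changed through the operating system instead.

```python
import re
from pathlib import Path
from typing import Dict

# Matches "KEY=..." at the start of a line; comment lines (#...) never match.
KEY_RE = re.compile(r"\s*([A-Za-z0-9_.\-]+)\s*=")


def update_key_values(text: str, updates: Dict[str, str]) -> str:
    """Set KEY=VALUE pairs in a config text, keeping all other lines.

    Existing keys are rewritten in place; missing keys are appended.
    """
    out = []
    seen = set()
    for line in text.splitlines():
        m = KEY_RE.match(line)
        if m and m.group(1) in updates:
            key = m.group(1)
            out.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in updates.items() if k not in seen)
    return "\n".join(out) + "\n"


def update_file(path: Path, updates: Dict[str, str]) -> None:
    """Rewrite a config file in place, keeping a .bak copy of the original."""
    original = path.read_text(encoding="utf-8")
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(update_key_values(original, updates), encoding="utf-8")


# Hypothetical keys and JDBC URLs, for illustration only.
old = ("# database connection\n"
       "DB_URL=jdbc:sqlserver://old-db:1433;databaseName=fews\n"
       "DB_USER=fews\n")
new = update_key_values(old, {"DB_URL": "jdbc:sqlserver://new-db:1433;databaseName=fews"})
print(new)
```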

Operator Client Recovery

Operator Client with synchronization profile

It is possible to run a synchronizing Operator Client on-premise. If the Azure database is no longer available, this Operator Client can still be used with the synchronized data in its local datastore. If a new database has been installed after a disaster recovery, the Operator Client has to be reconfigured to access the new database.

Direct Database Access Operator Client

In case a new database has been installed after a disaster recovery, the Operator Client has to be reconfigured to access the new database.

Azure File Shares Archive recovery

If a disaster or human error affects the archived data on an Azure file share, Azure Backup can be used to restore the archived files. A geo-redundant backup is recommended for archived data.

Monitoring and Alerting

Event Logs

Delft-FEWS logs all events from workflows in the central database.

Operator Client

The Operator Client provides some information on the status of the system components, file imports and workflows.

Admin Interface

The web-based Delft-FEWS Admin Interface provides a dashboard for Delft-FEWS administrators to view the status of the Delft-FEWS components and workflows. Errors and events are logged in the central database, and log extracts can be downloaded via the browser to provide to Deltares in the event of issues that can't be resolved internally. Audit logs of user actions are also stored in the central database. The Admin Interface also provides a series of APIs to access the events, the status information and the audit logs.

Log Analytics

The Log Analytics service of Azure (part of Azure Monitor) can collect log events from the different Delft-FEWS components. The Master Controller, Forecasting Shell Server and Admin Interface support sending error log events to the Windows Event Log or Linux syslog, and Log Analytics can be connected to both. This requires installing the MicrosoftMonitoringAgent extension on the virtual machine. To connect other Delft-FEWS components to Log Analytics, a custom connector can be defined. All Delft-FEWS components also write log files to the local file system.

Malware protection

To protect the VMs from malware, the IaaSAntimalware extension can be installed on the virtual machine. Enabling malware protection may have a negative impact on the performance of the Delft-FEWS components, so it may be necessary to add some Delft-FEWS directories to the exclusion list.

Azure Infrastructure Monitoring

Infrastructure monitoring of the Delft-FEWS Virtual Machines is done with Azure Monitoring Services. This requires installing the IaaSDiagnostics extension on the virtual machine.

Azure Service Bus

The Delft-FEWS System Alerter can be configured to send events to the Azure Service Bus. This allows triggering external applications (for example an Azure Function) based on events in the Delft-FEWS system. For more information, see: Azure Service Bus Alerts

Security

For Delft-FEWS in the cloud, the same security principles apply as on-premise: see Security - Shared responsibility model for Delft-FEWS system installations. Securing your cloud assets requires continuous investment in keeping your containers safe. A well-known example of a misconfigured Kubernetes cluster is Tesla's unsecured Kubernetes admin console. Through it, malicious actors obtained credentials for Tesla's wider AWS environment, which they used for cryptomining. Tesla pointed out that it was "only" a test instance, but the incident shows why it is important to secure both production and pre-production resources as far as possible.

...

DevOps

DevOps techniques can be used to automatically test and deploy Delft-FEWS configuration changes in the cloud. See the following example of how to use Azure DevOps in combination with the Workflow Test Runner and the Admin Interface API to automatically deploy a tested configuration:

Workflow Test Runner in Azure Devops

Examples

Deltares has successfully completed Delft-FEWS projects in the cloud using virtual machines with standard installation scripts, virtual machines with Azure ARM templates, and AWS Elastic Beanstalk. For practical reasons, we will keep our requirements and installation instructions as cloud-neutral as possible.

...

Deltares has done several migrations and implementations of Delft-FEWS in the cloud. Microsoft Azure is the most popular provider among the community, but Delft-FEWS will run in any cloud environment.
Based on our experience with successful migrations and implementations such as MDBA (link), we drafted a "how to get started" bullet list.

...