Delft-FEWS and the cloud

Delft-FEWS Software: A cloud agnostic approach

Delft-FEWS components are deployed on many different architectures and hardware platforms. A considerable number of Delft-FEWS users run an IT infrastructure based on virtual machines (VMs). The usual goal of virtualization is to centralize administrative tasks while improving scalability and overall hardware-resource utilization. For organizations that are (re-)defining their IT infrastructure, containerization is commonly regarded as the next logical step after virtualization. It will remain possible to install Delft-FEWS on on-premise hardware or in virtual machines. A Delft-FEWS system is currently installed on regular hardware / VMs by setting up a central database, installing the RPMs / MSIs or unzipping the binaries, setting OS environment variables and starting a launcher service. Installation in Kubernetes is not going to be much different. In Kubernetes, these actions are usually controlled by data-driven YAML / JSON configuration files.
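As an illustration of the data-driven approach, the sketch below builds a Kubernetes Deployment manifest as a plain Python dictionary and serializes it to JSON. The component name, image name, environment variable name and database URL are hypothetical placeholders, not actual Delft-FEWS settings.

```python
import json


def build_deployment(name, image, env, replicas=1):
    """Return a Kubernetes Deployment manifest as a plain dictionary."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        # Environment variables replace the OS-level variables set on a VM.
        "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


# All values below are hypothetical and only serve as an example.
manifest = build_deployment(
    name="fews-component",
    image="registry.example.org/fews-component:latest",
    env={"EXAMPLE_DATABASE_URL": "jdbc:postgresql://db.example.org:5432/fews"},
)
manifest_json = json.dumps(manifest, indent=2)
```

A file with this JSON content could then be applied to a cluster with `kubectl apply -f`, which accepts both YAML and JSON.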

Component	Cloud readiness status	Room for improvement
Database	Both database Docker containers and managed instances are already possible. Managed instances require minor adjustments of the database scripts.	Support one set of database scripts for all database flavors, managed and unmanaged.
Master Controller	Yes	Enable service replication
Admin Interface	Yes
Operator Client / SA	Use Azure Virtual Desktop or Database proxy
Config Manager	Use Azure Virtual Desktop or Database proxy or API
Forecasting Shell Server	Yes	Facilitate auto scaling.
WebServices	Yes
DatabaseProxy	Yes
OpenArchive	Yes

Delft-FEWS in the cloud

Deltares will improve the Delft-FEWS components for use in containers and provide guidance on the installation. This guidance will focus on Kubernetes because, in our view, Kubernetes is the most commonly accepted and best supported cloud computing solution.

Delft-FEWS Hardware and software requirements

The Delft-FEWS hardware and software requirements for on-premise hardware / VMs also apply to deployment in the cloud. We recommend that all containers be Linux containers unless Windows containers are specifically required. Windows containers require hardware virtualization.

Single MC / Dual MC

A dual Master Controller setup provides redundancy at the cost of additional compute resources. This concept is also recommended in the cloud when high availability is required.

Installation of Operator Clients

Non-exhaustive list of options	Remarks
Database HTTP proxy using SSL
Azure Virtual Desktop	Only in Azure. See also: Azure Virtual Desktop for the Operator Client
SSH + MobaXterm
Citrix	Can be integrated with most cloud providers
Apache Guacamole

Use of managed services

There is no actual requirement for the Delft-FEWS components to use managed services. Managed services can be used as long as performance is not affected. As an example, customers using SQL Server database replication between different geographical locations reported database timeouts. In response, we improved our software, removed database indexes and added a reconnection strategy for these problems. Since we expect Delft-FEWS users to run many more Forecasting Shell Servers simultaneously in the future, we foresee more challenges in this area. We strongly advise against using the automated placement of database indexes.
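To make the idea of a reconnection strategy concrete, the following is a minimal sketch of a retry loop with exponential backoff. It is not the Delft-FEWS implementation; the function name, the retry count and the delays are assumptions chosen for illustration.

```python
import time


def call_with_reconnect(operation, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run operation(); on a connection error or timeout, wait and try again.

    The delay doubles after every failed attempt (exponential backoff).
    The sleep function is injectable so the logic can be tested without waiting.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == retries:
                raise  # out of attempts: let the caller handle the error
            sleep(base_delay * 2 ** attempt)
```

Here `operation` would be any callable that opens a connection or runs a query. Errors other than connection errors and timeouts are deliberately not retried.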

How to deal with (incoming, outgoing) data feeds

For file-based imports, use Network File System (NFS) or Windows shares.
For server imports of public data, FTP / HTTP can be used (encryption would add unnecessary overhead). Services that require passwords should use a secure connection such as HTTPS.
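The rule for server imports can be expressed as a small check. The sketch below is an illustration only: the function name is hypothetical, and treating FTPS and SFTP as secure alternatives next to HTTPS is an assumption that goes beyond the text above.

```python
from urllib.parse import urlsplit

# Assumption: FTPS and SFTP are accepted as secure transports next to HTTPS.
ENCRYPTED_SCHEMES = {"https", "ftps", "sftp"}
PLAIN_SCHEMES = {"http", "ftp"}


def feed_url_is_acceptable(url, needs_credentials):
    """Allow plain transport only for feeds that do not involve credentials."""
    parts = urlsplit(url)
    # Credentials embedded in the URL also count as "needs credentials".
    has_credentials = needs_credentials or parts.username is not None
    if parts.scheme in ENCRYPTED_SCHEMES:
        return True
    if parts.scheme in PLAIN_SCHEMES:
        return not has_credentials
    return False  # unknown scheme: reject rather than guess
```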

Kubernetes

Kubernetes orchestrates containers, such as those built with Docker. A container is a "lightweight" abstraction layer on top of the host operating system. Multiple containers share the machine’s operating system kernel and avoid the overhead of bundling a full operating system with each application. In comparison with VMs, containers bring reduced start-up time, more compute capacity, more flexibility, fault isolation, ease of management, simplified security and reduced costs. The operational benefits for Delft-FEWS systems are also in line with the Roadmap plans for automation of installations with less needless customization, better auto-scaling and more flexible testing. We prefer using Linux containers as much as possible. Whether Linux containers can be used may depend on the requirements of the forecast model. Any Windows-based forecast models can be run separately on Windows hardware, in Windows VMs or in a Windows Docker container.

Important cost variables

An estimate of the cost of a basic/medium-sized Delft-FEWS system in the cloud would be around 12k€ to 15k€. Many cloud providers offer a "cloud calculator" to estimate the expected cost upfront, e.g. the Azure calculator. The estimate differs per Delft-FEWS system.

Data size and egress

Egress, i.e. data traffic transferred out of the cloud environment, is not free. The egress costs depend on the type of connection and on the amount of egress data. Ingress, i.e. data traffic into the cloud environment, is often free of cost. Egress can partly be controlled through the Delft-FEWS configuration. The location of the Operator Client (in the cloud or on-premise) also affects the egress. See How to calculate the right Azure outbound capacity and choose the best egress option.
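A rough egress estimate can be sketched as follows. The free allowance and the price per GB are hypothetical numbers; actual tariffs differ per provider, region and connection type and should be taken from the provider's calculator.

```python
def monthly_egress_cost(egress_gb, free_gb, price_per_gb):
    """Cost of outbound traffic: only the volume above the free allowance is billed."""
    billable_gb = max(0.0, egress_gb - free_gb)
    return billable_gb * price_per_gb


# Hypothetical example: 500 GB egress, 100 GB free, 0.08 per GB.
example_cost = monthly_egress_cost(egress_gb=500, free_gb=100, price_per_gb=0.08)
```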

Type and number of machines

The price of, for example, one 32GB virtual machine (VM) is roughly the same as the price of two 16GB VMs. However, depending on your requirements, each separate VM requires its own additional services, such as backup and security.
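The effect of per-VM services on the total cost can be shown with a small calculation. All prices in this sketch are hypothetical and only illustrate the reasoning.

```python
def monthly_vm_cost(vm_count, price_per_vm, per_vm_services):
    """Total cost of identical VMs, including per-VM services such as backup."""
    return vm_count * (price_per_vm + per_vm_services)


# Hypothetical prices: the raw VM price is the same in both cases (200),
# but the per-VM services are paid once for one VM and twice for two VMs.
one_large = monthly_vm_cost(1, price_per_vm=200, per_vm_services=30)
two_small = monthly_vm_cost(2, price_per_vm=100, per_vm_services=30)
```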

Data storage

Data can be stored in different cloud solutions, each with its own price and functionality. Depending on your requirements, you can add managed disks to your machine or use a dedicated storage solution like Azure Files or Blob storage.

Managed versus unmanaged

Cloud providers offer managed services, for example a managed database. A managed solution costs more than an unmanaged one. However, a managed solution requires far fewer hours from your own IT staff to manage the system.

High Availability and scaling

Introduction

When talking about high availability and scaling, the following concepts are important:

Scale horizontally
- Scale out: add more virtual machines or containers of the same component
- Scale in: remove virtual machines or containers of the same component
Scale vertically
- Scale up: increase the resources of a virtual machine or container like CPU or RAM
- Scale down: decrease the resources of a virtual machine or container like CPU or RAM

Scaling up of Azure components can be done manually. Some possibilities are:

Create a new Virtual machine by restoring an existing backup.
Use an Azure Automation Runbook in combination with DSC to install the Delft-FEWS software.
Use a DevOps solution for creating a new virtual machine by deploying ARM templates of the components.

The scaling capabilities of each Delft-FEWS component are discussed below. Delft-FEWS does not autoscale components itself. If required, autoscaling must be managed by the Azure environment. Azure supports autoscaling through features like Virtual Machine Scale Sets (which require a custom VM image), App Services and Kubernetes.
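The kind of rule an external autoscaler applies can be sketched as a simple threshold calculation. This is an illustration only: the backlog metric, the capacity per instance and the bounds are assumptions, and in practice the rule would be configured in the cloud platform rather than coded by hand.

```python
def scaling_action(current, queued_tasks, tasks_per_instance, minimum=1, maximum=10):
    """Decide whether to scale out, scale in or keep the current instance count."""
    # Ceiling division: enough instances to cover the whole backlog.
    needed = -(-queued_tasks // tasks_per_instance)
    # Never go below the minimum or above the maximum number of instances.
    target = max(minimum, min(maximum, needed))
    if target > current:
        return ("scale out", target)
    if target < current:
        return ("scale in", target)
    return ("no change", target)
```

For example, with two running instances, nine queued tasks and a capacity of three tasks per instance, the rule asks for a third instance.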

Availability Set

In Azure, all virtual machines can be deployed as part of an Availability Set. This ensures that VMs in the same Availability Set are not affected at the same time by Azure maintenance, Windows updates, etc.

Availability and scaling the Master Controller

Only one Master Controller can run at a time, which makes it a single point of failure. This can only be avoided by deploying a dual MC system.
As a result, the Master Controller cannot be scaled horizontally. Because the Master Controller is a single point of failure, it is important to monitor its health.
Scaling the Master Controller vertically can be done by redeploying the Master Controller.
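A very basic form of health monitoring is checking whether a service still accepts TCP connections. The sketch below shows such a probe. The host and port to probe are not specified here and would have to come from your own installation, and a real monitor would likely use a more specific check than TCP reachability alone.

```python
import socket


def tcp_port_is_open(host, port, timeout=3.0):
    """Return True when a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A scheduler or monitoring tool could call this function periodically and raise an alert after a number of consecutive failures.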

Availability and scaling the Forecasting Shell Servers

Forecasting Shell Servers can be scaled both horizontally and vertically.
Scale horizontally: deploy a new Forecasting Shell Server virtual machine for an FSS Group that needs more forecasting shells.
Scale vertically: update an existing Forecasting Shell Server virtual machine with more resources. To allow a Forecasting Shell to use more memory, the Delft-FEWS FSS client config file can be customized. It is also possible to run multiple Forecasting Shells on the same virtual machine.
Deploying multiple Forecasting Shell Servers will avoid a single point of failure.

Availability and scaling the Admin Interface

The Admin Interface can be scaled both horizontally and vertically.
Scale horizontally: deploy a new Admin Interface virtual machine. A load balancer or application gateway is used in front of the Admin Interface, and new VMs must be registered with the load balancer.
Scale vertically: update an existing Admin Interface virtual machine with more resources.
Deploying multiple Admin Interfaces will avoid a single point of failure.

Availability and scaling the Web Services

The Web Services can be scaled both horizontally and vertically.
Scale horizontally: deploy a new Web Services ARM template. A load balancer or application gateway is used in front of the Web Services, and new VMs must be registered with the load balancer.
Scale vertically: redeploy the Web Services virtual machine with more resources.
Deploying multiple Web Services will avoid a single point of failure.

Availability and scaling the Archive Server

The Archive Server can be scaled both horizontally and vertically, but more manual actions are required.
Scale horizontally: deploy a new Archive Server virtual machine. A load balancer or application gateway is used in front of the Archive Server, and new VMs must be registered with the load balancer.
The scheduling of the harvester must be duplicated to the new VM. Since the harvester builds the index, it takes some time before a new Archive Server is in sync. The load balancer should use sticky sessions so that the same user consistently gets the same results.
Scale vertically: redeploy the Archive Server virtual machine with more resources.
Deploying multiple Archive Servers will avoid a single point of failure.
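The idea behind the sticky sessions mentioned above can be sketched as a deterministic mapping from a user to a backend. This is an illustration only: real load balancers typically implement session affinity with cookies or the client IP address, and the user and server names below are hypothetical.

```python
import hashlib


def pick_backend(user_id, backends):
    """Map a user to one backend deterministically.

    Repeat requests from the same user land on the same server as long as
    the list of backends does not change.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical server names.
archive_servers = ["archive-1", "archive-2"]
chosen = pick_backend("operator-42", archive_servers)
```

Note that with this simple modulo scheme, adding or removing a backend remaps some users to a different server.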

Disaster Recovery

This section will describe possible Disaster Recovery solutions for the different parts of a Virtual Machine based Azure system:

• Azure Managed Database
• Virtual machines
• Azure Storage Accounts

Azure Managed Database Disaster Recovery

Azure Managed databases are backed up. Make sure database backups are stored using geo redundancy. In case of a disaster, the database needs to be restored to another data center in the same region or to another region in case the whole region is affected.
Since restoring a backup to another environment changes the database hostname, this impacts all Delft-FEWS components that refer to the database. This will impact both the server components and the Operator Client.

Virtual machines disaster recovery

Azure Backup can be used to restore virtual machines that are backed up to a storage account that supports geo-redundant storage (GRS).

Single VM recovery

If a single VM needs to be recovered, the VM image is restored from the latest functional backup.
For each of the Delft-FEWS components, the following post recovery actions are required:

Master Controller: no actions
Admin Interface: no actions
Forecasting Shell Server: no actions
Web Services: no actions
Archive Server:
- Archive configuration files that were uploaded after the backup need to be uploaded again.
- The archive harvester should be run to make sure the indexes are up to date. This is done in the Archive Server web application.

VM recovery with new database

In case of a disaster where the database and VMs have to be moved to a new data center or region, the VMs can be recovered in 2 ways:

Restore Virtual Machine from backup

After the Virtual Machines have been restored to a new location, database specific configurations will have to be adjusted on the virtual machines. This is a manual process. The specific requirements can be found in the Delft-FEWS System Administration Guide.
On a high level, the following changes need to be performed for the Delft-FEWS components:

Master Controller: update the ENV variables with the changed database connection.
Admin Interface: update the ENV variables with the changed database connection.
Forecasting Shell Server:

Update the FSS ENV variables with the changed database connection, username and password.
Update the global.properties of the FSS in the Delft-FEWS configuration and upload them with the Config Manager. Typical changes are: URL to the archive server, location of the Azure File Shares for imports and the archive.

Web Services: update the ENV variables with the changed database connection.
Archive Server: update the location of the Azure File Share in the archive configuration file.
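Updating the database connection in the environment variables of several components is a repetitive step that can be scripted. The sketch below only illustrates the idea: the variable names, host names and connection string are hypothetical, and the actual settings are described in the Delft-FEWS System Administration Guide.

```python
def repoint_database(env, old_host, new_host):
    """Return an updated copy of env plus the names of the variables that changed."""
    updated = {key: value.replace(old_host, new_host) for key, value in env.items()}
    changed = sorted(key for key in env if env[key] != updated[key])
    return updated, changed


# Hypothetical environment of one component before the restore.
component_env = {
    "EXAMPLE_DATABASE_URL": "jdbc:sqlserver://old-db.example.org:1433;databaseName=fews",
    "EXAMPLE_LOG_LEVEL": "INFO",
}
new_env, changed_keys = repoint_database(
    component_env, "old-db.example.org", "new-db.example.org"
)
```

Returning the list of changed variables makes it easy to verify that every component that refers to the database was actually updated.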

Performance

Security

For Delft-FEWS in the cloud, the same security principles apply as on premise; see Security - Shared responsibility model for Delft-FEWS system installations. Securing your cloud assets requires continuous investment in keeping your containers safe. An infamous example of a misconfigured Kubernetes setup is Tesla's unsecured admin console for a Kubernetes cluster. Through it, malicious actors obtained credentials for Tesla's wider AWS environment and used that environment for cryptomining. Tesla highlighted that it was a test instance "only", but this incident shows why it is really important to secure both production and pre-production resources as far as possible.

Do not use insecure keys.
Do not inappropriately open up the network configuration of test instances because they are "just" test instances.

The bottom line is to check that any Kubernetes instances you manage are appropriately secured. Using cloud-managed Kubernetes platforms (AKS, EKS, GKE) will generally make this easier and give you more confidence than running your own cluster, as the cloud provider takes care of many aspects of the configuration. Regardless, be aware that running a Kubernetes cluster well and securely is a big undertaking that requires serious, proactive and ongoing effort to keep things secure and maintained.

Best practices

Examples

Deltares has successfully completed Delft-FEWS projects in the cloud using virtual machines with standard installation scripts, virtual machines with Azure ARM templates, and AWS Elastic Beanstalk. For practical reasons, we will keep our requirements / installation instructions as cloud neutral as possible.

Example ARM templates have been provided by MDBA and can be found here: MDBA ARM templates download

Getting started with the Cloud

Deltares has done several migrations and implementations of Delft-FEWS in the cloud. Microsoft Azure is the most popular provider among the community, but Delft-FEWS will run in any cloud environment.
Based on our experience with successful migrations and implementations like MDBA (link), we drafted a "how to get started" bullet list.

Make sure your IT solution provider is involved from the beginning of the project.
Train / recruit staff so that your organisation has a good understanding and knowledge of the specific cloud you want to host your system in. Mapping the functional requirements to the possible cloud solutions can be done much faster with people skilled in the cloud domain. A good example is MDBA: they have a high level of knowledge of both the Delft-FEWS systems and the new technologies offered by cloud solutions.
Create a list of requirements, both functional and technical. Also incorporate requirements like performance, uptime, disaster recovery, high availability, etc. Make sure that you are also aware of your company rules regarding using and migrating to the cloud.
Check which forecast models need to be run and if these can be run in the cloud (and, if applicable, under which licences)
Organise a couple of workshops with Deltares (or another partner) to map the requirements to the cloud solutions.
Create an implementation or migration plan.
Implement a dry run phase. In this phase, the whole system is up and running but not for operational use. During this phase, the users can use the system like an operational system to test whether everything is functioning as expected.

Deltares contacts

For more info contact Delft-FEWS product management.
