Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Cloud Notary 

 

In our field of hydrodynamic modelling, it is common to have sensitive and/or large and expensive datasets, which are vital to the business and treated as secret. These datasets however should still be accessible to downstream algorithms and (external) models to analyze and improve these models. In other words, Party A has data x and Party B has model y and both do not want to or cannot share their property. However, they are both interested in the result z = operation(x,y)We envision the use of cloud computing as a facilitator for the operation. 

...

An example of such a cooperation is Van Oord and Deltares. Van Oord has Buoy data which could be useful to improve the Global Tide and Surge Model (GTSM) from Deltares. Both parties cannot share their property but are interested in the improved model results. This use case is taken to create an architecture in Microsoft Azure as described below. 

 

Key concepts 

 

In our brainstorming sessions, a few key concepts came up. First, that both parties still need to trust both the data and the algorithm to be valid and to do only what they need to do and nothing else. Therefore, a legal contract should be drawn up. Second, the environment that will hold both the data and the algorithm should be transparent as possible, i.e., it should be clear what it will do beforehand. This is best solved by presenting the infrastructure as code beforehand. Third and last, the envisioned operation should run as a functional account that ideally prevents any access by both parties. 

...

Continuous integration and continuous deployment (CI/CD), normally used to test and deploy code on each commit, is a suitable candidate to provision a secure cloud environment. Using Github and Azure DevOps (both from Microsoft) we can store the environment as code and setup a cloud environment. In the Azure DevOps setup, a service level user is created that will gain access to both the Deltares Azure environment as the van Oord Azure environment, but without either parties seeing those keys or gaining access themselves. 

 Image Removed 

Image Added

Figure 1 Cloud notary architecture in Azure 

 

Architecture 

 

The implementation expects an initial setup at Microsoft Azure for the involved parties. This is illustrated in Figure 1. 

...

This initial setup is the input for an Azure DevOps pipeline. This pipeline creates a Kubernetes cluster with the keys stored in secrets. This Kubernetes cluster is then used to start a workflow which brings data and model together. A more detailed description of the pipeline is given next.  

 

 Image Removed 

Image Added

 

The actions in the pipeline are illustrated in the image below. A service account from both Van Oord and Azure are used in the pipeline for access to the relevant items in the recourse groups as described above. 

...

  • Mount volumes. For the storage accounts Persistent Volumes and Persistent Volume Claims are created to mount these volumes in the containers 

  • In the last step the Argo workflow is submitted to run the Global Tide and Surge Model. 

Image Removed Image Added

 

Operation 

Once the above architecture is setup, any user can trigger the workflow by making changes to the code, for example to point to a new input path or timeframe. The CI will trigger and run the complete pipeline, both to setup the architecture as to run the workflow. Larger changes to the infrastructure can be reviewed in a merge request like manner. 

...

The last step of the pipeline is to generate a comparison between the model and the observational data, in our case between Buoy data and our GTSM model. Such a result is seen in Figure 2. Note that the output has to strike a balance between being detailed enough to improve the model and coarse enough that the original input cannot be retrieved anymore. 

Image Removed Image Added

Figure 2 Output of the complete processing pipeline. The model is overestimating the magnitude of the velocity in this case, especially on lower velocities. 

 

Future work 

The above architecture and usage are a working proof of concept. We must note however that while the use of DevOps as a CI service is useful for the transparent setup of the architecture, it’s less so for actually triggering the pipeline, for which a standalone service should be made. This service can however be provisioned by the above pipeline. 

...