Geographical data and knowledge management (OpenEarth)
Project Phase: all
Purpose: Open source infrastructure for managing and sharing data, models and tools
Requirements: Programming skills
Relevant Software: OpenEarth, Python, Matlab, R
OpenEarth (www.openearth.nl) is a free and open source initiative for dealing with data, models and tools in earth science & engineering projects. In current practice, research, consultancy and construction projects commonly spend a significant part of their budget setting up basic infrastructure for data and knowledge management. Most of these efforts disappear again once the project is finished. As an alternative to these ad-hoc approaches, OpenEarth aims for a continuous and cumulative approach to data & knowledge management. OpenEarth promotes, teaches and sustains project collaboration and skill sharing. As a result, research and consultancy projects no longer need to waste valuable resources by repeatedly starting from scratch; instead they can build on the preserved efforts of countless projects before them.
Over many years, Delft University of Technology and Deltares together developed OpenEarth as a free and open source alternative to the project-by-project and institution-by-institution approaches to dealing with data, models and tools (e.g. Van Koningsveld et al., 2004). At its most abstract level, OpenEarth represents the philosophy that data, models and tools should flow as freely and openly as possible across the artificial boundaries of projects and organizations (or at least departments). Put into practice, OpenEarth exists only because there is a robust user community that works according to this philosophy (a bottom-up approach). In its most concrete and operational form, OpenEarth facilitates collaboration within its user community by providing an open ICT infrastructure, built from the best available open source components, in combination with a well-defined workflow, described in open protocols based as much as possible on widely accepted international standards.
OpenEarth as a whole (philosophy, user community, infrastructure and workflow) is the first comprehensive approach to handling data, models and tools that actually works in practice at a truly significant scale. It is implemented not only at its founding organizations, Delft University of Technology and Deltares, but also in a number of research programs with multiple partners. As a result, OpenEarth is now carried by a rapidly growing user community of more than 1000 users, of which over 100 developers have made over 10,000 contributions. This community originates from tens of organizations in multiple countries. Together they share and co-develop thousands of tools, terabytes of data and numerous models (source code as well as model schematisations).
The OpenEarth infrastructure and workflow
Improper management of data, models and tools can easily result in a wide range of very recognisable frustrations:
- Accidentally using older versions of data, models or tools
- Not knowing where the most recent version is and what its status is
- Making the same mistake twice due to lack of control over versions
- Losing important datasets that are extremely hard to replace
- Uncertainty as to what quantities have been measured and which units apply
- Uncertainty as to the geographical location of measurements
- Uncertainty as to the time and time zone the measurements were taken in
- Lack of insight into the approach taken and the methods used
- A wide range of formats of incoming (raw) data
- Getting the feeling that a certain issue must have been addressed before by another analyst
- Running into a multitude of tools for the same thing
- Running into a multitude of databases each in its own language and style
Although the frustrations described above are very common throughout the hydraulic engineering industry, no practical and widely accepted remedy was available. Since 2003, OpenEarth has been developed to fill this gap, providing an infrastructure that supports a bottom-up approach to long-term, project-transcending collaboration, adhering to four basic criteria:
- Open standards & open source - The infrastructure should be based on open standards, should not require proprietary software, and should be transferable
- Complete transparency - The collected data should be reproducible, unambiguous and self-descriptive. Tools and models should be open source, well documented and tested
- Centralised web-access - The collection and dissemination procedure for data, models and tools should be web-based and centralised to maximize access, promote collaboration and impede divergence.
- Clear ownership and responsibility – Although data, models and tools are collected and disseminated via the OpenEarth infrastructure, the responsibility for the quality of each product remains at its original owner.
The heart of the OpenEarth infrastructure is formed by three kinds of web services (see Figure):
- a web service for version control and back-up of raw data, model schematisations and computer code,
- a web service for accessing data as numbers, and
- a web service for accessing data as graphics.
Once data, models and tools are stored according to the OpenEarth principles, it becomes quite straightforward to develop specific tools that address a particular user need. Examples of such applications developed within Building with Nature are:
- Visualisation of open-source data (OpenEarth-Viewer)
- Interactive Dredge Planning Tool - Singapore
- Interactive group modelling (MapTable)
OpenEarth is useful to various actors in different roles in the science and engineering context. Here we discuss the required usage skills for:
- project engineers/hands-on researchers,
- project leaders/principal investigators and program managers.
Project engineers/hands-on researchers
OpenEarth is primarily meant for those project engineers/hands-on researchers who analyse data and model outputs on a regular basis.
Data examples are elevation recordings and bathymetry soundings, counts of species, remote sensing imagery and in situ measurements with instruments such as buckets, ADCPs, OBSs, conductivity sensors, thermometers and waverider buoys. Model examples include Delft3D, Delwaq, SWAN, ASMITA, Unibest and XBeach.
Project engineers/hands-on researchers need to learn about version control and about some international standards for the exchange of data over the web. These can simply be incorporated into their existing workflows, so they can keep using the skills they already have, whether that is Matlab, Python, R or ArcGIS. The reason is that OpenEarth is simply a workflow for embedding existing processing into an overall framework for collaboration. OpenEarth selected the most promising standards implemented worldwide for raw data, standard data, tailored data, graphics of data and meta-data. For these standards, dedicated tutorials have been compiled for the most common general analysis tools in collaboration with other institutes: Matlab (IHE), Python (TUD), R (IMARES), Delft3D, OPeNDAP (KNMI, Rijkswaterstaat), WxS (KNMI). In addition, a few times per year free hands-on courses, so-called sprint sessions, are organized. The aim is that within one day participants can work with these international standards on their own data, using their own favourite analysis tool. These standards are gradually being made compulsory at EU level by the INSPIRE legislation.
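To make the notion of self-descriptive data concrete, the sketch below (not part of OpenEarth itself) checks whether a variable carries the CF-style attributes that remove ambiguity about measured quantities, units and location; the variable contents and attribute values are invented for illustration.

```python
# A minimal sketch of what "self-descriptive" data means in practice:
# every variable carries CF-style metadata so that quantity, units and
# location are never ambiguous. Attribute names follow the CF conventions;
# the example variables are invented for illustration.

REQUIRED_ATTRIBUTES = ("standard_name", "units", "coordinates")

def missing_metadata(variable_attrs):
    """Return the CF-style attributes a variable still lacks."""
    return [a for a in REQUIRED_ATTRIBUTES if a not in variable_attrs]

# A bathymetry variable as it might be described in a netCDF file:
bathymetry = {
    "standard_name": "sea_floor_depth_below_sea_surface",
    "units": "m",
    "coordinates": "lon lat",
}
raw_sounding = {"units": "m"}  # raw data: units known, but not what or where

print(missing_metadata(bathymetry))    # []
print(missing_metadata(raw_sounding))  # ['standard_name', 'coordinates']
```

A check like this is essentially what a human analyst does when opening an unfamiliar dataset; encoding it once removes the guesswork for every later project.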
Project leaders/principal investigators and program managers
Project leaders/principal investigators and program managers can also adopt OpenEarth in their projects. The only skills they need are, when composing a project team, to finance enough manpower with expertise in the standardisation and automation of data and model processing, and to show enough leadership to enforce its use. Experts tend to use artisanal approaches tailored to their specific needs, hampering exchange between different disciplines in projects. In addition, such approaches hardly guarantee the provenance of their data. This has allowed debacles to occur, such as those with the IPCC (Intergovernmental Panel on Climate Change) data or the wide-spread fraud of a Nijmegen professor. The lack of proper control of automation in science has been discussed in a recent Nature article. OpenEarth offers a complete workflow to circumvent these issues. Project leaders can therefore require their project teams to adopt common standards, following the consensus approach of OpenEarth. In a sense, project leaders/principal investigators and program managers are therefore the most influential users of OpenEarth, without having to know the details of the standards.
Building with Nature interest
The Data and Knowledge Management part of BwN adopted the infrastructure and standards made available by OpenEarth. A benefit of joining the OpenEarth initiative is that previously uploaded datasets, as well as the hundreds of tools already developed in other projects, become available to the BwN partners (all routines in OpenEarth are open source, distributed under the GNU Lesser General Public License). Furthermore, an active international community of practice is already established, facilitating optimal use of available resources. Training in the use of the available tools is given on a regular basis and can easily be extended to the BwN community.
Important for the BwN program is to establish maximum overlap with on-going activities related to data, model and tool storage and dissemination, conforming as much as possible to existing standards (adopting the successes of previous projects) on the one hand, while striving for maximum usability (avoiding the pitfalls of previous projects) on the other. Data is put under version control, converted to a commonly used data standard (netCDF for grids, PostgreSQL for biodiversity data) and disseminated through a web-based data server (OPeNDAP). Models are put under version control, stored and disseminated through a repository. Tools, also put under version control, are shared through a common toolbox that can be used by all partners.
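The conversion step from raw data to a common standard can be sketched as follows. The raw record layout (local time stamps, depths in centimetres) is a hypothetical example, but it shows why the standardized record leaves no doubt about units or time zone:

```python
# A minimal sketch of the "raw data -> standard data" step: a hypothetical
# raw logger row (local Dutch winter time, depth in centimetres) is turned
# into a record that is unambiguous about units and time zone.

from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))  # assumed time zone of the raw logger

def standardize(raw_row):
    """Convert a raw (timestamp, depth_cm) row into an SI/UTC record."""
    stamp, depth_cm = raw_row
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=CET)
    return {
        "time": local.astimezone(timezone.utc).isoformat(),
        "depth": depth_cm / 100.0,  # centimetres -> metres
        "units": "m",
    }

record = standardize(("2010-03-01 12:00", 340))
print(record["time"], record["depth"])  # 2010-03-01T11:00:00+00:00 3.4
```

In the OpenEarth workflow, a script like this would itself live in the repository next to the raw data, so the conversion is reproducible by anyone.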
How to Use
The most important new skill participants need to acquire is version control of their data, models and tools. For financial, legal and reporting documents, most companies worldwide have already undergone a transition in which employees accustomed themselves to central version control software (e.g. Microsoft SharePoint). For access to the collection of data, models and tools, we chose to adopt the open source and free Subversion version control system into the OpenEarth standards amalgam. Subversion is one of the version control tools most widely used by professional software engineers, which means there is plenty of documentation on the web to learn it. In addition, OpenEarth offers a dedicated tutorial on its wiki and organizes hands-on courses a couple of times per year, so-called sprint sessions, where participants can learn this new skill in one day.
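The basic Subversion round trip can be sketched as below. The svn subcommands (checkout, commit) are standard; the repository path under https://svn.oss.deltares.nl and the file names are assumptions for illustration only.

```python
# A sketch of the basic Subversion round trip for a participant, driven
# from Python. The subcommands are standard svn; the repository path and
# file names are assumptions for illustration only.

import subprocess

def svn_command(subcommand, *args):
    """Compose an svn command line, ready for subprocess.run."""
    return ["svn", subcommand, *args]

# 1. Get a working copy, 2. add your script, 3. share it with everyone:
checkout = svn_command("checkout",
                       "https://svn.oss.deltares.nl/repos/openearthtools/trunk",
                       "openearthtools")
add = svn_command("add", "openearthtools/my_analysis.m")
commit = svn_command("commit", "openearthtools", "-m", "Add coastline analysis")

# With an svn client installed, each step would be executed as, e.g.:
# subprocess.run(checkout, check=True)
```

The same three commands cover the daily workflow; `svn update` pulls in what other participants committed since your checkout.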
The other important new skill participants need to learn is the use of international standards for the exchange of data over the web. OpenEarth aims for a consensus process in which project engineers from different disciplines agree to collaborate using a suitable common amalgam of proven international standards. In some cases international standards are not yet available. Here OpenEarth sets intermediate standards and protocols, awaiting international approval by esteemed standards bodies such as ISO, OGC, the EU INSPIRE directive, EPSG, CF and NASA. At the end of the consensus process, some users will have to adapt their data, models and tools to the international standards. Here OpenEarth does require some adaptation. But since OpenEarth is only an amalgam of proven, existing standards, this is in principle a no-regret transition for all participants: they will have to migrate to international standards sooner or later, and OpenEarth just speeds up this transition.
Phased plan process
OpenEarth provides an integral approach for data, models and tools. Currently tens of GB of data are hosted by OpenEarthTools, along with nearly 5000 Matlab functions. Whereas initially OpenEarth faced the challenge that not enough data, models and tools were provided, currently we face the challenge that almost too much is available. This 'getting started' introduces the structure behind this overwhelming amount of information. However complex the content of OpenEarth may be, the basic 3-step structure in which it is provided is always simple and straightforward. OpenEarth strives for web services for data, models and tools as shown in the sketch below.
There are basically three kinds of web services: for graphics, for data and for computer code. For graphics of data and model results, OGC KML feeds (aka Google Earth feeds) are provided (3). For published data and model results, an OPeNDAP server is provided (2), whereas for computer code (tools), (unpublished) raw data and model input schematisations, a Subversion repository is available (1). The workflow for users that only consume the data, models and tools is 3 > 2 > 1. This 'getting started' document is primarily meant for users that consume the data. In contrast, the workflow for the OpenEarth developers that provide all the data, models and tools is 1 > 2 > 3.
- 3. Google Earth feeds (OGC KML feeds) can be obtained without password restrictions from our KML server. Three types of KML feed are provided.
- 2. NetCDF data (published data and model results) can be obtained without password restrictions from our OPeNDAP server:
http://opendap.deltares.nl. You can access an OPeNDAP server directly with a web browser, or click on deep links in a Google Earth overview. netCDF is an internationally recognized file format, an OGC standard and a NASA standard, that can contain an unlimited amount of meta-information. Through OPeNDAP you can access a netCDF file on an OPeNDAP server as if it were on your local computer. This removes the need to download big files: you can simply request a small portion of a file. netCDF/OPeNDAP does the same thing for data that Google Earth does for the world's aerial imagery: just leave all data on a server, and download only what you need, when you need it.
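The partial-download idea can be sketched without any client library: a DAP client simply appends a constraint expression to the dataset URL and the server returns only the requested hyperslab. The dataset path and variable name below are invented for illustration; a client library such as netCDF4 for Python hides this URL mechanics and lets you slice the remote variable directly.

```python
# A sketch of the subsetting idea behind OPeNDAP: the client appends a
# constraint expression to the dataset URL and the server returns only
# that hyperslab. The dataset path and variable name are invented for
# illustration.

def opendap_subset_url(dataset_url, variable, start, stop, stride=1):
    """Build a DAP constraint expression for one 1-D hyperslab.

    DAP indexing is inclusive on both ends: var[start:stride:stop].
    """
    return "%s.ascii?%s[%d:%d:%d]" % (dataset_url, variable, start, stride, stop)

url = opendap_subset_url(
    "http://opendap.deltares.nl/thredds/dodsC/bathymetry.nc",  # hypothetical
    "depth", 0, 99)
print(url)
# http://opendap.deltares.nl/thredds/dodsC/bathymetry.nc.ascii?depth[0:1:99]
```

With a netCDF client library the same request is transparent: opening the URL as a dataset and slicing the first hundred values of a variable fetches only those values from the server.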
- 1. Tools, (unpublished) raw data and model input schematisations are available for free, albeit with a (free) username and password. The reason is that when you want to access these, you are considered a developer, and we grant you write access. To prevent misuse, and to give proper credit for your contributions, we require all developers to request a free username. This password-controlled repository is located at https://svn.oss.deltares.nl; it uses the same Subversion technique as SourceForge. Everyone can join OpenEarth as an advanced user or a developer. The OpenEarth checkout includes the latest XBeach software and documentation; see and join XBeach.org to become a member of that community. A manual on the use of Subversion explains how to get access to the OpenEarthTools. Please also read the migration manual, as we recently migrated our repositories to the Deltares open source software portal.
3. Easy visualisation - powerful Google Earth visualisations, overview of visualisations and overview of OPeNDAP data
2. Access to released data - direct web access to user-defined selections of gigabytes of well-structured data
1. Version control and backup - a growing and improving collection of free tools, tens of gigabytes of raw data + processing scripts and model schematisations
To demonstrate the potential of the OpenEarth approach we discuss the Holland Coast case. This case develops alternative strategies for the sustainable development of the Dutch coast from Hoek van Holland up to Den Helder, over a timescale of 50 to 100 years. It deals with a range of possible measures, both for sand mining and coastal interventions, addressing (1) the wide range of possible approaches to enhancing the natural potential of a site or design and (2) alternative methods to work with natural processes rather than against them. (Van Koningsveld et al., 2010)
Holland Coast case
The Netherlands is well known for its advanced coastal protection policies, and accordingly many projects have dealt with this particular stretch of coast before. Invariably, each of these projects dealt with (subsets of) Rijkswaterstaat's datasets on bathymetry (transects as well as grids), hydrodynamics (waves, currents and water levels) and water quality (TSS, etc.), TNO's datasets on topography (the general elevation data of the Netherlands), bathymetry (Dutch continental shelf bathymetry) and soil composition (soil types and grain size distributions), and KNMI's datasets on meteorology (wind and pressure fields).
All these datasets are now available through OpenEarth in readily accessible, uniform netCDF files. The datasets are stored in raw form in the OpenEarthRawData repository. The same repository also contains the Matlab and Python scripts that process these data into netCDF files. The raw data have been enriched with meta-information and transformed to netCDF. These files are stored on an OPeNDAP server, which allows everyone to make calculations with these data without the need to download the full collection of over 10 GB.
Finally, for non-specialists who do not need to perform calculations, all data are also presented as readily processed KML feeds for straightforward visualization in Google Earth. Because all these data are already available, the Holland Coast case needs to reserve less of its budget for gathering and processing historic external datasets. As a result, more budget remains for additional data acquisition and/or more detailed (data) analysis and reporting. Just as the Holland Coast case benefits from all this previously gathered data, any future project will benefit from the additional data gathered specifically by this project and from new or improved tools for data analysis.
Availability of important basic analysis tools
Besides the use of various datasets, a number of important coastal state indicators is associated with the coastal policies that apply to this coastal region. The decision recipe for this policy is based on a volume trend approach applying the Momentary Coastline (MCL) concept. This concept is used in a maintenance indicator comparing the extrapolated expected coastline position (TCL) with a reference value: the 1990 coastline position (BCL) (TAW, 1995; Van Koningsveld and Mulder, 2004).
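As a rough illustration of this decision recipe (with invented numbers, not the official TAW computation), one can fit a linear trend through a series of yearly MCL positions, extrapolate it to obtain the expected coastline position (TCL), and compare that with the 1990 reference position (BCL):

```python
# A worked sketch of the MCL-based decision recipe (numbers invented):
# fit a linear trend through ten yearly MCL positions, extrapolate one
# year ahead to the expected coastline position (TCL), and compare it
# with the 1990 reference position (BCL). Seaward positions are positive.

def linear_trend(years, positions):
    """Ordinary least-squares slope and intercept."""
    n = len(years)
    mx = sum(years) / n
    my = sum(positions) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(years, positions)) / \
            sum((x - mx) ** 2 for x in years)
    return slope, my - slope * mx

years = list(range(2000, 2010))
mcl = [52.0 - 0.8 * (y - 2000) for y in years]  # eroding ~0.8 m/yr (invented)
bcl = 45.0                                      # 1990 reference position

slope, intercept = linear_trend(years, mcl)
tcl = slope * 2010 + intercept                  # extrapolated position
print(round(tcl, 1), tcl < bcl)  # 44.0 True -> TCL landward of BCL
```

When the extrapolated TCL lands seaward of the BCL no action is needed; when it lands landward, as in this invented example, the policy calls for maintenance such as nourishment.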
Since well-tested tools associated with important indicators are already available from the OpenEarthTools repository, the Holland Coast case needs to reserve less of its budget for tool development. As a result, more budget remains for thorough application of these tools to the equally available datasets. The availability and accessibility of data and tools in this case challenges the analyst to apply his/her analysis routines not just to one transect or a couple of transects near the area of interest, but simply to all transects available in the database. This is an excellent way (1) to see whether or not the routine is robust enough to deal with variations/flaws in the data, and (2) to see whether an observed phenomenon can be detected throughout the dataset or just in a small subselection of the data. Currently the Building with Nature consortium is executing a field campaign at Vluchtenburg, in the southern region of the Holland Coast. The data collected range from Argus video imagery, aeolian sediment transport measurements, grain size distributions and 3D laser-scanned bathymetries to jetski-based nearshore bathymetries. These data, gathered by various partners, are already routinely stored in the OpenEarthRawData repository and will in the near future become available through OpenEarth for other projects as well.
Various data examples
This is a demonstration of various data visualizations made using a combination of Matlab, the free tools in the OpenEarth toolbox, and the Google Earth™ mapping service. To make your own, just sign up and check out the tutorials for the googleplot toolbox. In an NCK 2012 paper we have shown that plotting data into Google Earth is possible for a wide variety of coastal datasets.
A few of the other *.kml files with which the screenshots below were made are available in this (temporary) open dir. Note that some of the bigger files may need a bit of time to load, so just be patient.
There are more initiatives worldwide that offer data in Google Earth.
- Baart, F.; Van Gelder, P.H.A.J.M.; De Ronde, J.; Van Koningsveld, M. & Wouters, B. (2012). "The effect of the 18.6-year lunar nodal cycle on regional sea level rise estimates." Journal of Coastal Research.
- Ciavola, P.; Ferreira, O.; Haerens, P.; Van Koningsveld, M.; Armaroli, C. & Lequeux, Q. (2011). "Storm impacts along European coastlines. Part 1: The joint effort of the MICORE and ConHaz Projects." Environmental Science & Policy 14(7): pp. 912-923.
- Ciavola, P.; Ferreira, O.; Haerens, P.; Van Koningsveld, M. & Armaroli, C. (2011). "Storm impacts along European coastlines. Part 2: Lessons learned from the MICORE project." Environmental Science & Policy 14(7): pp. 924-933.
- Den Heijer, C.; Baart, F. & Van Koningsveld, M. (2011). "Assessment of dune failure along the Dutch coast using a fully probabilistic approach." Geomorphology.
- Van Koningsveld, M.; De Boer, G.J.; Baart, F.; Damsma, T.; Den Heijer, K.; Van Geer, P. & De Sonneville, B. (2010). "OpenEarth: inter-company management of data, models, tools and knowledge." Proceedings WODCON XIX conference, Beijing, China.
- De Boer, G.J.; Baart, F.; Bruens, A.; Damsma, T.; Van Geer, P.; Grasmeijer, B.; Den Heijer, K. & Van Koningsveld, M. (2012). "OpenEarth: using Google Earth as outreach for NCK's data." In: NCK-days 2012: Crossing borders in coastal research, 13-16 March 2012, Enschede, the Netherlands.
Related Building solutions
Salt Marsh development, Marconi, Delfzijl (in preparation)