{include:ContentHeader}
{section}
{column}

The OpenEarth philosophy is to collect _and_ disseminate environmental and lab data sets in a way that outlives individual projects, rather than on a project-by-project basis. We believe that science and engineering have become so *data-intensive* that data management is beyond the capabilities of individual researchers. Data management needs to migrate from artisanal methods to *21st century technology*, which means that data managers and IT professionals need to team up. This belief is widespread and is known as the *4th paradigm*. We recommend reading the [4th paradigm book|http://research.microsoft.com/en-us/collaboration/fourthparadigm/], which illustrates the growing conviction that all sustainable solutions for managing data should be *web-based* and involve *communities*. OpenEarth aims to be a 4th paradigm workflow solution that lets scientists and engineers collaborate in communities over the web. The need for science and IT to team up is clearly illustrated in a 2010 [Nature article|http://www.nature.com/news/2010/101013/full/467775a.html]. At the bottom of this wiki page you can see a movie of the community activity in our raw data repository, a tool we adopted from the IT world. Such communities should deal not only with data, but also with numerical models and analysis tools. Data cannot be treated separately from the rest of science. Therefore OpenEarth aims to be an *integral workflow for data, models and tools*. For hosting such a workflow we advocate collaboration with professional data centres such as [3TU datacentre|http://datacentrum.3tu.nl/], [DANS|http://www.dans.knaw.nl/] and [PANGAEA|http://www.pangaea.de/]. Some data centres are members of [DataCite|http://datacite.org/] and can, under certain conditions, assign a [DOI|http://www.doi.org/] to published data, enabling anyone to cite your web-based data.

To be an effective and sustainable 4th paradigm solution, OpenEarth has identified the most promising international standards for exchange of data over the web. These standards come from different realms and are shown in the scheme below. We aim to work with all of them, but currently use only the bold ones on a daily basis. These include the [subversion|http://subversion.apache.org/] tool to store not only the *raw data*, but also the processing software (scripts, settings) under version control using the web 2.0 Wikipedia approach: everyone can sign up for write access. This allows us to naturally attribute versions to data, an aspect known as provenance that is lacking in most of today's data management solutions. For *standardized data* we use the [netCDF|http://www.unidata.ucar.edu/software/netcdf/] format (NASA and OGC standard). Combined with the [CF|http://cf-pcmdi.llnl.gov/] vocabularies and EPSG codes[^1^|http://www.epsg.org/]^,^[^2^|http://www.epsg-registry.org/] this becomes a very powerful data stack, as described in an [OceanObs'09 paper|http://archimer.ifremer.fr/doc/00027/13832/]. We place the netCDF files on a [THREDDS|http://www.unidata.ucar.edu/software/tds/] [OPeNDAP|http://opendap.org/] server for dissemination of terabytes of netCDF data over the web. OPeNDAP is supported by many user software applications. It has, for instance, been built into [Matlab] since 2012, and it is optionally available for [R|http://cran.r-project.org/web/packages/ncdf4/index.html], [python|http://code.google.com/p/netcdf4-python/], [ArcGis|http://www.asascience.com/software/arcgistools/edc.shtml] and many other netCDF programs[^3^|http://opendap.org/whatClients]^,^[^4^|http://www.unidata.ucar.edu/software/netcdf/software.html].
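
One reason OPeNDAP suits large netCDF collections is that a client can ask the server for a slice of a single variable instead of downloading the whole file. The sketch below builds such a request URL with a DAP2 constraint expression, using only the Python standard library. The host {{example.org}}, the dataset path and the variable name {{temperature}} are hypothetical placeholders, not actual OpenEarth datasets; the {{/thredds/dodsC/}} prefix is the usual THREDDS OPeNDAP entry point.

```python
from urllib.parse import quote

# Response types defined by the DAP2 protocol (appended to the dataset URL).
DAP2_RESPONSES = {"dds", "das", "dods", "ascii"}


def opendap_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    slices holds one (start, stride, stop) tuple per dimension;
    indices are zero-based and stop is inclusive, as in DAP2.
    """
    if response not in DAP2_RESPONSES:
        raise ValueError(f"unknown DAP2 response type: {response}")
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    # Keep the bracket and colon syntax readable; escape anything else.
    constraint = quote(f"{variable}{hyperslab}", safe="[]:")
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical dataset location, for illustration only.
DATASET = "http://example.org/thredds/dodsC/example/dataset.nc"

# First 10 time steps, every second point of the first 21 along the next axis.
url = opendap_subset_url(DATASET, "temperature", [(0, 1, 9), (0, 2, 20)])
print(url)
```

In daily practice a netCDF client library performs this subsetting behind the scenes when an OPeNDAP URL is opened in place of a local file name; the explicit URL is shown here only to make the mechanism visible.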

For ecological data, which come with an overwhelming amount of meta-data, we use a plain-vanilla Relational DataBase Management System (RDBMS). We chose the powerful, open source [PostgreSQL|http://www.postgresql.org/] implementation with the [PostGIS|http://postgis.refractions.net/] spatio-temporal add-on. We are working on adopting dedicated spatio-temporal standards as well. These standards allow for live server-side processing of the data to meet the demands of the user: they deliver *tailored data*. The [OGC consortium|http://www.opengeospatial.org/] is _the_ international body for specification of these standards, and the EU [INSPIRE|http://inspire-geoportal.ec.europa.eu/] directive prescribes them. For typical GIS data (flat, 2D or 2.5D) we already work with [postgis|http://postgis.refractions.net/], [geoserver|http://geoserver.org/] and [geonetwork|http://geonetwork-opensource.org/]. However, these so-called {{WxS}} protocols are not yet implemented in operational software for many specific demands of time-dependent, 3D, curvi-linear data products in our field. We do not develop {{WxS}} software ourselves, but wait for the open source implementations, most of them under the [OSGeo|http://www.osgeo.org/] umbrella, to cover the demands of our field. By far the most promising {{WxS}} client and server implementation we identified is [ADAGUC|http://adaguc.knmi.nl/] by the Royal Netherlands Meteorological Institute (KNMI). ADAGUC implements not only the WCS standard to request data over the web very fast, but also the WMS standard to request imagery. For exchange of *graphics of data*, we chose to start working with KML, the standard behind Google Earth that was also adopted as a standard by OGC, but we will adopt WMS as well.
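
To make "tailored data" concrete: with a {{WxS}} protocol the client states in the request which layer, area and image size it wants, and the server does the processing. The sketch below assembles a WMS 1.1.1 {{GetMap}} request with the Python standard library. The endpoint {{http://example.org/geoserver/wms}} and the layer name {{openearth:bathymetry}} are hypothetical; the parameter names are those of the WMS 1.1.1 specification.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wms_getmap_url(endpoint, layer, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (minx, miny, maxx, maxy) in the units of srs.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty string selects the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer, for illustration only.
url = wms_getmap_url(
    "http://example.org/geoserver/wms",
    "openearth:bathymetry",
    bbox=(3.0, 51.0, 7.5, 53.5),
    width=800,
    height=600,
)
print(url)

# Decode the query string again to check what the server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(query["BBOX"])
```

Version 1.1.1 is used here because its bounding box is always given as minx, miny, maxx, maxy; WMS 1.3.0 replaces {{SRS}} with {{CRS}} and follows the axis order of the coordinate reference system.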
{column}
{column}
| {lozenge:icon=!OpenEarth^lozenges.png!|title=Data standards|link=Data Collection Protocol|color=green|width=300px}OpenEarth data collection protocol {lozenge} |
| {lozenge:icon=!OpenEarth^lozenges.png!|title=OpenEarthRawData repository|link=https://svn.oss.deltares.nl/listing.php?repname=OpenEarth+Raw+Data|color=green|width=300px}Store your raw data here *(Step 1)*{lozenge} |
| {lozenge:icon=!OpenEarth^lozenges.png!|title=OpenDAP server (*production*) |link=http://opendap.deltares.nl:8080|color=green|width=300px} Access using OpenDAP protocol [THREDDS|http://opendap.deltares.nl:8080/thredds/catalog/opendap/catalog.html] (_default_). *(Step 2)* {lozenge} |
| {lozenge:icon=!OpenEarth^lozenges.png!|title=OpenDAP server (*test*)|link=http://dtvirt5.deltares.nl:8080/thredds|color=green|width=300px}Access using OpenDAP protocol THREDDS only{lozenge} |
| {lozenge:icon=!OpenEarth^lozenges.png!|title=Google Earth|link=KML Screenshots|color=green|width=300px}Data in [Google Earth|http://earth.google.com] ^TM^ *(Step 3)*{lozenge} |
| {lozenge:icon=!OpenEarth^lozenges.png!|title=GeoServer (*experimental*)|link=http://dtvirt5.deltares.nl:8080/geoserver |color=green|width=300px}Access data using the WMS and WFS services {lozenge} |
| {lozenge:icon=!OpenEarth^lozenges.png!|title=GeoNetwork (*experimental*)|link=http://geoserver.esx.xtr.deltares.nl:8080/geonetwork/srv/en/main.home |color=green|width=300px}Meta-data with map overview data using the WCS services {lozenge} |
| {lozenge:icon=!OpenEarth^lozenges.png!|title=pydap server (*experimental*)|link=http://dtvirt5.deltares.nl/pydap/ |color=green|width=300px}Access using OpenDAP protocol via pydap {lozenge} |
{column}
{section}

!OpenEarthBuildingBlocks.png|width=800px!

The data collection procedure and the relation between those standards are explained in the [OpenEarth Data Standards|Data Collection Protocol] document, developed in the framework of the EU FP7 project MICORE and Building with Nature. The basis is the 3-step [ETL|http://en.wikipedia.org/wiki/Extract,_transform,_load] procedure that is well known in the database world. ETL describes the process to Extract data from somewhere, Transform it to meet the strict data model requirements of the database, and Load it into the database. We extended ETL with one crucial extra step: *provide* the data again to users via the web. We believe that any effective data management solution should include users at the start of the ETL process _and_ at the end. Loading data into the database and using data from the database should both be possible from the work environment of the user. In the sketch above we explicitly included client and server to highlight the paramount importance of easy and immediate web-based Provide mechanisms for the data, which are not covered by ETL.

ETL contains the following steps:
* data is not just numbers and meta-information, but consists of raw data produced by the measuring equipment (e.g. volts) + processing scripts.
* raw data + scripts should be stored in the OpenEarthRawData repository, enabling version control.
* raw data should then be enriched with metadata and processed into useful data products (netCDF, PostgreSQL table) using transformation scripts that should also be put under version control in a repository.
* resulting data products should conform to the best open source semantic standards available, e.g. [CF|http://cf-pcmdi.llnl.gov/], [WoRMS|http://www.marinespecies.org/].
* data products should be made easily available via web-based interfaces (OPeNDAP, ODBC or dedicated DB-APIs, WxS), but also through automatable procedures for widely used data processing languages such as Matlab, IDL, Python, Fortran, C and Java (OpenEarth Tools).
* data products are primarily meant for dissemination; raw data and scripts are primarily meant for archiving.
* meta-data should be gathered and inserted into a central catalogue.
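
The extended procedure (Extract, Transform, Load, Provide) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OpenEarth implementation: the voltage readings, the linear calibration ({{gain}}, {{offset}}) and the table names are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so that the example runs with the standard library alone. The attribute names follow the CF convention.

```python
import sqlite3

# Extract: raw instrument output as it comes from the sensor
# (hypothetical readings: ISO 8601 timestamp, volts).
RAW_VOLTS = [
    ("2010-01-01T00:00:00Z", 1.20),
    ("2010-01-01T01:00:00Z", 1.25),
    ("2010-01-01T02:00:00Z", 1.50),
]

# Metadata that turns bare numbers into a data product (CF-style attributes).
METADATA = {"standard_name": "sea_water_temperature", "units": "degree_Celsius"}


def transform(raw, gain=10.0, offset=-2.0):
    """Transform: apply a (hypothetical) linear calibration, volts to degrees."""
    return [(time, round(gain * volts + offset, 3)) for time, volts in raw]


def load(conn, records, metadata):
    """Load: store the data product and its metadata in the database."""
    conn.execute("CREATE TABLE observation (time TEXT PRIMARY KEY, value REAL)")
    conn.execute("CREATE TABLE attribute (name TEXT PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO observation VALUES (?, ?)", records)
    conn.executemany("INSERT INTO attribute VALUES (?, ?)", metadata.items())
    conn.commit()


def provide(conn, start, end):
    """Provide: return only the requested time window, with its metadata."""
    rows = conn.execute(
        "SELECT time, value FROM observation "
        "WHERE time >= ? AND time < ? ORDER BY time",
        (start, end),
    ).fetchall()
    attributes = dict(conn.execute("SELECT name, value FROM attribute"))
    return {"attributes": attributes, "data": rows}


conn = sqlite3.connect(":memory:")
load(conn, transform(RAW_VOLTS), METADATA)
result = provide(conn, "2010-01-01T00:00:00Z", "2010-01-01T02:00:00Z")
print(result)
```

Keeping {{RAW_VOLTS}} and {{transform}} separate mirrors the principle that raw data and processing scripts are archived together under version control, so the data product can always be regenerated when the calibration changes.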

!Data Standards Workflow_small2.PNG|align=centre,width=500px!

Numerous other datasets have been, or are continually being, uploaded within the MICORE and Building with Nature research programmes. OpenEarth is not the only initiative to share and disseminate government-paid Earth science data freely on the web using open standards. We made an [inventory of related initiatives|Data sharing initiatives]. Our aim is to spread the use of the open standards and make them stick in our everyday work.

{html}
<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/7w2DBazX6g4&hl=en&fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/7w2DBazX6g4&hl=en&fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object>
{html}