Extracting information from timeseries of netCDF files

This page describes how to create time series from a set of netCDF files that contain temporal information. The netCDF files used here originate from KNMI and were delivered within the NHI-Zoetwatervekenning project. Examples of these files can be found on the NMDC site.

The extraction is done with Python. The first few lines import the necessary modules.

import os
import sys

import numpy as np
import netCDF4

The next code block retrieves the value at a given point from a netCDF file. The files used here hold a 3D data array with shape (ntimes, ny, nx); the function returns the value of the first time step.

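def getvalnc(anc,x,y):
    # read the coordinate arrays and the data array from the netCDF file
    xar,yar,tmp = readnc(anc)
    # indices of the grid cell closest to the point (x,y)
    indx = getindx(xar,x)
    indy = getindx(yar,y)
    # value of the first time step at that grid cell
    val = tmp[0,indy,indx]
    return val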

The next code block reads the required arrays (the x and y coordinates and the data) from the netCDF file.

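def readnc(url):
    # url is the location of the netCDF file
    dataset = netCDF4.Dataset(url)
    y = dataset.variables["y"][:]
    x = dataset.variables["x"][:]
    # "prediction" is the name of the data variable in these files
    tmp = dataset.variables["prediction"][:]
    dataset.close()
    return x,y,tmp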

The next code block finds the index of the value in the x or y array of the netCDF file that lies closest to the requested coordinate.

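def getindx(arr,value):
    # absolute distance of every coordinate to the requested value
    absx = np.abs(arr-value)
    sd = absx.min()
    # first index at which the distance is smallest
    return np.nonzero(absx == sd)[0][0]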
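
As a side note, the same nearest-index lookup can be sketched with np.argmin, which returns the first index of the smallest distance. The function name nearest_index and the tol parameter below are illustrative assumptions and not part of the original script. The optional tolerance is one possible way to notice points that lie outside the grid, because the lookup otherwise silently returns the cell at the edge.

```python
import numpy as np

def nearest_index(arr, value, tol=None):
    """Return the index of the element of a 1D array closest to value.

    nearest_index and tol are hypothetical names used for illustration.
    """
    dist = np.abs(np.asarray(arr, dtype=float) - value)
    idx = int(np.argmin(dist))  # first index of the minimum distance
    if tol is not None and dist[idx] > tol:
        # the point is further from any coordinate than the caller accepts
        raise ValueError('no coordinate within %s of %s' % (tol, value))
    return idx

xar = np.array([0.0, 1000.0, 2000.0, 3000.0])
print(nearest_index(xar, 1400.0))  # the closest coordinate is 1000.0
```

A sensible value for tol would be something like half the grid spacing, but that choice depends on the data.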

Main code block. Here the paths to the netCDF files are set. The example simply walks a local directory tree, because complete directories of netCDF files were available. A cleaner alternative would be to use the catalog.html of the OPeNDAP server.

bpath = r'<path to netCDF basedirectory>'
pf = r'<point file>'
outfile = r'outputfile.csv'

lsterrors = []
xvals,yvals,idvals = <procedure to read a textfile>(pf)

# newline='' keeps the explicit '\r\n' line endings unchanged (Python 3)
of = open(outfile,'w',newline='')
# header: date followed by one column per point id
of.write(','.join(['date']+[str(i) for i in idvals])+'\r\n')

for root, dirs, files in os.walk(bpath):
    if not files:
        continue
    # only the first file of each directory is read
    anc = os.path.join(root,files[0])
    if not os.path.isfile(anc):
        continue
    try:
        print('reading nc',files[0])
        # directory layout: <scenario>/<year>/<month>/<day>
        lst = os.path.normpath(root).split(os.sep)
        ascen = lst[-4]
        ayr = lst[-3]
        am = lst[-2]
        ad = lst[-1]
        adate = ''.join([ayr,am,ad])
        # one value per point, in the same order as the header
        l = []
        for i in range(len(xvals)):
            l.append(str(getvalnc(anc,xvals[i],yvals[i])))
        of.write(adate+','+','.join(l)+'\r\n')
    except Exception:
        lsterrors.append(anc)
        print('error with nc ',anc)

of.close()
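
The main loop calls getvalnc once per point, so every file is opened and read again for each point. A possible variation, sketched here under the assumption that the arrays returned by readnc fit in memory, reads each file once and picks all points with NumPy index arrays. The function name getvalsnc and the small synthetic grid are made up for this example.

```python
import numpy as np

def getvalsnc(xar, yar, tmp, xvals, yvals):
    """Values of the first time step at the grid cells nearest to the points.

    xar, yar: 1D coordinate arrays; tmp: data array with shape
    (ntimes, ny, nx), as returned by readnc.
    getvalsnc is a hypothetical helper, not part of the original script.
    """
    xar = np.asarray(xar, dtype=float)
    yar = np.asarray(yar, dtype=float)
    px = np.asarray(xvals, dtype=float)
    py = np.asarray(yvals, dtype=float)
    # distance of every point to every coordinate; nearest index per point
    indx = np.abs(xar[np.newaxis, :] - px[:, np.newaxis]).argmin(axis=1)
    indy = np.abs(yar[np.newaxis, :] - py[:, np.newaxis]).argmin(axis=1)
    # one value per point, taken from the first time step
    return tmp[0, indy, indx]

# synthetic grid: 1 time step, 2 rows (y) and 3 columns (x)
xar = np.array([0.0, 10.0, 20.0])
yar = np.array([0.0, 10.0])
tmp = np.arange(6).reshape(1, 2, 3)
vals = getvalsnc(xar, yar, tmp, [9.0, 21.0], [1.0, 8.0])
print(vals)
```

In the main loop this would replace the inner loop over the points by one call to readnc followed by one call to getvalsnc.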

Some remarks:
- In the example above the date is derived from the directory structure. It can also be read from the time information stored in the netCDF file itself.
- The script assumes that each netCDF file contains a single data variable named prediction. Use gdalinfo to look up the actual name and adjust readnc if it differs.
- The script assumes that the netCDF file contains x and y variables. The functions also work for one-dimensional lat and lon arrays if the variable names in readnc are changed accordingly.
