Wednesday, January 1, 2014

A beginner's guide to scripting with UV-CDAT

The second step of my guide suggests that weather/climate scientists install and learn UV-CDAT. In a nutshell, UV-CDAT is an open-source application that links together numerous software packages to form an integrated environment for weather/climate data analysis and visualisation. In other words, it's a complete data analysis environment like Matlab, IDL or Mathematica, only it's free, has broader capabilities, and is written explicitly with the weather/climate community in mind.



The American consortium in charge of developing UV-CDAT consists of four Department of Energy laboratories (Los Alamos, Lawrence Berkeley, Lawrence Livermore, and Oak Ridge), two universities (University of Utah and the Polytechnic Institute of New York University), NASA, and two private companies (Kitware and Tech-X). The development team has done a wonderful job in a relatively short amount of time, bringing together a suite of climate data analysis tools (CDAT), a provenance tracking and workflow recording system (VisTrails), and a collection of some of the most popular visualisation tools out there (ParaView, DV3D, VisIt, ViSUS and VTK) (see for details of how this was achieved). The end product is:




* An awesome GRAPHICAL USER INTERFACE where you can quickly and easily view your data from every angle and projection imaginable

* A SCRIPTING ENVIRONMENT that puts all the tools you'll ever need right at your fingertips

* A WORKFLOW BUILDER to help manage the scheduling, tracking and recording of large data processing pipelines



The first of these is pretty self-explanatory (e.g. check out these and video clip examples, in addition to the UV-CDAT itself), while the third is really only relevant if you're in charge of coordinating very large data processing tasks (e.g. the post-processing of CMIP5 climate model simulations). As such, my focus here is on the scripting environment. In particular, once you've installed UV-CDAT on your machine (see instructions at the end of the post), if you fire up IPython (an excellent interactive environment from which to explore what's going on) you'll see something like the following:



$ /usr/local/uvcdat/1.4.0/bin/ipython
Python 2.7.4 (default, Oct 21 2013, 17:01:37)
Type "copyright", "credits" or "license" for more information.

IPython 0.13.2 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]:



If you then type import at the prompt and hit the [tab] key, the interpreter will tell you that there are a whopping 490 or so modules available to import into the environment!



In [1]: import

Display all 492 possibilities? (y or n)
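If you'd rather count the importable modules from a script than from the tab-completion prompt, the standard library's pkgutil module can walk the import path. This is only a rough sketch: it lists the top-level modules found on sys.path, so the total won't necessarily match the number IPython reports (IPython may also count built-in modules, for example). The function name list_top_level_modules is my own.

```python
import pkgutil


def list_top_level_modules():
    """Return a sorted list of top-level module names found on sys.path."""
    # Each item yielded by iter_modules() is (finder, name, is_package)
    names = set(module_info[1] for module_info in pkgutil.iter_modules())
    return sorted(names)


modules = list_top_level_modules()
print("%d top-level modules found" % len(modules))

# Names starting with 'cd' (e.g. cdms2, cdutil) will only show up
# when this is run with the Python that ships with UV-CDAT.
print([name for name in modules if name.startswith("cd")])
```

Run it with the UV-CDAT Python (e.g. the one under /usr/local/uvcdat/1.4.0/bin/) to see what that installation provides.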



The majority of these modules are only of interest to developers working on improving UV-CDAT, but among them is a collection of pretty much all the modules a typical weather/climate scientist could need in writing their own data analysis scripts. I'm not aware of any consolidated list of this collection of useful modules, let alone links to the relevant documentation (which can sometimes be difficult to track down), so I've tried to provide one below. My list isn't exhaustive (i.e. it's mainly the modules I find useful), so please let me know if there are other modules you think should be added!



CORE CDAT MODULES



When scripting in the Python/UV-CDAT environment, you will use the following core CDAT modules most frequently:



cdms2

The Climate Data Management System (cdms) module is primarily used for netCDF input and output.



cdutil

The Climate Data Specific Utilities (cdutil) module contains a whole range of utilities for calculating things like climatologies and anomalies, for custom spatial regions and temporal seasons.



genutil

The General Utilities (genutil) module has utilities for all sorts of climate-relevant statistics (e.g. correlation, covariance, auto-correlation, lagged correlation, root mean square, standard deviation, percentiles, linear regression, etc).



MV2

This module is basically a copy of the NumPy masked array module (numpy.ma) used for dealing with masked data arrays, but it preserves the metadata contained in cdms2 variables.
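To give a feel for how the core modules fit together, here is a minimal sketch of a typical script: read a variable with cdms2, remove the annual cycle with cdutil, take a spatial average, and correlate two time series with genutil. The function names, file name and variable name are hypothetical, and the imports sit inside a try/except only so that the file can still be loaded on a machine without UV-CDAT.

```python
try:
    import cdms2
    import cdutil
    import genutil
except ImportError:  # not running inside a UV-CDAT installation
    cdms2 = cdutil = genutil = None


def _require_uvcdat():
    """Fail with a clear message if the CDAT modules are missing."""
    if cdms2 is None:
        raise RuntimeError("cdms2, cdutil and genutil are required (install UV-CDAT)")


def load_variable(path, var_name):
    """Read one variable, with its axes and attributes, from a netCDF file."""
    _require_uvcdat()
    infile = cdms2.open(path)
    try:
        return infile(var_name)
    finally:
        infile.close()


def monthly_anomaly(data):
    """Remove the mean annual cycle from a monthly time series."""
    _require_uvcdat()
    cdutil.setTimeBoundsMonthly(data)  # the seasonal tools rely on time bounds
    return cdutil.ANNUALCYCLE.departures(data)


def area_mean(data):
    """Average over the latitude and longitude axes."""
    _require_uvcdat()
    return cdutil.averager(data, axis="xy")


def correlate(series_a, series_b):
    """Correlate two variables along their first (time) axis."""
    _require_uvcdat()
    return genutil.statistics.correlation(series_a, series_b, axis=0)


# Example usage (hypothetical file and variable name):
#   sst = load_variable("sst_monthly.nc", "sst")
#   sst_index = area_mean(monthly_anomaly(sst))
```

Because cdms2 variables carry their axes around with them, none of these functions needs to be told which dimension is time, latitude or longitude.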



On the UV-CDAT and you'll find links to papers and videos explaining the visualisation interface, but nothing on the actual data analysis/scripting capability. As mentioned previously, this is because they didn't develop that capability themselves - they've simply linked together lots of useful packages that have their own documentation located elsewhere. The functionality of CDAT hasn't changed substantially since the mid-2000s, so the original has everything you need in terms of documentation for the core CDAT modules, including , , , and the (API).



REGRIDDING MODULES



Since regridding is performed extensively by the visualisation interface, there are a number of modules included:



regrid2

The basic CDAT regridding package.



css

A collection of modules from the ngmath library, which contains a range of interpolators and approximators for one-, two-, and three-dimensional data. Details can be found at the relevant or .



A Python interface to the Earth System Modeling Framework (ESMF) regridding utility. ESMF is software for building and coupling weather, climate, and related models.
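As a rough illustration of the basic regridding workflow, the sketch below builds a uniform global target grid with cdms2 and regrids a variable onto it. The helper names (uniform_grid_spec, regrid_to_uniform) and the default resolution are hypothetical, and the regridTool keyword is my understanding of how UV-CDAT 1.x selects between regrid2 and the other regridders, so check it against your installed version.

```python
try:
    import cdms2
except ImportError:  # not running inside a UV-CDAT installation
    cdms2 = None


def uniform_grid_spec(resolution):
    """Return (start_lat, nlat, dlat, start_lon, nlon, dlon) for a global
    grid whose cell centres are spaced 'resolution' degrees apart."""
    nlat = int(round(180.0 / resolution))
    nlon = int(round(360.0 / resolution))
    start_lat = -90.0 + resolution / 2.0
    start_lon = resolution / 2.0
    return (start_lat, nlat, resolution, start_lon, nlon, resolution)


def regrid_to_uniform(data, resolution=2.5, tool="regrid2"):
    """Regrid a cdms2 variable onto a uniform global grid.

    'tool' is passed through as regridTool (assumed values: 'regrid2', 'esmf').
    """
    if cdms2 is None:
        raise RuntimeError("cdms2 is required (install UV-CDAT)")
    target_grid = cdms2.createUniformGrid(*uniform_grid_spec(resolution))
    return data.regrid(target_grid, regridTool=tool)


# Example usage (hypothetical variable read earlier with cdms2):
#   coarse = regrid_to_uniform(sst, resolution=5.0)
```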



GENERAL PYTHON MODULES



Naturally, the UV-CDAT installation also comes with pretty much all the general Python modules that people in the weather/climate sciences find useful. These include:



The most widely used Python plotting packages. As mentioned in my post, I'd recommend using (which is not included with UV-CDAT) if you're going to be doing a lot of plotting with Python.



Numerical programming and scientific data analysis/statistics.



For .



For handling command line arguments.
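For example, assuming the standard library's argparse module is the one being used, a data analysis script might accept an input file, a variable name and an optional time period. All of the argument names below are hypothetical.

```python
import argparse


def build_parser():
    """Build the command line interface for a hypothetical climatology script."""
    parser = argparse.ArgumentParser(
        description="Calculate a climatology from a netCDF file")
    parser.add_argument("infile", help="input netCDF file")
    parser.add_argument("variable", help="name of the variable to read")
    parser.add_argument("--period", nargs=2, metavar=("START", "END"),
                        default=None, help="time period, e.g. 1981-01-01 2010-12-31")
    parser.add_argument("--outfile", default="climatology.nc",
                        help="output netCDF file")
    return parser


# An explicit argument list is parsed here for illustration;
# a real script would call parse_args() with no arguments.
args = build_parser().parse_args(
    ["tas.nc", "tas", "--period", "1981-01-01", "2010-12-31"])
print("Reading %s from %s" % (args.variable, args.infile))
```

Keeping the parser in its own function makes it easy to reuse the same options across several scripts.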



GENERAL CLIMATE PACKAGES



There are also numerous other user-contributed packages included with the UV-CDAT install:



Empirical Orthogonal Function (EOF) analysis.



Calculates wind-related quantities such as the streamfunction, velocity potential and divergence/convergence.



Routines for statistical analysis using L-moments (i.e. measures of the location, scale and shape of probability distributions).



INSTALLATION AND HELP



Installing UV-CDAT can be a little tricky, as it requires that a number of common software development packages be installed on your machine first. The specific requirements differ depending on your operating system, however there are thorough available. Failing that, the is well monitored and responses are usually prompt, and I'm hopeful that the recently launched will take off in the months ahead.



It's often easier to make sense of a new package by actually seeing it in action (i.e. by copying someone else's code), so also feel free to check out the source code at , as I use UV-CDAT for almost all my scripting.