Numerical Model Metadata – a draft standard

Loïs Steenman-Clark

and

Katherine Bouton

 

Centre for Global Atmospheric Modelling (CGAM)

Department of Meteorology

University of Reading

Document version 1.1 (04.06.2004)

 

Numerical Model Metadata

 

1.      Introduction

 

Metadata, which is data about data, is used for cataloguing data, for producing sophisticated and efficient search engines for data archives or repositories, and for enabling powerful interfaces to be built for data analysis and visualisation. An example of metadata for data is CF-compliant netCDF, which supports naming conventions and descriptions of spatial and temporal properties for climate and forecast data.

 

Within CF-compliant netCDF the attribute 'source', which is a character string, describes the method of production of the data. For numerical models, we propose to extend this simple description of the source of the data with a comprehensive and standardised system of numerical model metadata.

 

By providing metadata for the numerical model as well as for the model output data, software tools for cataloguing and searching the model output data can be extended and refined to include information about how that data was produced. To identify model output data from particular numerical models with particular settings, we need further metadata layers that describe both the numerical models themselves and the experiments that used those models to produce the model output data. The goal in the design of this numerical model metadata standard is to provide the clear, well-defined and flexible metadata needed for climate and forecast numerical models and experiments, which produce numerical model output data.

 

This document describes the reasoning behind the draft numerical model metadata standard. It does not discuss the software tools that can exploit this metadata, nor does it explain how the metadata is collected.

 

2.      Metadata layers

 

We are considering a numerical model, which is used for numerical modelling experiments that produce model output data.

 

Numerical model(s) -> numerical modelling experiment (projects/simulations) -> model output data

 

A numerical model could contain several components (atmosphere, ocean, chemistry etc.), or the numerical model in a coupled experiment could incorporate any number of model components coupled via a coupler. Each numerical model component will have its own numerical model metadata.

 

A numerical modelling experiment can be a single simulation or a group of simulations that can be grouped together for convenience as a project. The experiment is a generic term that covers all possible ways of running a simulation, from fully coupled Earth System Models to ensemble models.

 

Metadata has to be provided at each stage: the numerical model metadata describes the formulation of the model, i.e. the code, and the numerical modelling experiment metadata describes how the numerical model has been set up and run to produce model output data. The model output data may have its own metadata, for example CF-compliant netCDF.

 

Numerical model components       -> have properties -> implemented with different -> each method or scheme
                                                       methods and schemes           has different options
                                                                |                             |
Numerical modelling experiments  -> have properties -> choose particular          -> and have particular settings
                                                       methods and schemes           for the different options

 

Each numerical model component will have its own implementation of particular methods and schemes and its own internal options, so there will be some areas where standardisation could be agreed and other areas where model components have to agree their own internal standard. We can therefore anticipate that there will be both standard and local tables of attributes for the model metadata.
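The relationship between the model layer and the experiment layer can be sketched in code. The following is a minimal illustration only, not part of the draft standard: the class names, field names and the example option names are all hypothetical. It shows a model component advertising schemes and their options, and an experiment recording which schemes were chosen and how the options were set.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scheme:
    """Model layer: a method or scheme and the options it offers (no values)."""
    name: str
    options: List[str] = field(default_factory=list)


@dataclass
class ModelComponent:
    """Model layer: what the code of one component makes possible."""
    name: str
    simulated: str  # component type, e.g. "atmosphere" or "ocean"
    schemes: Dict[str, Scheme] = field(default_factory=dict)


@dataclass
class Experiment:
    """Experiment layer: the schemes chosen and the settings given to their options."""
    name: str
    component: ModelComponent
    settings: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def choose(self, scheme_name: str, **option_settings: object) -> None:
        """Record settings for a scheme, rejecting anything the model does not offer."""
        if scheme_name not in self.component.schemes:
            raise ValueError(f"scheme not implemented by the model: {scheme_name}")
        scheme = self.component.schemes[scheme_name]
        unknown = sorted(set(option_settings) - set(scheme.options))
        if unknown:
            raise ValueError(f"options not offered by {scheme_name}: {unknown}")
        self.settings[scheme_name] = dict(option_settings)
```

An experiment can only set options that the model component declares, which mirrors the vertical links in the diagram above.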

 

The Unified Model (UM), for example, has a user interface which allows users to select and change options in the model code and to set up experiments using the UM. But the UM user interface does not always, for example, use names or terms that are common to the numerical modelling community. Other users of UM output data or data centres cannot access the UM model metadata within the user interface unless they have a user interface themselves and the database entry for the particular experiment which produced that data. The purpose of the model metadata standard is not to replace tools like the UM user interface but to extract essential metadata in standard terms that are common to all numerical model components of this type.

 

The model developers, in general, would provide the metadata for a numerical model, or numerical model component, and the user of the model would provide the experiment metadata. Automatic tools will be developed to produce model and experiment metadata but the purpose of this document is to describe the metadata not the means of producing it.

 

3.      Metadata for the model layer

 

A numerical model needs to be labelled with a component type. In the vocabulary of the PRISM metadata in the PMIOD this attribute is called 'simulated'; it indicates, for example, that the numerical model simulates the atmosphere.

Table 1

atmosphere
ocean
chemistry
land-surface

This list of simulated models can be expanded. But we assume that all models simulated in Table 1 can be described with the following five properties.

Table 2

Numerical properties
Dynamical/physical properties
Input/output properties
Technical properties
Information properties

The numerical model metadata is all about what the code enables you to do in a numerical modelling experiment, which produces model data. So the numerical model metadata should cover all the methods and schemes implemented in that model.

 

Table 2.1 An example for numerical model components simulating the atmosphere

Numerical properties      Dynamical/physical properties                   Input/output properties  Technical properties  Information properties
------------------------  ----------------------------------------------  -----------------------  --------------------  ----------------------
Vertical rep.             Advection                                       Input requirements       Coding language       Name
Horizontal rep.           Diffusion (horizontal and vertical)             Coupling potential       Maintenance           Provenance
Time integration          Gravity wave drag                               Output processing        Versioning            Description
Time filtering/smoothing  Chemistry?                                                               Parallelisation       References
                          Aerosols                                                                                       Contact
                          Radiation (LW, SW)
                          Convection
                          Cloud
                          Precipitation
                          Planetary Boundary Layer
                          Land surface processes (vegetation, hydrology)
                          Other

These attributes need to be relevant for all atmospheric numerical models that are going to provide metadata, so the vocabulary should be standard for this community. There will be a core set, which covers the key processes in the numerical model component. But there will be other processes that are more peripheral, or for which a standard vocabulary cannot be agreed, or which are only present in one numerical model component. These will have to be described in a local table.

 

3.1 Numerical Properties

 

The numerical properties of the model metadata need to capture the numerical methods used by the model component and the actual settings used in the numerical modelling experiment. The numerical properties that need to be included are the horizontal and vertical representation and the time integration method used by the model component.

 

As a starting point the AMIP documentation, produced by PCMDI, tabulated the horizontal representation (which for AMIP I was either spectral or finite difference), the horizontal resolution, the vertical coordinates used and the number of levels, with the top and bottom in hPa. CF-compliant netCDF has moved further to produce standard metadata for vertical coordinates, which has also been adopted by the PRISM community for the PMIOD. The development of standard metadata for the horizontal representation and the time integration schemes needs more work and discussion within the community. Only a simple extension is proposed here, to suggest some qualifying attributes that may be needed.

 

Table 3.1 Numerical properties

Property                   Type                               Attributes                              Options
-------------------------  ---------------------------------  --------------------------------------  -------------------------------------------
Horizontal representation  Finite difference                  Discretization, description, reference
                           Spectral                           Truncation, description, reference
                           Other                              Description, reference
Vertical representation    Dimensional vertical coordinate    Units, positive                         No. of levels
                           Dimensionless vertical coordinate  Standard term, formula terms            No. of levels, values for the formula terms
Time integration           Scheme                             Local name, description, reference      Time steps per day
Time filtering/smoothing

 

The numerical model metadata describes the schemes used and the options allowed whereas the settings are provided at the time of the numerical experiment. Supplementary information can be derived from the numerical properties metadata, for example the top and bottom pressure levels of an atmospheric model component. These derived properties are a function of the tools that exploit the numerical model metadata not the metadata schema itself. What we need to be certain of is that the metadata contains sufficient information to provide tools with the means to produce supplementary information or pictures of the model level distribution.
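As an illustration of a derived property, the sketch below computes the top and bottom pressure levels from the formula terms of a hybrid sigma pressure coordinate. It assumes the CF definition of that coordinate, p(k) = a(k)*p0 + b(k)*ps, where the reference pressure p0 is an additional formula term. The function names and the coefficient values used in the usage note are hypothetical and do not describe any real model.

```python
from typing import List, Sequence, Tuple


def hybrid_sigma_pressure(a: Sequence[float], b: Sequence[float],
                          ps: float, p0: float = 100000.0) -> List[float]:
    """Pressure in Pa at each model level: p(k) = a(k)*p0 + b(k)*ps."""
    if len(a) != len(b):
        raise ValueError("formula terms a and b must have one value per level")
    return [a_k * p0 + b_k * ps for a_k, b_k in zip(a, b)]


def top_and_bottom_hpa(a: Sequence[float], b: Sequence[float],
                       ps: float, p0: float = 100000.0) -> Tuple[float, float]:
    """Return (top, bottom) pressure in hPa for a given surface pressure ps in Pa."""
    pressures = hybrid_sigma_pressure(a, b, ps, p0)
    # The top of the model is the lowest pressure, the bottom the highest.
    return min(pressures) / 100.0, max(pressures) / 100.0
```

For example, with the illustrative two-level terms a = (0.0, 0.01) and b = (1.0, 0.0) and a surface pressure of 100000 Pa, the derived bottom level lies at the surface pressure and the top level at 1 % of the reference pressure. A tool could compute this on demand, so the values do not need to be stored in the metadata itself.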

 

3.2 Dynamical/physical Properties

 

The dynamical/physical properties have either a standard name or are described as ‘other’. The community should have agreed the standard names, for example those shown in Table 2.1, whereas the attribute ‘other’ allows for new or particular or local dynamical/physical properties of a simulated model to be included in the metadata schema.
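The split between standard names and 'other' can be sketched as a simple lookup. This is only an illustration: the set of standard names below is copied from the dynamical/physical column of Table 2.1 (leaving out 'Chemistry?', which the table marks as uncertain), the function name is hypothetical, and the community has not agreed any such list.

```python
from typing import Dict, Iterable, List

# Candidate standard names taken from Table 2.1; an assumption, not an agreed vocabulary.
STANDARD_NAMES = frozenset({
    "advection",
    "diffusion",
    "gravity wave drag",
    "aerosols",
    "radiation",
    "convection",
    "cloud",
    "precipitation",
    "planetary boundary layer",
    "land surface processes",
})


def split_standard_and_other(process_names: Iterable[str]) -> Dict[str, List[str]]:
    """Sort process names into the standard table or the local ('other') table."""
    tables: Dict[str, List[str]] = {"standard": [], "other": []}
    for name in process_names:
        # Compare case-insensitively and ignore repeated whitespace.
        key = " ".join(name.lower().split())
        tables["standard" if key in STANDARD_NAMES else "other"].append(name)
    return tables
```

A process that is not in the agreed vocabulary is not rejected; it is simply routed to the local table, which is the role the attribute 'other' plays in the schema.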

 

Table 3.2 Dynamical/physical properties

Local name
Documentation    author, title, reference, URL, type
Mode             off, on, modified, new

      Local options
          |
      Local option settings

Mode.file    (describes where to find the change)
Mode.reason  (describes why the change was made)

 

In the Unified Model version 4.5 atmosphere numerical model code there are several schemes that can be used for the dynamical/physical property with the standard name of gravity wave drag. Each scheme has the attributes described in Table 3.2. For example:

 

Table 3.2.2 An example of implementation of dynamical/physical properties for a numerical model simulating the atmosphere.

Local name     Richardson                   Linear stress profile        Anisotropic orography
-------------  ---------------------------  ---------------------------  ---------------------------
Local options  Starting level               Starting level               Starting level
               Surface gravity wave const.  Surface gravity wave const.  Surface gravity wave const.
               Trapped lee wave const.                                   Trapped lee wave const.
               Froude option                Froude option                Froude option
Documentation  X                            Y                            Z
Mode           on/off/modified/new          on/off/modified/new          on/off/modified/new
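The attributes of Table 3.2 could be held in a small record per scheme, as in the following sketch. The class and field names are hypothetical. The rule that Mode.file and Mode.reason must be present when the mode is 'modified' or 'new' is an assumption made for this illustration; the draft only says what those two attributes describe.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ALLOWED_MODES = ("off", "on", "modified", "new")


@dataclass
class SchemeMetadata:
    """Metadata for one scheme of a dynamical/physical property (after Table 3.2)."""
    standard_name: str                 # e.g. "gravity wave drag", or "other"
    local_name: str                    # the name used inside the model code
    documentation: str                 # author, title, reference, URL, type
    mode: str = "off"
    mode_file: Optional[str] = None    # where to find the change
    mode_reason: Optional[str] = None  # why the change was made
    local_options: Dict[str, object] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.mode not in ALLOWED_MODES:
            raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {self.mode!r}")
        # Assumption: a changed or new scheme must say where and why it was changed.
        if self.mode in ("modified", "new") and not (self.mode_file and self.mode_reason):
            raise ValueError("mode_file and mode_reason are needed for modified or new schemes")
```

Validating at construction time means that an incomplete record for a modified scheme is caught when the metadata is written rather than when a data user later tries to trace the change.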

 

In a fully plug-compatible numerical model code, where each scheme is self-contained so that it could be removed, changed or swapped, it would be good to have a self-contained set of metadata for each scheme. Each scheme would then have metadata that defined its input requirements, the outputs it could provide and a description of the assumptions the scheme made, similar to the PMIOD, the full metadata schema for the model coupling in PRISM. This is probably a step too far for this initial model metadata standard, but it should be considered as an extension to this metadata schema.

 

3.3 Input/Output Properties

 

It is challenging to provide comprehensive and standardised input and output metadata to accommodate all component models.  All component models should have the following input/output properties.

 

Table 3.3 Definition of input/output properties

 

Input requirements

The external information needed by the model component either at the start of an experiment or during the course of an experiment.

Coupling Potential

The information required for the model component to be coupled to another model component.

Output processing

The spatial and temporal processing carried out to produce the model output data from the numerical modelling experiment.

 

The external information required by a model component depends on several factors:

-         whether the simulated model is run in standalone or coupled mode and what other model components are included

-         what dynamical/physical properties are set for the experiment

-         the type of experiment that is being performed with the model component(s).

 

Consider all the possible external input files that could be used to perform an experiment with the UM version 4.5 atmosphere component:

1.      Initial/start/restart file

2.      Ozone plus radiation files

3.      Orography and land sea mask

4.      Passive tracers

5.      Local boundary conditions

6.      Surface/soil/vegetation files

7.      Sea surface temperature plus sea ice

8.      Chemistry plus aerosol files

Files 6, 7 and 8 would be described by the coupling information if this model component were part of a full Earth System Modelling experiment with separate ocean, sea-ice, land surface and chemistry model components. Files for 5 are only needed if the component is run in a limited area mode rather than a global mode. The requirements for files 1 to 4 depend on the dynamical and physical properties of the experiment being performed using the model component(s). Two questions remain open. Can input file categories be defined for each model component type? Are these 8 input file groupings sufficient to accommodate all atmospheric numerical model components?
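The dependence of the input requirements on the experiment configuration can be sketched as follows. The eight categories are those listed above for the UM version 4.5 atmosphere component. The assignment of categories 6, 7 and 8 to individual coupled component types is our reading of the text and should be treated as an assumption, as should the function and variable names.

```python
from typing import Dict, Iterable, List, Set

INPUT_CATEGORIES: Dict[int, str] = {
    1: "Initial/start/restart file",
    2: "Ozone plus radiation files",
    3: "Orography and land sea mask",
    4: "Passive tracers",
    5: "Local boundary conditions",
    6: "Surface/soil/vegetation files",
    7: "Sea surface temperature plus sea ice",
    8: "Chemistry plus aerosol files",
}

# Assumption: which coupled component types replace which external input files.
SUPPLIED_BY_COUPLING: Dict[int, Set[str]] = {
    6: {"land-surface"},
    7: {"ocean", "sea-ice"},
    8: {"chemistry"},
}

LIMITED_AREA_ONLY = {5}


def external_inputs(coupled_components: Iterable[str], limited_area: bool) -> List[int]:
    """Return the category numbers that must still be supplied as external files."""
    coupled = set(coupled_components)
    needed = []
    for number in sorted(INPUT_CATEGORIES):
        if number in LIMITED_AREA_ONLY and not limited_area:
            continue  # boundary files are only used in limited area mode
        suppliers = SUPPLIED_BY_COUPLING.get(number)
        if suppliers is not None and suppliers <= coupled:
            continue  # described by the coupling information instead
        needed.append(number)
    return needed
```

In this sketch a standalone global atmosphere experiment needs every category except 5, while a fully coupled global experiment needs only categories 1 to 4. Whether categories 1 to 4 are really needed still depends on the dynamical/physical properties chosen, which the sketch does not model.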

 

Table 3.3.1 Input/output properties

Input requirements  Standard name
                    Local name
                    Description
                    Mode                    on, off, modified, new
                    Grid requirements       horizontal   1D or 2D
                                            vertical     single, multi-level
                                            time         time varying, time invariant, time dependent
                    Technical requirements  File format
                                            Calendar
Coupling potential  PMIOD
Output processing   Standard name / other
                    Horizontal processing
                    Vertical processing
                    Time processing

 

The PRISM project has produced a draft metadata schema for the coupling requirements of numerical model components. “The Potential Model Input and Output Description (PMIOD) describes the relations a component model is able to establish within the coupled model through inputs and output”. Each model component in a coupled experiment has a PMIOD. PRISM expects that the model component developer will write the PMIOD. It should be possible to construct the PMIOD from the model component metadata. The PMIOD has 6 types of information, including the following (the corresponding part of the model metadata is given in brackets):

·        General characteristics of the component (Model metadata information properties)

·        Information on the grids (Model metadata numerical properties)

·        Transient variables (Model metadata input/output properties)

·        Persistent variables (Model metadata dynamical/physical properties)

·        Internal dependency (Not yet fully defined?)

The transient variables are all the variables the model component expects or can provide in a coupled experiment. For example, the sea surface temperatures needed by an atmosphere model component are provided by an ocean model component. Persistent variables are variables that do not change during the course of a coupled model experiment but need to be shared between the model components. An example of a persistent variable is the Earth’s radius. The PMIOD should therefore be constructed from general model metadata, although some metadata that is only relevant to the model component coupling may need to be encapsulated in the coupling potential section of the input/output properties. Once the PMIOD has been finalised, the numerical model metadata for this section can be defined.

 

Numerical models can have internal or external data processing capabilities that perform the spatial or temporal processing required before the model output data is made available to the community. The numerical model metadata needs to capture the information about this processing of model output data. This is more easily done if the processing is an integral part of the numerical model. More work needs to be undertaken to assess the requirements in this area of the numerical model metadata, so for this draft standard output processing will be descriptive, with the following attributes:

 

CF Standard name
Other
Horizontal processing  Description of the methods used for processing carried out internally or externally
Vertical processing    Description of the methods used.
Time processing        Description of the methods used.

 

 

3.4 Technical Properties

 

In many ways the technical properties of the numerical model components are of least interest to the broad community accessing the numerical model output data. However, the metadata captured in the technical properties of the numerical model component may be key to

-         promoting trust and confidence in the model output data

-         modellers or modelling organisations cataloguing their own model output data archives

The aim of the technical properties is not necessarily to provide sufficient information to be able to reproduce the model output data.

 

Table 3.4 Technical properties

 

Attribute

Options

Coding language used

Compilers

Optimisation

Version

Installation

Computer system

Maintenance systems

 

Parallelisation

Performance

 

 

3.5 Information Properties

 

At each metadata layer there is a need for information properties. For the numerical model, the information properties need to define the code owner, installer or guardian. The experiment information properties need to capture who performed the experiment, and the data information properties need to describe who is storing the data or who can access it.

 

Table 3.5 Information properties

                      Model                             Experiment                     Data
--------------------  --------------------------------  -----------------------------  ----------------------------
Name                  Who owns it                       Who ran it                     Who is the caretaker
Contact point
History (provenance)  Where did it come from            Where are related experiments  Where did the data come from
Description           What model                        What experiment                What data
Documentation         Of the numerical model component  Related to the experiment      Publications using this data

 

 

There is also a plethora of these information points throughout the proposed numerical model metadata standard. For example, information is required for schemes in the dynamical/physical properties, for the initial input files and for the modified files. There needs to be a concerted effort to coordinate and carefully define the needs and requirements of these information points.

 

4.      Experiment metadata

 

The experiment (project or simulation) metadata records the actual settings of the options offered in the numerical model metadata, as used for the experiment that produced the model output data. For example:

 

Model (name=UM version 4.5) simulated=atmosphere

Vertical properties -> dimensionless vertical coordinate,
                       standard term = hybrid sigma pressure coordinate
                       option = no. of levels, setting = 50
                       option = formula terms (a, b, ps), setting = (…..)
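The same example can be written as a nested data structure, which makes it easy for a tool to spot options that have no setting yet. This is a sketch only: the key names and the helper function are hypothetical, and the formula term values, which are elided in the example above, are deliberately left unset rather than invented.

```python
from typing import Dict, List

experiment: Dict[str, object] = {
    "model": {"name": "UM version 4.5", "simulated": "atmosphere"},
    "vertical_properties": {
        "type": "dimensionless vertical coordinate",
        "standard_term": "hybrid sigma pressure coordinate",
        "formula_term_names": ["a", "b", "ps"],
        "options": {
            "no. of levels": 50,
            "formula terms": None,  # values elided in the source example, so left unset
        },
    },
}


def unset_options(properties: Dict[str, object]) -> List[str]:
    """Return the names of options that have been declared but not given a setting."""
    options = properties["options"]
    return [name for name, setting in options.items() if setting is None]


missing = unset_options(experiment["vertical_properties"])
```

Here `missing` lists the formula terms as the only option still awaiting a setting, which is the kind of completeness check a metadata collection tool might run before an experiment record is published.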

 

5.      Summary

 

The aim of the numerical model metadata is

-         to provide well defined information about the numerical models used for experiments that produce data. This means that the modelling community has to agree to a numerical model metadata standard and the standardised vocabulary used to describe it.

-         to enhance the existing model output data metadata to promote the sharing of that data beyond the community that produced it.

-         to provide numerical model metadata that will enhance software applications built to discover, compare and  understand model output data.

 

Current work

-         exploring past and current work on community metadata, for example PCMDI, PRISM

-         working with projects, for example NDG, FLUME, PRISM, that will require numerical model metadata

-         proposing a draft numerical model metadata schema

-         implementing the draft schema to expose its strengths and weaknesses

 

Future work

-         discuss the draft metadata schema with a broad and interested community

-         work towards a standardisation of the vocabulary

-         flesh out the details in the draft numerical model metadata schema

-         develop tools and utilities which can exploit the metadata