Loïs Steenman-Clark
and
Katherine Bouton
Centre for Global Atmospheric Modelling (CGAM)
Department of Meteorology
University of Reading
Document version 1.1 (04.06.2004)
Numerical Model Metadata
1. Introduction
Metadata, which is data about data, is used for cataloguing data, for building sophisticated and efficient search engines for data archives or repositories, and for enabling powerful interfaces for data analysis and visualisation. An example of metadata for data is CF-compliant netCDF, which supports naming conventions and descriptions of spatial and temporal properties for climate and forecast data.
Within CF-compliant netCDF the attribute source, which is a character string, describes the method of production of the data. For numerical models, we propose to extend this simple description of the source of the data with a comprehensive and standardised system of numerical model metadata.
By providing metadata for the numerical model as well as for the model output data, software tools for cataloguing and searching the model output data can be extended and refined to include information about how that data was produced. To identify model output data from particular numerical models with particular settings, we need further metadata layers that describe both the numerical models themselves and the experiments that used those models to produce the model output data. The goal in the design of this numerical model metadata standard is to provide the clear, well-defined and flexible metadata needed for the climate and forecast numerical models and experiments that produce numerical model output data.
This document describes the reasoning behind the draft numerical model metadata standard. It does not discuss the software tools that can exploit this metadata, nor does it explain how the metadata is collected.
2. Metadata layers
We are considering a numerical model, which is used for numerical modelling experiments that produce model output data.
Numerical model(s) -> numerical modelling experiment (projects/simulations) -> model output data
A numerical model could contain several components (atmosphere, ocean, chemistry, etc.), or the numerical model in a coupled experiment could incorporate any number of model components coupled via a coupler. Each numerical model component will have its own numerical model metadata.
A numerical modelling experiment can be a single simulation or a group of simulations that can be grouped together for convenience as a project. The experiment is a generic term that covers all possible ways of running a simulation, from fully coupled Earth System Models to ensemble models.
Metadata has to be provided at each stage. The numerical model metadata describes the formulation of the model, i.e. the code, and the numerical modelling experiment metadata describes how the numerical model has been set up and run to produce model output data. The model output data may have its own metadata, for example CF-compliant netCDF.
Numerical model components -> have properties -> implemented with different methods and schemes -> each method or scheme has different options
|
Numerical modelling experiments -> have properties -> choose particular methods and schemes -> and have particular settings for the different options
Each numerical model component will have its own implementation of particular methods and schemes and its own internal options, so there will be some areas where standardisation can be agreed and other areas where model components have to agree their own internal standard. We can therefore anticipate both standard and local tables of attributes for the model metadata.
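The layering described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, field names and the setting value are hypothetical and are not part of the draft standard; the scheme name and option names are taken from the gravity wave drag example later in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scheme:
    """One method or scheme implemented by a model component."""
    local_name: str
    options: List[str] = field(default_factory=list)  # option names the code exposes


@dataclass
class ModelComponent:
    """Model layer: what the code makes possible."""
    name: str
    simulated: str  # e.g. "atmosphere" or "ocean"
    schemes: Dict[str, List[Scheme]] = field(default_factory=dict)  # property -> schemes


@dataclass
class Experiment:
    """Experiment layer: which schemes were chosen and how options were set."""
    name: str
    component: ModelComponent
    chosen: Dict[str, str] = field(default_factory=dict)  # property -> scheme local name
    settings: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def unknown_choices(self) -> List[str]:
        """Return properties whose chosen scheme is not implemented by the component."""
        bad = []
        for prop, scheme_name in self.chosen.items():
            names = [s.local_name for s in self.component.schemes.get(prop, [])]
            if scheme_name not in names:
                bad.append(prop)
        return bad


um = ModelComponent(
    name="UM version 4.5",
    simulated="atmosphere",
    schemes={
        "gravity wave drag": [
            Scheme("Richardson", ["Starting level", "Froude option"]),
        ]
    },
)

# The setting value below is made up purely for illustration.
exp = Experiment(
    name="demo",
    component=um,
    chosen={"gravity wave drag": "Richardson"},
    settings={"gravity wave drag": {"Starting level": 3}},
)
print(exp.unknown_choices())
```

Keeping the experiment as a separate record that points at a model component mirrors the split between what the code allows and what a particular run actually used.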
The Unified Model (UM), for example, has a user interface which allows users to select and change options in the model code and to set up experiments using the UM. But the UM user interface does not always, for example, use names or terms that are common to the numerical modelling community. Other users of UM output data or data centres cannot access the UM model metadata within the user interface unless they have a user interface themselves and the database entry for the particular experiment which produced that data. The purpose of the model metadata standard is not to replace tools like the UM user interface but to extract essential metadata in standard terms that are common to all numerical model components of this type.
The model developers, in general, would provide the metadata for a numerical model, or numerical model component, and the user of the model would provide the experiment metadata. Automatic tools will be developed to produce model and experiment metadata but the purpose of this document is to describe the metadata not the means of producing it.
3. Metadata for the model layer
A numerical model needs to be labelled with a component type. In the vocabulary of the PRISM metadata in the PMIOD this attribute is called simulated; it indicates, for example, that the numerical model simulates the atmosphere.
Table 1
| atmosphere |
| ocean |
| chemistry |
| land-surface |
This list of simulated models can be expanded. We assume that all the simulated models in Table 1 can be described with the following five properties.
| Numerical properties |
| Dynamical/physical properties |
| Input/output properties |
| Technical properties |
| Information properties |
Table 2
The numerical model metadata is all about what the code enables you to do in a numerical modelling experiment, which produces model data. So the numerical model metadata should cover all the methods and schemes implemented in that model.
Table 2.1 An example for numerical model components simulating the atmosphere
| Numerical properties | Dynamical/physical properties | Input/output properties | Technical properties | Information properties |
|---|---|---|---|---|
| Vertical rep. | Advection | Input requirements | Coding language | Name |
| Horizontal rep. | Diffusion (horizontal and vertical) | Coupling potential | Maintenance | Provenance |
| Time integration | Gravity wave drag | Output processing | Versioning | Description |
| Time filtering/smoothing | Chemistry? | | Parallelisation | References |
| | Aerosols | | | Contact |
| | Radiation (LW, SW) | | | |
| | Convection | | | |
| | Cloud | | | |
| | Precipitation | | | |
| | Planetary Boundary Layer | | | |
| | Land surface processes (vegetation, hydrology) | | | |
| | Other | | | |
These attributes need to be relevant for all atmospheric numerical models that are going to provide metadata, so the vocabulary should be standard for this community. There will be a core set, which covers the key processes in the numerical model component. But there will be other processes that are more peripheral, for which a standard vocabulary cannot be agreed, or that are present in only one numerical model component; these will have to be described in a local table.
3.1 Numerical Properties
The numerical properties of the model metadata need to capture the numerical methods used by the model component and the actual settings used in the numerical modelling experiment. The numerical properties that need to be included are the horizontal and vertical representation and the time integration method used by the model component.
As a starting point, the AMIP documentation produced by PCMDI tabulated the horizontal representation (which for AMIP I was either spectral or finite difference), the horizontal resolution, the vertical coordinates used, and the number of levels with the top and bottom in hPa. CF-compliant netCDF has moved further to produce standard metadata for vertical coordinates, which has also been adopted by the PRISM community for the PMIOD. The development of standard metadata for the horizontal representation and the time integration schemes needs more work and discussion within the community. Only a simple extension is proposed here, suggesting some qualifying attributes that may be needed.
Table 3.1 numerical properties
| Property | Type | Attributes | Options |
|---|---|---|---|
| Horizontal representation | Finite difference | Discretization description, reference | |
| | Spectral | Truncation, description, reference | |
| | Other | Description, reference | |
| Vertical representation | Dimensional vert. coord | Units, positive | No. of levels |
| | Dimensionless vert. coord | Standard term, formula terms | No. of levels, values for the formula terms |
| Time integration | Scheme | Local name | Time steps per day |
| | | Description | |
| | | Reference | |
| Time filtering/smoothing | | | |
The numerical model metadata describes the schemes used and the options allowed whereas the settings are provided at the time of the numerical experiment. Supplementary information can be derived from the numerical properties metadata, for example the top and bottom pressure levels of an atmospheric model component. These derived properties are a function of the tools that exploit the numerical model metadata not the metadata schema itself. What we need to be certain of is that the metadata contains sufficient information to provide tools with the means to produce supplementary information or pictures of the model level distribution.
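As an illustration of such derived information, a tool could compute the pressure on each model level from the formula terms of a hybrid sigma pressure coordinate. The sketch below assumes the CF form p(k) = a(k) * p0 + b(k) * ps, where the reference pressure p0 is an extra term taken from the CF convention rather than from this document, and the coefficient values are invented for a three-level column.

```python
from typing import List, Sequence


def level_pressures(
    a: Sequence[float],
    b: Sequence[float],
    ps: float,
    p0: float = 1000.0,
) -> List[float]:
    """Pressure on each model level: p(k) = a(k) * p0 + b(k) * ps."""
    if len(a) != len(b):
        raise ValueError("a and b must have one value per level")
    return [ak * p0 + bk * ps for ak, bk in zip(a, b)]


# Hypothetical three-level column, pressures in hPa.
a = [0.0, 0.1, 0.01]
b = [1.0, 0.5, 0.0]
p = level_pressures(a, b, ps=1000.0)

# The derived top and bottom of the model follow from the level pressures.
bottom, top = max(p), min(p)
print(len(p), bottom, top)
```

The top and bottom pressures are not stored in the metadata here; they are recomputed from the formula terms, which is the division of labour between schema and tools that the paragraph above argues for.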
3.2 Dynamical/physical Properties
The dynamical/physical properties have either a standard name or are described as ‘other’. The community should have agreed the standard names, for example those shown in Table 2.1, whereas the attribute ‘other’ allows for new or particular or local dynamical/physical properties of a simulated model to be included in the metadata schema.
Table 3.2 dynamical/physical properties
| Local name | |
| Documentation | author, title, reference, URL, type |
| Mode | off, on, modified, new |
| Local options | Local option settings |
Mode.file (describes where to find the change)
Mode.reason (describes why the change was made)
In the Unified Model version 4.5 atmosphere numerical model code there are several schemes that can be used for the dynamical/physical property with the standard name gravity wave drag. Each scheme has the attributes described in Table 3.2, for example:
Table 3.2.2 An example of implementation of dynamical/physical properties for a numerical model simulating the atmosphere.
| Local name | Richardson | Linear stress profile | Anisotropic orography |
|---|---|---|---|
| Local options | Starting level, surface gravity wave const., trapped lee wave const., Froude option | Starting level, surface gravity wave const., Froude option | Starting level, surface gravity wave const., trapped lee wave const., Froude option |
| Documentation | X | Y | Z |
| Mode | on/off/modified/new | on/off/modified/new | on/off/modified/new |
In a fully plug-compatible numerical model code, where each scheme is self-contained so that it can be removed, changed or swapped, it would be good to have a self-contained set of metadata for each scheme. Each scheme would then have metadata that defines its input requirements, the outputs it can provide and a description of the assumptions the scheme makes, similar to the PMIOD, the full metadata schema for the model coupling in PRISM. This is probably a step too far for this initial model metadata standard, but it should be considered as an extension to this metadata schema.
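A per-scheme record carrying the attributes of Table 3.2 might look like the sketch below. The class and field names are hypothetical, and the rule that Mode.file and Mode.reason must be present whenever the mode is modified or new is an assumption made for this example; the draft standard does not state it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MODES = ("off", "on", "modified", "new")


@dataclass
class SchemeMetadata:
    """Metadata for one scheme of a dynamical/physical property."""
    standard_name: str                 # e.g. "gravity wave drag"
    local_name: str                    # e.g. "Richardson"
    local_options: List[str] = field(default_factory=list)
    documentation: Optional[str] = None
    mode: str = "off"
    mode_file: Optional[str] = None    # Mode.file: where to find the change
    mode_reason: Optional[str] = None  # Mode.reason: why the change was made

    def problems(self) -> List[str]:
        """List what is missing or inconsistent in this record."""
        issues = []
        if self.mode not in MODES:
            issues.append(f"unknown mode: {self.mode}")
        # Assumed rule: a changed scheme must say where and why it was changed.
        if self.mode in ("modified", "new"):
            if not self.mode_file:
                issues.append("Mode.file is missing")
            if not self.mode_reason:
                issues.append("Mode.reason is missing")
        return issues


incomplete = SchemeMetadata(
    standard_name="gravity wave drag",
    local_name="Linear stress profile",
    local_options=["Starting level", "Surface gravity wave const.", "Froude option"],
    mode="modified",
)
print(incomplete.problems())
```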
3.3 Input/Output Properties
It is challenging to provide comprehensive and standardised input and output metadata to accommodate all component models. All component models should have the following input/output properties.
Table 3.3 Definition of input/output properties
| Input requirements | The external information needed by the model component either at the start of an experiment or during the course of an experiment. |
| Coupling potential | The information required for the model component to be coupled to another model component. |
| Output processing | The spatial and temporal processing carried out to produce the model output data from the numerical modelling experiment. |
The external information required by a model component depends on several factors:
- whether the simulated model is run in standalone or coupled mode and what other model components are included
- what dynamical/physical properties are set for the experiment
- the type of experiment that is being performed with the model component(s).
Consider all the possible external input files that could be used to perform an experiment with the UM version 4.5 atmosphere component:
1. Initial/start/restart file
2. Ozone plus radiation files
3. Orography and land sea mask
4. Passive tracers
5. Local boundary conditions
6. Surface/soil/vegetation files
7. Sea surface temperature plus sea ice
8. Chemistry plus aerosol files
Categories 6, 7 and 8 will be described by the coupling information if this model component is part of a full Earth System Modelling experiment with separate ocean, sea-ice, land surface and chemistry model components. Files for category 5 are only needed if the component is run in a limited area mode rather than a global mode. The requirements for files in categories 1 to 4 depend on the dynamical and physical properties of the experiment being performed using the model component(s). Can input file categories be defined for each model component type? Are these 8 input file groupings sufficient to accommodate all atmospheric numerical model components?
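The dependencies just described can be written down as a small rule table. The sketch below is a simplification: the function name and the labels "file", "coupling" and "not needed" are hypothetical, and categories 1 to 4 are always marked "file" even though whether they are actually required depends on the dynamical and physical properties of the experiment.

```python
from typing import Dict

INPUT_CATEGORIES = {
    1: "Initial/start/restart file",
    2: "Ozone plus radiation files",
    3: "Orography and land sea mask",
    4: "Passive tracers",
    5: "Local boundary conditions",
    6: "Surface/soil/vegetation files",
    7: "Sea surface temperature plus sea ice",
    8: "Chemistry plus aerosol files",
}


def classify_inputs(fully_coupled: bool, limited_area: bool) -> Dict[int, str]:
    """Say how each input category is supplied for a given experiment type."""
    result = {}
    for number in INPUT_CATEGORIES:
        if number in (6, 7, 8) and fully_coupled:
            # Supplied by the other model components through the coupler.
            result[number] = "coupling"
        elif number == 5 and not limited_area:
            # Boundary conditions only matter for limited area runs.
            result[number] = "not needed"
        else:
            result[number] = "file"
    return result


for number, source in classify_inputs(fully_coupled=True, limited_area=False).items():
    print(number, INPUT_CATEGORIES[number], "->", source)
```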
Table 3.3.1 input/output properties
| Input requirements | Standard name | | |
| | Local name | | |
| | Description | | |
| | Mode | on, off, modified, new | |
| | Grid requirements | Horizontal | 1D or 2D |
| | | Vertical | Single, multi-level |
| | | Time | Time varying, time invariant, time dependent |
| | Technical requirements | File format | |
| | | Calendar | |
| Coupling potential | PMIOD | | |
| Output processing | Standard name / other | | |
| | Horizontal processing | | |
| | Vertical processing | | |
| | Time processing | | |
The PRISM project has produced a draft metadata schema for the coupling requirements of numerical model components. “The Potential Model Input and Output Description (PMIOD) describes the relations a component model is able to establish within the coupled model through inputs and output”. Each model component in a coupled experiment has a PMIOD. PRISM expects that the model component developer will write the PMIOD. It should be possible to construct the PMIOD from the model component metadata. The PMIOD has 6 types of information:
- General characteristics of the component (Model metadata information properties)
- Information on the grids (Model metadata numerical properties)
- Transient variables (Model metadata input/output properties)
- Persistent variables (Model metadata dynamical/physical properties)
- Internal dependency (Not yet fully defined?)
The transient variables are all the variables the model component expects or can provide in a coupled experiment, for example the sea surface temperatures needed by an atmosphere model component and provided by an ocean model component. Persistent variables are variables that do not change during the course of a coupled model experiment but need to be shared between the model components. An example of a persistent variable is the Earth’s radius. The PMIOD should therefore be constructed from general model metadata, although some metadata that is only relevant to the model component coupling may need to be encapsulated in the coupling potential section of the input/output properties. Once the PMIOD has been finalised, the numerical model metadata for this section can be defined.
Numerical models can have internal or external data processing capabilities that perform the spatial or temporal processing required before the model output data is made available to the community. The numerical model metadata needs to capture the information about this processing of model output data. This is more easily done if the processing is an integral part of the numerical model. More work needs to be undertaken to assess the requirements in this area of the numerical model metadata, so for this draft standard output processing will be descriptive, with the following attributes:
| CF Standard name | |
| Other | |
| Horizontal processing | Description of the methods used for processing carried out internally or externally |
| Vertical processing | Description of the methods used. |
| Time processing | Description of the methods used. |
3.4 Technical Properties
In many ways the technical properties of the numerical model components are of least interest to the broad community accessing the numerical model output data. However, the metadata captured in the technical properties of the numerical model component may be key to:
- promoting trust and confidence in the model output data
- modellers or modelling organisations cataloguing their own model output data archives
The aim of the technical properties is not necessarily to provide sufficient information to be able to reproduce the model output data.
Table 2.1.4 technical properties
| Attribute | Options |
|---|---|
| Coding language used | Compilers, optimisation |
| Version | Installation, computer system |
| Maintenance systems | |
| Parallelisation | Performance |
3.5 Information Properties
At each metadata layer there is a need for information properties. For the numerical model, the information properties need to define the code owner, installer or guardian. The experiment information properties need to capture who performed the experiment, and the data information properties need to describe who is storing the data or who can access it.
Table 2.1.5 information properties
| | Model | Experiment | Data |
|---|---|---|---|
| Name | Who owns it | Who ran it | Who is the caretaker |
| Contact point | | | |
| History (provenance) | Where did it come from | Where are related experiments | Where did the data come from |
| Description | What model | What experiment | What data |
| Documentation | Of the numerical model component | Related to the experiment | Publications using this data |
There is also a plethora of these information points throughout the proposed numerical model metadata standard. For example, information is required for schemes in the dynamical/physical properties, for the initial input files and for the modified files. There needs to be a concerted effort to coordinate and carefully define the needs and requirements of these information points.
4. Experiment metadata
The experiment (project or simulation) metadata is the actual setting of the options in the numerical model metadata used for the experiment that produced the model output data. For example:
Model (name=UM version 4.5) simulated=atmosphere
Vertical properties ->
    dimensionless vertical coordinate,
    standard term = hybrid sigma pressure coordinate
    options = no. of levels, setting = 50
    options = formula terms = (a, b, ps), setting = (…..)
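The relationship between model options and experiment settings in this example can be sketched as two plain dictionaries and a completeness check. The dictionary layout and the function name are assumptions made for illustration; the values for the formula terms are elided in the example above, so they are left unset here rather than guessed.

```python
from typing import Any, Dict, List, Optional

# Model layer: the options that the code exposes for this property.
model_options: Dict[str, Dict[str, Any]] = {
    "vertical representation": {
        "type": "dimensionless vertical coordinate",
        "standard term": "hybrid sigma pressure coordinate",
        "options": ["no. of levels", "formula terms"],
    }
}

# Experiment layer: the settings actually used in the run.
experiment_settings: Dict[str, Dict[str, Optional[Any]]] = {
    "vertical representation": {
        "no. of levels": 50,
        "formula terms": None,  # values for (a, b, ps) are not given in the text
    }
}


def unset_options(
    options: Dict[str, Dict[str, Any]],
    settings: Dict[str, Dict[str, Optional[Any]]],
) -> List[str]:
    """List every model option for which the experiment records no setting."""
    missing = []
    for prop, spec in options.items():
        for option in spec["options"]:
            if settings.get(prop, {}).get(option) is None:
                missing.append(f"{prop}: {option}")
    return missing


print(unset_options(model_options, experiment_settings))
```

A check of this kind is one way a cataloguing tool could flag experiment metadata that is too incomplete to identify how the output data was produced.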
5. Summary
The aims of the numerical model metadata standard are:
- to provide well-defined information about the numerical models used for experiments that produce data. This means that the modelling community has to agree to a numerical model metadata standard and the standardised vocabulary used to describe it.
- to enhance the existing model output data metadata to promote the sharing of that data beyond the community that produced it.
- to provide numerical model metadata that will enhance software applications built to discover, compare and understand model output data.
Current work
- exploring past and current work on community metadata, for example PCMDI, PRISM
- working with projects, for example NDG, FLUME, PRISM, that will require numerical model metadata
- proposing a draft numerical model metadata schema
- implementing the draft schema to expose its strengths and weaknesses
Future work
- discuss the draft metadata schema with a broad and interested community
- work towards a standardisation of the vocabulary
- flesh out the details in the draft numerical model metadata schema
- develop tools and utilities which can exploit the metadata