Marine Data Store

The datasets available for training ML methods.

CNRS
MEOM

The Marine Data Store (MDS) hosts a large amount of data that can be used to train an end-to-end ML system. It provides raw observations from satellite altimeters and in-situ sources. It also provides a reanalysis product that combines the observations with the NEMO model. In addition, it provides forecasts, which can serve as a 'free-run' product. All of the data listed here are available from the Copernicus project via the MDS.

In this document, we focus on each of the individual components: 1) observations, 2) reanalysis and 3) free-run. In the remainder of the document, we discuss how we can get access to data for each of the components.


Observation Data

A number of observation datasets are available from the platform. They come primarily in two forms: 1) observations from satellite altimetry and 2) observations from in-situ measurements.

| Name | Type | Variables | Dates | Link |
|---|---|---|---|---|
| Satellite Altimetry | | SSH | | Marine Data Store |
| In-Situ | | SSH, SST | | Marine Data Store |
| | | SST | | EUMETSAT |
| | | SST | | NASA |

Satellite Altimetry. The satellite altimetry data is an aggregate from all available satellite altimeters [European Union-Copernicus Marine Service, 2021]. This is an L3 product that originally comes from the CLS group, which uses it to create interpolated maps with covariance-based schemes. It has also been post-processed to be compatible with the data assimilation schemes used by Mercator. It is updated every day, with a stated update frequency of 2 hours.

In-Situ Observations. The Global Ocean In-Situ database is an aggregate of all available near-real-time observations [European Union-Copernicus Marine Service, 2015]. This is an L2 product that originally comes from the Ifremer group. It has also been further processed to be compatible with the data assimilation schemes used by Mercator. It is updated daily.


Reanalysis Data

Global Dataset. The GLORYS12V1 product [European Union-Copernicus Marine Service, 2018] is the CMEMS global ocean eddy-resolving reanalysis, which builds on the altimetry data outlined above. It has a horizontal resolution of 1/12° and 50 vertical levels. It assimilates along-track altimetry data, satellite sea surface temperature, sea ice concentration, and in-situ temperature and salinity vertical profiles.

| Name | Horizontal Resolution | Vertical Levels | Dates | Link |
|---|---|---|---|---|
| GLORYS12V1 | 1/12° | 50 | 1993-2020 | |
| Ensemble | 1/4° | 75 | 1993-2020 | MDS |

The MDS also features an ensemble product [European Union-Copernicus Marine Service, 2019], which combines four reanalyses: GLORYS2V4 (FR), ORAS5 from ECMWF (GER), GloSea5 from the Met Office (UK) and C-GLORSv7 from CMCC (IT).

Why Reanalysis? Reanalysis produces a comprehensive combination of model and observations. It uses the outputs of a numerical GCM that simulates the evolution of the ocean state combined with observations to generate a synthesized estimate of the ocean state. For a forecasting problem, one approach is to train a model based on reanalysis data because it is the . Another approach is to train a model based on model data. However, we can postulate that the rea

GLORYS Product. The GLORYS product provides temperature, salinity, current speed, current direction, sea level, sea-ice extent, sea-ice concentration, and sea-ice thickness.
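As a rough illustration of how a subset of these variables might be requested, the sketch below builds the keyword arguments for the `subset` function of the `copernicusmarine` Python toolbox. The dataset identifier, the variable short names, and the bounding box are assumptions that are not stated in this document and should be checked against the MDS catalogue. Currents are assumed to be stored as eastward and northward components rather than as speed and direction.

```python
# ASSUMPTION: dataset identifier and variable short names are not given in this
# document; verify them in the MDS catalogue before use.
GLORYS_DATASET_ID = "cmems_mod_glo_phy_my_0.083deg_P1D-m"
GLORYS_VARIABLES = {
    "temperature": "thetao",
    "salinity": "so",
    "eastward_current": "uo",
    "northward_current": "vo",
    "sea_level": "zos",
    "sea_ice_concentration": "siconc",
    "sea_ice_thickness": "sithick",
}


def build_subset_request(quantities, lon_range, lat_range, time_range,
                         dataset_id=GLORYS_DATASET_ID):
    """Return keyword arguments intended for copernicusmarine.subset."""
    unknown = [q for q in quantities if q not in GLORYS_VARIABLES]
    if unknown:
        raise KeyError(f"Unknown quantities: {unknown}")
    return {
        "dataset_id": dataset_id,
        "variables": [GLORYS_VARIABLES[q] for q in quantities],
        "minimum_longitude": lon_range[0],
        "maximum_longitude": lon_range[1],
        "minimum_latitude": lat_range[0],
        "maximum_latitude": lat_range[1],
        "start_datetime": time_range[0],
        "end_datetime": time_range[1],
    }


def download(request):
    """Send the request to the MDS (needs the toolbox, network and credentials)."""
    # Imported lazily so that building a request does not require the toolbox.
    import copernicusmarine
    return copernicusmarine.subset(**request)


# Hypothetical bounding box and period, chosen only for illustration.
request = build_subset_request(
    ["sea_level", "temperature"],
    lon_range=(-10.0, 0.0),
    lat_range=(40.0, 50.0),
    time_range=("2019-01-01", "2019-01-31"),
)
```

Keeping the request construction separate from the download makes it possible to inspect or log a request before any data is transferred.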


Free-Run


Regions

Attacking the global interpolation problem directly would be very difficult without the necessary experience or resources. It is better to solve a series of drastically simpler problems until the full architecture is built. In addition, one could always use transfer learning to retrain previously trained models on a new region.

The datasets mentioned above cover the globe at a defined resolution. However, the CMEMS service also focuses on particular regions.

| Region | Extent |
|---|---|
| Global | |
| Baltic Sea | |
| Atlantic-Iberian Biscay Irish Ocean | |
| Mediterranean Sea | |
| Atlantic-European North West Shelf | |

Resolutions

The same reasoning applies to the resolution at which we apply the methods. Attacking very high resolutions directly would be difficult to handle logistically. Higher-resolution fields are high-dimensional and heavily correlated. The signal complexity is also much higher, yet the amount of i.i.d. data is not. So we propose to solve the problem at a sequence of resolutions of increasing complexity.
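To make the dimensionality argument concrete, the following back-of-the-envelope sketch counts the state size of one snapshot at each resolution and compares it with the number of snapshots in the 1993-2020 record. It assumes a nominal full-sphere grid (360° by 180°, land included) and one snapshot per day. Both are simplifications, and the true grid shapes of the products will differ.

```python
from datetime import date


def nominal_grid_points(cells_per_degree, n_levels):
    """Grid points of a full 360 x 180 degree grid, ignoring land and poles."""
    n_lon = 360 * cells_per_degree
    n_lat = 180 * cells_per_degree
    return n_lon * n_lat * n_levels


# Resolutions and level counts taken from the reanalysis table above.
points_glorys = nominal_grid_points(cells_per_degree=12, n_levels=50)
points_ensemble = nominal_grid_points(cells_per_degree=4, n_levels=75)

# ASSUMPTION: one snapshot per day over 1993-2020 inclusive.
n_daily_snapshots = (date(2020, 12, 31) - date(1993, 1, 1)).days + 1

print(f"1/12 degree, 50 levels: {points_glorys:,} points per variable")
print(f"1/4 degree, 75 levels:  {points_ensemble:,} points per variable")
print(f"daily snapshots:        {n_daily_snapshots:,}")
```

Under these assumptions a single 1/12° snapshot holds several hundred million values per variable, while the record contains only about ten thousand snapshots, which is the imbalance described above.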

The datasets mentioned above cover the globe at both the 1/4 and 1/12 degree resolutions. However, the MDS offers other resolutions for some regions, e.g. the Mediterranean. Choosing the lowest resolution for the smallest region of interest outlined above would be the easiest problem to tackle logistically. Again, we can increase the difficulty in terms of scale and signal complexity by either increasing the resolution of the training data or model, or changing the region of interest.
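One simple way to obtain a lower-resolution version of a field is block averaging. The sketch below, in plain Python, averages non-overlapping blocks of a 2D field; a factor of 3 would take a 1/12° grid to 1/4°. It is an illustration only: it ignores land masks and the latitude dependence of cell area, both of which a real preprocessing pipeline would need to handle.

```python
def coarsen(field, factor):
    """Average non-overlapping factor x factor blocks of a 2D list of floats."""
    n_rows, n_cols = len(field), len(field[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("field shape must be divisible by the coarsening factor")
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            block = [field[i + di][j + dj]
                     for di in range(factor)
                     for dj in range(factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Toy 6 x 6 field standing in for a patch of a 1/12 degree grid.
fine = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
coarse_field = coarsen(fine, factor=3)  # 2 x 2 field, as on a 1/4 degree grid
```

Training first on `coarse_field`-like inputs and then on the fine grid is one way to realise the increasing-difficulty schedule described above.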


Frequency

We repeat the logic from above but for frequency: attacking the problem at a high frequency would be difficult logistically. So we can try to apply an assimilation scheme at a lower frequency.
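Lowering the frequency can be sketched in the same spirit, either by averaging over fixed windows or by plain subsampling. The example below assumes a daily series at a single grid point and a 7-day window. Both choices are illustrative and not prescribed by the products.

```python
def window_means(series, window):
    """Mean over consecutive non-overlapping windows; a trailing partial window is dropped."""
    n_full = len(series) // window
    return [sum(series[k * window:(k + 1) * window]) / window
            for k in range(n_full)]


def subsample(series, step):
    """Keep every step-th value, starting from the first."""
    return series[::step]


# Toy daily series covering two weeks at one grid point.
daily = [float(day) for day in range(14)]
weekly_means = window_means(daily, window=7)
weekly_samples = subsample(daily, step=7)
```

Averaging smooths out variability faster than the window, whereas subsampling keeps instantaneous states; which is preferable depends on what the downstream model is meant to predict.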

References
  1. European Union-Copernicus Marine Service. (2021). GLOBAL OCEAN ALONG-TRACK L3 SEA SURFACE HEIGHTS REPROCESSED (1993-ONGOING) TAILORED FOR DATA ASSIMILATION. Mercator Ocean International. 10.48670/MOI-00146
  2. European Union-Copernicus Marine Service. (2015). Global Ocean- In-Situ Near-Real-Time Observations. Mercator Ocean International. 10.48670/MOI-00036
  3. European Union-Copernicus Marine Service. (2018). Global Ocean Physics Reanalysis. Mercator Ocean International. 10.48670/MOI-00021
  4. European Union-Copernicus Marine Service. (2019). Global Ocean Ensemble Physics Reanalysis. Mercator Ocean International. 10.48670/MOI-00024