Skip to content

Visually Comparing Climate Models


Summary

In this notebook, I will be comparing three climate reanalysis models:

  • NCEP-DOE Reanalysis 2: Surface
  • ERA5
  • CMIP5

I will be looking at the following variables:

  • Mean Sea Level Pressure (CMIP5, ERA5, NCEP)
  • Surface Pressure (ERA5, NCEP)

I will be trying to user RBIG in order to assess how similar these models are. I'll be looking at the following IT measures. If these climate models are that similar, then they should exhibit similar IT measures.


Preprocessing Steps

Regridded Spatially

  • The ERA5 had the coarsest spatial resolution (2.5 x 2.5).
  • I regridded the NCEP from (0.25 x 0.25) to (2.5 x 2.5).
  • I regridded the CMIP5 from (2 x 2.5) to (2.5 x 2.5).

Temporal Resolution

  • ERA5 and NCEP go from 1980-2019
  • CMIP5 goes from 2006-2018
  • For comparing ERA5 vs CMIP5 and NCEP vs CMIPF, I found the same time components


Measures

I'm measuring the following:

  • Entropy - expected uncertainty
  • Total Correlation - amount of redundant information between features
  • Mutual Information - amount of information shared between variables


Data

Inputs

I'm taking each year as is. Each spatial location is a sample and each year is a feature. My inputs are:

X \in \mathbb{R}^{\text{spatial } \times \text{ month}}

Outputs

All my information theory measures are in nats. They are scalars.


Hypothesis

Simple: The ERA5 and the NCEP model should be more similar than the CMIP5 model compared to each of them.


Data - Climate Models

import os, sys
cwd = os.getcwd()
source_path = f"{cwd}/../../../"
sys.path.insert(0, f'{source_path}')


# ESDC tools
sys.path.insert(0, f'/home/emmanuel/code/py_esdc')
# from esdc.preprocessing import normalize_temporal

import cdsapi
from zipfile import ZipFile
import pandas as pd
import xarray as xr
from tqdm import tqdm
from sklearn import preprocessing

# Visualization Tools
from src.data.climate.loader import ResultsLoader
from src.visualization.climate import PlotResults

import seaborn as sns
import matplotlib.pyplot as plt
# plt.style.use('ggplot')
plt.style.use(['seaborn-notebook', 'fivethirtyeight'])

%matplotlib inline

%load_ext autoreload
%autoreload 2
data_path = f"/home/emmanuel/projects/2020_rbig_rs/data/climate/raw/"
results_path = f"/home/emmanuel/projects/2020_rbig_rs/data/climate/results/amip/"
fig_path = f"/home/emmanuel/projects/2020_rbig_rs/reports/figures/climate/"

Results

CMIP5 vs ERA5 vs NCEP

!ls "/home/emmanuel/projects/2020_rbig_rs/data/climate/results/amip/"
compare_trial_v1.csv         spatial_compare_v2.csv
global_compare_v2.csv         spatial_compare_v3.csv
global_compare_v3.csv         spatial_compare_v4.csv
global_individual_v2.csv      spatial_individual_trial_v1.csv
global_individual_v3.csv      spatial_individual_v2.csv
global_individual_v4.csv      spatial_individual_v3.csv
individual_trial_v1.csv       spatial_individual_v4.csv
spatial_compare_trial_v1.csv
results = pd.read_csv(results_path + 'spatial_individual_v4' + '.csv', index_col=[0])
columns = results.columns.tolist()
results.tail()
base base_time cmip cmip_time h_base h_cmip spatial subsample t_base t_cmip tc_base tc_cmip trial variable
20349 ncep 1999-07-01 inmcm4 1999-06-16 00:00:00 -77.516609 -60.457848 6.0 5000.0 21.345011 29.123130 120.900891 107.421496 6.0 psl
20350 ncep 1999-08-01 inmcm4 1999-07-16 12:00:00 -79.558294 -60.430310 6.0 5000.0 23.388674 31.234617 119.664003 108.125878 6.0 psl
20351 ncep 1999-09-01 inmcm4 1999-08-16 12:00:00 -77.975434 -62.943946 6.0 5000.0 20.472294 29.212621 117.276838 110.032793 6.0 psl
20352 ncep 1999-10-01 inmcm4 1999-09-16 00:00:00 -86.973429 -62.825051 6.0 5000.0 23.205492 39.465716 117.996714 107.935966 6.0 psl
20353 ncep 1999-11-01 inmcm4 1999-10-16 12:00:00 -82.277103 -66.988666 6.0 5000.0 26.059670 38.457328 120.735649 108.947965 6.0 psl
# results['dates'] = pd.to_datetime(results.base_time)
# results = results.groupby(results.dates.dt.year).mean().drop_duplicates()
# results.head()
def plot_spatial_entropy(results_df: pd.DataFrame, base: str, cmip: str, spatial: int, normalize=True)-> None:

    # choose spatial window size
    results_df = results_df[results['spatial'] == spatial]
    results_df = results_df[results_df['base'] == base]
    results_df = results_df[results_df['cmip'] == cmip]

    results_df['dates'] = pd.to_datetime(results_df.base_time)
    results_df = results_df.groupby(results_df.dates.dt.year).transform('mean').drop_duplicates()

    # normalize
    if normalize:
        results_df['h_base'] = results_df['h_base'] / results_df['spatial']
        results_df['h_cmip'] = results_df['h_cmip'] / results_df['spatial']
    sns.scatterplot(data=results_df, x='dates', y='h_base', label=f"{base}")
    sns.scatterplot(data=results_df, x='dates', y='h_cmip', label=f"{cmip}")
    plt.title(f'{base.upper()}, CMIP: {cmip.upper()}, Spatial: {int(spatial)}')
    plt.xlabel('Years')
    plt.ylabel('Entropy, H')
    plt.legend()
    plt.show()

Example

# plot_spatial_entropy(results, 'era5', 'inmcm4')
plot_spatial_entropy(results, 'ncep', 'inmcm4', 1.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 2.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 3.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 4.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 5.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 6.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 7.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 8.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 9.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 10.0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-2a6fd108413e> in <module>
      6 plot_spatial_entropy(results, 'ncep', 'inmcm4', 5.0)
      7 plot_spatial_entropy(results, 'ncep', 'inmcm4', 6.0)
----> 8 plot_spatial_entropy(results, 'ncep', 'inmcm4', 7.0)
      9 plot_spatial_entropy(results, 'ncep', 'inmcm4', 8.0)
     10 plot_spatial_entropy(results, 'ncep', 'inmcm4', 9.0)

<ipython-input-7-6a32b1ebac42> in plot_spatial_entropy(results_df, base, cmip, spatial, normalize)
      7 
      8     results_df['dates'] = pd.to_datetime(results_df.base_time)
----> 9     results_df = results_df.groupby(results_df.dates.dt.year).transform('mean').drop_duplicates()
     10 
     11     # normalize

~/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/pandas/core/groupby/generic.py in transform(self, func, *args, **kwargs)
    592         # nuiscance columns
    593         if not result.columns.equals(obj.columns):
--> 594             return self._transform_general(func, *args, **kwargs)
    595 
    596         return self._transform_fast(result, obj, func)

~/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/pandas/core/groupby/generic.py in _transform_general(self, func, *args, **kwargs)
    564         concat_index = obj.columns if self.axis == 0 else obj.index
    565         other_axis = 1 if self.axis == 0 else 0  # switches between 0 & 1
--> 566         concatenated = concat(applied, axis=self.axis, verify_integrity=False)
    567         concatenated = concatenated.reindex(concat_index, axis=other_axis, copy=False)
    568         return self._set_result_index_ordered(concatenated)

~/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    253         verify_integrity=verify_integrity,
    254         copy=copy,
--> 255         sort=sort,
    256     )
    257 

~/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    302 
    303         if len(objs) == 0:
--> 304             raise ValueError("No objects to concatenate")
    305 
    306         if keys is None:

ValueError: No objects to concatenate

With Trials

results = pd.read_csv(results_path + 'spatial_individual_v2' + '.csv', index_col=[0])
columns = results.columns.tolist()
results.tail()
base base_time cmip cmip_time h_base h_cmip spatial subsample t_base t_cmip tc_base tc_cmip trial variable
3604 ncep 1980-04-01 inmcm4 1980-03-16 12:00:00 -305.487809 -306.000709 2.0 100000.0 3131.432539 3195.500214 310.320863 310.573945 0.0 psl
3605 ncep 1980-05-01 inmcm4 1980-04-16 00:00:00 -305.418017 -306.468608 2.0 100000.0 3121.518930 3184.075421 310.309945 310.960205 0.0 psl
3606 ncep 1980-06-01 inmcm4 1980-05-16 12:00:00 -306.280263 -305.456243 2.0 100000.0 3131.040802 3193.071733 311.079259 309.870990 0.0 psl
3607 ncep 1980-07-01 inmcm4 1980-06-16 00:00:00 -304.751022 -306.033549 2.0 100000.0 3123.749256 3183.194890 309.773832 310.637945 0.0 psl
3608 ncep 1980-08-01 inmcm4 1980-07-16 12:00:00 -305.726799 -306.364675 2.0 100000.0 3131.382549 3194.702803 310.502920 311.400503 0.0 psl
# plot_spatial_entropy(results, 'era5', 'inmcm4')
plot_spatial_entropy(results, 'ncep', 'inmcm4', 1.0)
plot_spatial_entropy(results, 'ncep', 'inmcm4', 2.0)




plot_global_entropy(results, 'era5', 'access1_0')
plot_global_entropy(results, 'ncep', 'access1_0')
plot_global_entropy(results, 'era5', 'bcc_csm1_1')
plot_global_entropy(results, 'ncep', 'bcc_csm1_1')

Difference in Entropy

import numpy as np

def plot_global_diff_entropy(results_df: pd.DataFrame, base: str, normalized=True)-> None:


    results_copy = results_df[results_df['base'] == base]

    # calculate difference
    results_copy['h_diff'] = np.abs(results_copy['h_cmip'] - results_copy['h_base'])

    if normalized == True:
        results_copy['h_diff'] = results_copy['h_diff'] / results_copy['spatial']


    fig, ax = plt.subplots()
#     sns.scatterplot(ax=ax, data=results_copy, x='spatial', y='h_diff', hue='base', color='black')
    sns.scatterplot(ax=ax, data=results_copy, x='spatial', y='h_diff', hue='cmip')

    plt.title(f'')
    plt.xlabel('Spatial Features')
    plt.ylabel('Entropy, H')
    plt.legend(ncol=2, bbox_to_anchor=(2.05, 1), fontsize=16)
#     plt.tight_layout()
    plt.show()
plot_global_diff_entropy(results, 'era5')
/home/emmanuel/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':
/home/emmanuel/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/ipykernel_launcher.py:12: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if sys.path[0] == '':

Total Correlation

def plot_global_tc(results_df: pd.DataFrame, base: str, cmip: str)-> None:
    results_df = results_df[results_df['base'] == base]
    results_df = results_df[results_df['cmip'] == cmip]
    sns.scatterplot(data=results_df, x='spatial', y='tc_base', label=f"{base}")
    sns.scatterplot(data=results_df, x='spatial', y='tc_cmip', label=f"{cmip}")
    plt.title(f'{base.upper()}, CMIP: {cmip.upper()}')
    plt.xlabel('Spatial Features')
    plt.ylabel('Total Correlation, TC')
    plt.legend()
    plt.show()
plot_global_tc(results, 'era5', 'bcc_csm1_1')
plot_global_tc(results, 'ncep', 'bcc_csm1_1')
def plot_global_diff_tc(results_df: pd.DataFrame, base: str, normalized=True)-> None:


    results_copy = results_df[results_df['base'] == base]

    # calculate difference
    results_copy['tc_diff'] = np.abs(results_copy['tc_cmip'] - results_copy['tc_base'])

    if normalized == True:
        results_copy['tc_diff'] = results_copy['tc_diff'] / results_copy['spatial']


    fig, ax = plt.subplots()
#     sns.scatterplot(ax=ax, data=results_copy, x='spatial', y='h_diff', hue='base', color='black')
    sns.scatterplot(ax=ax, data=results_copy, x='spatial', y='tc_diff', hue='cmip')

    plt.title(f'')
    plt.xlabel('Spatial Features')
    plt.ylabel('Total Correlation, TC')
    plt.legend(ncol=2, bbox_to_anchor=(2.05, 1), fontsize=16)
#     plt.tight_layout()
    plt.show()

plot_global_diff_tc(results, 'era5', normalized=True)
plot_global_diff_tc(results, 'ncep', normalized=True)
/home/emmanuel/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/ipykernel_launcher.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  import sys
/home/emmanuel/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/ipykernel_launcher.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # Remove the CWD from sys.path while we load stuff.

Mutual Information

results = pd.read_csv(results_path + 'compare_trial_v1' + '.csv', index_col=[0])
results.head()
base cmip kendelltau mi pearson spatial spearman subsample time_mi variable
0 ncep inmcm4 0.552030 9.962479 0.833456 1.0 0.711300 10000.0 39.619016 psl
1 ncep inmcm4 0.568794 1.489607 0.842241 2.0 0.730925 10000.0 10.354173 psl
2 ncep inmcm4 0.581166 1.555495 0.844528 3.0 0.745527 10000.0 65.894330 psl
3 ncep inmcm4 0.562111 1.463338 0.803643 4.0 0.715848 10000.0 96.675056 psl
4 ncep inmcm4 0.529905 1.766060 0.751060 5.0 0.670227 10000.0 73.076941 psl
def plot_global_mi(results_df: pd.DataFrame, base: str, cmip: str, measure='mi', normalized=True)-> None:

    results_df = results_df[results_df['base'] == base]
    results_df = results_df[results_df['cmip'] == cmip]

    if normalized and measure == 'mi':
        results_df[measure] = results_df[measure] / results_df['spatial']
    sns.scatterplot(data=results_df, x='spatial', y=measure, label=f"{base}")
    sns.scatterplot(data=results_df, x='spatial', y=measure, label=f"{cmip}")
    plt.title(f'{base.upper()}, CMIP: {cmip.upper()}')
    plt.xlabel('Spatial Features')
    plt.ylabel('Mutual Information, MI')
    plt.legend()
    plt.show()
plot_global_mi(results, 'era5', 'inmcm4', 'mi')
plot_global_mi(results, 'era5', 'inmcm4', 'pearson')
plot_global_mi(results, 'era5', 'inmcm4', 'spearman')
plot_global_mi(results, 'era5', 'inmcm4', 'kendelltau')
plot_global_mi(results, 'era5', 'noresm1_m', 'mi')
plot_global_mi(results, 'era5', 'noresm1_m', 'pearson')
plot_global_mi(results, 'era5', 'noresm1_m', 'spearman')
plot_global_mi(results, 'era5', 'noresm1_m', 'kendelltau')
def plot_global_diff_mi(results_df: pd.DataFrame, base: str, measure='mi', normalized=True, log_mi=True)-> None:


    results_copy = results_df[results_df['base'] == base]


    if normalized == True:
        results_copy[measure] = results_copy[measure] / results_copy['spatial']

    if log_mi == True and measure == 'mi':
        results_copy[measure] = np.log(results_copy[measure])

    fig, ax = plt.subplots()
#     sns.scatterplot(ax=ax, data=results_copy, x='spatial', y='h_diff', hue='base', color='black')
    sns.scatterplot(ax=ax, data=results_copy, x='spatial', y=measure, hue='cmip')

    plt.title(f'')
    plt.xlabel('Spatial Features')
    plt.ylabel(f'{measure.upper()}')
    plt.legend(ncol=2, bbox_to_anchor=(2.05, 1), fontsize=16)
#     plt.tight_layout()
    plt.show()
plot_global_diff_mi(results, 'era5', 'mi', True)
plot_global_diff_mi(results, 'era5', 'pearson', True, False)
plot_global_diff_mi(results, 'era5', 'spearman', True, False)
plot_global_diff_mi(results, 'era5', 'kendelltau', True, False)
/home/emmanuel/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/ipykernel_launcher.py:8: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

/home/emmanuel/.conda/envs/2019_rbig_ad/lib/python3.6/site-packages/ipykernel_launcher.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # This is added back by InteractiveShellApp.init_path()