Results - Round I¶

* J. Emmanuel Johnson
* 11th Nov, 2019

Recap¶

Recall that we are comparing different information-theoretic (IT) measures between drought years (2012, 2014, 2015) and non-drought years (2010, 2011, 2013). We vary the number of temporal features; i.e., we increase the number of previous time steps available for each sample. We can divide the IT measures we use into two groups:

Individual Measures - where we measure each variable independently.
- Entropy - expected (average) amount of uncertainty
- Total Correlation - amount of redundant information within the features
Comparative Measures - where we compare multiple variables to one another.
- Mutual Information - amount of shared information between two multivariate datasets.
- Pearson's Correlation Coefficient - the amount of linear correlation that can be found between two datasets.
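As a point of reference for the definitions above, all three IT measures have closed forms when the data are Gaussian. The sketch below is only an illustration under that Gaussian assumption; it is not the RBIG estimator used in this notebook, and the function names are hypothetical.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def gaussian_total_correlation(cov):
    """Sum of the marginal entropies minus the joint entropy."""
    cov = np.atleast_2d(cov)
    marginals = sum(gaussian_entropy(cov[i, i]) for i in range(cov.shape[0]))
    return marginals - gaussian_entropy(cov)

def gaussian_mutual_information(cov, dx):
    """MI between the first `dx` dimensions and the remaining ones."""
    cov = np.atleast_2d(cov)
    h_x = gaussian_entropy(cov[:dx, :dx])
    h_y = gaussian_entropy(cov[dx:, dx:])
    return h_x + h_y - gaussian_entropy(cov)

# Two unit-variance variables with correlation 0.5
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
tc = gaussian_total_correlation(cov)
mi = gaussian_mutual_information(cov, dx=1)
```

With only two one-dimensional variables, total correlation and mutual information coincide; they differ once each dataset has several features, which is the case when temporal dimensions are added.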

The hypothesis would be:

Individual Measures - we see some trend that perhaps gives us intuition that there could be a 'sweet spot' for the number of temporal dimensions to use.
Comparative Measures - the IT measures should exhibit a trend similar to the one we saw for the individual measures, but it should depend on the two variables we are comparing. For example, the MI between some pairs (e.g. SM and VOD) should be higher than between others.
Comparative Measures - the Pearson correlation coefficient won't be as helpful in this case because it is a linear measure and shouldn't do a good job of capturing the nonlinear variability/interactions that we expect.
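The last point can be illustrated on synthetic data. The sketch below uses a simple histogram (plug-in) MI estimate as a stand-in; it is an assumption for illustration and not the RBIG estimator used in this notebook. The helper name `histogram_mi` is hypothetical.

```python
import numpy as np

def histogram_mi(x, y, bins=20):
    """Plug-in mutual information estimate (nats) from a 2D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = pxy > 0                     # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=10_000)
y_nonlinear = x ** 2                      # fully determined by x, but not linearly
y_independent = rng.uniform(-1, 1, size=10_000)

pearson_nonlinear = np.corrcoef(x, y_nonlinear)[0, 1]
mi_nonlinear = histogram_mi(x, y_nonlinear)
mi_independent = histogram_mi(x, y_independent)
```

Here the Pearson coefficient for the quadratic relationship is close to zero even though `y_nonlinear` is a deterministic function of `x`, while the MI estimate is clearly larger than for the independent pair.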

Concerns¶

Calculating the Pearson coefficient

The Pearson coefficient isn't a multivariate measure (I don't think). So to compute it for multi-dimensional data, I simply 'unraveled' each array so that I compared sample-to-sample and feature-to-feature. But I'm not sure if this is correct.
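A minimal sketch of the unravelling described above, assuming both arrays have the same shape `(n_samples, n_features)`. The function names are illustrative and not taken from the project code; a per-feature variant is included only for comparison.

```python
import numpy as np

def flattened_pearson(X, Y):
    """Pearson correlation after unravelling two equally shaped arrays."""
    X, Y = np.asarray(X), np.asarray(Y)
    if X.shape != Y.shape:
        raise ValueError("X and Y must have the same shape")
    return np.corrcoef(X.ravel(), Y.ravel())[0, 1]

def featurewise_pearson(X, Y):
    """Pearson correlation computed separately for each matching column."""
    X, Y = np.asarray(X), np.asarray(Y)
    return np.array(
        [np.corrcoef(X[:, j], Y[:, j])[0, 1] for j in range(X.shape[1])]
    )

# Synthetic data: Y is a noisy linear function of X
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Y = 2.0 * X + rng.normal(scale=0.1, size=(500, 3))

r_flat = flattened_pearson(X, Y)          # a single number
r_per_feature = featurewise_pearson(X, Y)  # one number per feature
```

The flattened version pools all features into one coefficient, so features with different scales or means can dominate it; the per-feature version keeps them separate at the cost of returning several numbers.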

Climatology

I removed the climatology, but I would like to see what happens when I don't remove it. I don't think the results are much different between the non-drought and drought years, so I'm wondering if the climatology has more of an effect.
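For reference, one common way to remove a climatology is to subtract the long-term mean of each calendar month. The sketch below does this with pandas on a synthetic series; the monthly grouping and the function name are assumptions, since the preprocessing actually used for these results is not shown in this notebook.

```python
import numpy as np
import pandas as pd

def remove_monthly_climatology(series):
    """Subtract each calendar month's long-term mean from a time-indexed series."""
    climatology = series.groupby(series.index.month).transform('mean')
    return series - climatology

# Synthetic example: a seasonal cycle plus noise over six years
rng = np.random.default_rng(0)
index = pd.date_range('2010-01-01', '2015-12-31', freq='D')
day_of_year = index.dayofyear.to_numpy()
seasonal = 10 * np.sin(2 * np.pi * day_of_year / 365.25)
series = pd.Series(seasonal + rng.normal(size=len(index)), index=index)

anomalies = remove_monthly_climatology(series)
```

Running the IT measures on both `series` and `anomalies` would be one way to check how much the seasonal cycle contributes to the results.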

Hypothesis

Should there be a difference between the drought and non-drought years? And have we sufficiently captured these differences with just 3 years each (drought and non-drought)?

Code¶

import sys
sys.path.insert(0, '/home/emmanuel/projects/2019_rbig_ad/src')
sys.path.append('/home/emmanuel/code/py_esdc')
sys.path.append('/home/emmanuel/code/rbig')


# DataCube PreProcessing
from scipy.io import savemat, loadmat
import geopandas as geopd
from rasterio import features

# Main Libraries
import numpy as np
import scipy.io as scio
import xarray as xr
import pandas as pd
import seaborn as sns
from datetime import date
import time

# IT Algorithms
from rbig import RBIG, RBIGMI

# ML Preprocessing
from sklearn.preprocessing import normalize
from sklearn.model_selection import train_test_split
from scipy import signal

# Plotting
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use(['seaborn-poster'])
import tqdm

# Utilities
import warnings
warnings.simplefilter('ignore', category=FutureWarning)

# Notebook Specifics
%load_ext autoreload
%autoreload 2

plt.style.available

['seaborn-dark-palette',
 'classic',
 'ggplot',
 'seaborn-dark',
 'seaborn-pastel',
 'seaborn-bright',
 'seaborn-deep',
 'tableau-colorblind10',
 'seaborn-talk',
 'fast',
 'seaborn-ticks',
 'seaborn-white',
 'bmh',
 'fivethirtyeight',
 'seaborn-muted',
 '_classic_test',
 'grayscale',
 'seaborn-darkgrid',
 'seaborn-poster',
 'seaborn',
 'seaborn-whitegrid',
 'dark_background',
 'seaborn-paper',
 'seaborn-colorblind',
 'seaborn-notebook',
 'Solarize_Light2']

FIG_PATH = '/home/emmanuel/projects/2020_rbig_rs/reports/figures/drought/individual/'
DATA_PATH = '/home/emmanuel/projects/2020_rbig_rs/data/drought/results/'

datasets = [
    'exp_ind_v2.csv',
    'exp_group_v2.csv'
]

Experiment I - Individual Variables¶

data = pd.read_csv(DATA_PATH + datasets[0], index_col=[0])
data['drought'] = data['drought'] == 1  # convert the 0/1 flag to a boolean
data.head()

	drought	h	samples	tc	temporal	time	variable	year
0	False	1.405693	25779.0	0.000000	1.0	0.551456	VOD	2010.0
1	False	1.311283	25779.0	0.000000	1.0	0.557524	NDVI	2010.0
2	False	1.141273	25779.0	0.000000	1.0	0.547523	SM	2010.0
3	False	1.364679	25779.0	0.000000	1.0	0.547514	LST	2010.0
4	False	2.680166	24108.0	0.128393	2.0	1.434613	VOD	2010.0

Normalize¶

# normalize by the number of temporal dimensions
data['h_norm'] = data['h'].div(data.temporal)
data['tc_norm'] = data['tc'].div(data.temporal)

Entropy¶

def plot_entropy(data, normalized=False, save=True, drought='both'):
    """Plot entropy against the number of temporal dimensions, one line per variable."""
    fig, ax = plt.subplots(figsize=(10, 7))

    # Select the years to plot; 'both' keeps all years and styles lines by the drought flag
    if drought == 'on':
        drought = 'drought'
        data = data[data['year'].isin([2012, 2014, 2015])]
        style = None
    elif drought == 'off':
        drought = 'nondrought'
        data = data[data['year'].isin([2010, 2011, 2013])]
        style = None
    elif drought == 'both':
        style = 'drought'
    else:
        raise ValueError(f'Unrecognized drought state: {drought}')

    # Entropy divided by the number of temporal dimensions, or the raw value
    y = 'h_norm' if normalized else 'h'
    sns.lineplot(
        x='temporal', y=y,
        hue='variable',
        data=data,
        style=style,
        marker='o',
        ax=ax,
    )
    ax.set_xlabel('Temporal Dims')
    ax.set_ylabel('Entropy')
    # plt.legend(['NDVI', 'LST', 'SM', 'VOD'])
    plt.tight_layout()
    # Save before plt.show(), which may close the figure on some backends
    if save:
        prefix = 'H_norm' if normalized else 'H'
        # facecolor='none' replaces frameon=False, which was deprecated in matplotlib 3.1
        fig.savefig(f"{FIG_PATH}{prefix}_individual_{drought}.png", facecolor='none')
    plt.show()
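The year selection at the top of `plot_entropy` recurs in every plotting function in this notebook. A sketch of how it could be factored out is given below; the helper name `select_years` and the two module-level constants are hypothetical and not part of the project code.

```python
import pandas as pd

DROUGHT_YEARS = [2012, 2014, 2015]
NONDROUGHT_YEARS = [2010, 2011, 2013]

def select_years(data, drought='both'):
    """Return (subset, label, style) for a drought state of 'on', 'off' or 'both'."""
    if drought == 'on':
        return data[data['year'].isin(DROUGHT_YEARS)], 'drought', None
    if drought == 'off':
        return data[data['year'].isin(NONDROUGHT_YEARS)], 'nondrought', None
    if drought == 'both':
        # Keep every year and let seaborn style the lines by the drought flag
        return data, 'both', 'drought'
    raise ValueError(f'Unrecognized drought state: {drought}')

# Toy frame with one row per year
toy = pd.DataFrame({
    'year': [2010, 2011, 2012, 2013, 2014, 2015],
    'drought': [False, False, True, False, True, True],
})
subset, label, style = select_years(toy, drought='on')
```

Each plotting function could then start with `data, label, style = select_years(data, drought)` and use `label` in the output file name.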

Drought Years vs Non-Drought Years¶

# plot_entropy(data, normalized=True, save=False, drought='on')
# plot_entropy(data, normalized=True, save=False, drought='off')
plot_entropy(data, normalized=True, save=False, drought='both')

Total Correlation¶

def plot_tc(data, normalized=False, save=True, drought='both'):
    """Plot total correlation against the number of temporal dimensions, one line per variable."""
    fig, ax = plt.subplots(figsize=(10, 7))

    # Select the years to plot; 'both' keeps all years and styles lines by the drought flag
    if drought == 'on':
        drought = 'drought'
        data = data[data['year'].isin([2012, 2014, 2015])]
        style = None
    elif drought == 'off':
        drought = 'nondrought'
        data = data[data['year'].isin([2010, 2011, 2013])]
        style = None
    elif drought == 'both':
        style = 'drought'
    else:
        raise ValueError(f'Unrecognized drought state: {drought}')

    # Total correlation divided by the number of temporal dimensions, or the raw value
    y = 'tc_norm' if normalized else 'tc'
    sns.lineplot(
        x='temporal', y=y,
        hue='variable',
        data=data,
        style=style,
        marker='o',
        ax=ax,
    )
    ax.set_xlabel('Temporal Dims')
    ax.set_ylabel('Total Correlation')
    # plt.legend(['NDVI', 'LST', 'SM', 'VOD'])
    plt.tight_layout()
    # Save before plt.show(), which may close the figure on some backends
    if save:
        prefix = 'TC_norm' if normalized else 'TC'
        # facecolor='none' replaces frameon=False, which was deprecated in matplotlib 3.1
        fig.savefig(f"{FIG_PATH}{prefix}_individual_{drought}.png", facecolor='none')
    plt.show()

Drought Years vs Non-Drought Years¶

# plot_tc(data, normalized=True, save=False, drought='on')
# plot_tc(data, normalized=True, save=False, drought='off')
plot_tc(data, normalized=True, save=False, drought='both')

Experiment II - Comparing Variables¶

data_group = pd.read_csv(DATA_PATH + datasets[1], index_col=[0])
data_group['drought'] = data_group['drought'] == 1  # convert the 0/1 flag to a boolean
data_group.head()

	drought	mi	pearson	samples	spearman	temporal	time	variable1	variable2	year
0	False	0.014735	0.064995	25779.0	0.083071	1.0	2.844188	VOD	NDVI	2010.0
1	False	0.024350	0.008711	25779.0	0.024544	1.0	2.887552	VOD	LST	2010.0
2	False	0.142746	0.150556	25779.0	0.290016	1.0	3.263144	VOD	SM	2010.0
3	False	0.019120	-0.107464	25779.0	-0.118719	1.0	2.836338	NDVI	LST	2010.0
4	False	0.059311	0.211504	25779.0	0.181002	1.0	2.830751	NDVI	SM	2010.0

Normalize¶

# normalize by the number of temporal dimensions
data_group['mi_norm'] = data_group['mi'].div(data_group.temporal)

# cond1 = data_group['variable1'] == 'NDVI'
# cond2 = data_group['variable2'] == 'NDVI'
# data_group.loc[cond1 & cond2, ['variable1', 'variable2']] = data_group.loc[cond1 & cond2, ['variable2', 'variable1']].values

def move_variables(df: pd.DataFrame, variable: str) -> pd.DataFrame:
    """Swap variable1/variable2 wherever `variable` appears in 'variable2'."""
    # Work on a copy so the caller's DataFrame is not modified in place
    df = df.copy()
    cond = df['variable2'] == variable
    df.loc[
        cond, ['variable2', 'variable1']
    ] = df.loc[
        cond, ['variable1', 'variable2']
    ].values

    return df

df_new = move_variables(data_group, 'NDVI')

df_new.head()

	drought	mi	pearson	samples	spearman	temporal	time	variable1	variable2	year	mi_norm
0	False	0.014735	0.064995	25779.0	0.083071	1.0	2.844188	NDVI	VOD	2010.0	0.014735
1	False	0.024350	0.008711	25779.0	0.024544	1.0	2.887552	VOD	LST	2010.0	0.024350
2	False	0.142746	0.150556	25779.0	0.290016	1.0	3.263144	VOD	SM	2010.0	0.142746
3	False	0.019120	-0.107464	25779.0	-0.118719	1.0	2.836338	NDVI	LST	2010.0	0.019120
4	False	0.059311	0.211504	25779.0	0.181002	1.0	2.830751	NDVI	SM	2010.0	0.059311
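To make the effect of the swap concrete, here is a toy version on three rows. It assumes only the two column names used above; the function name `put_variable_first` is hypothetical, and unlike the original it returns a modified copy. Because MI and the Pearson coefficient are symmetric in their two arguments, swapping the labels does not change the values.

```python
import pandas as pd

def put_variable_first(df, variable):
    """Return a copy where `variable` sits in 'variable1' whenever it is part of the pair."""
    out = df.copy()
    cond = out['variable2'] == variable
    # Swap the two label columns on the matching rows only
    out.loc[cond, ['variable1', 'variable2']] = (
        out.loc[cond, ['variable2', 'variable1']].values
    )
    return out

pairs = pd.DataFrame({
    'variable1': ['VOD', 'VOD', 'NDVI'],
    'variable2': ['NDVI', 'LST', 'SM'],
    'mi': [0.01, 0.02, 0.06],
})
swapped = put_variable_first(pairs, 'NDVI')
```

After the swap, filtering on `variable1 == 'NDVI'` picks up every pair that involves NDVI, which is what the plotting functions rely on.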

Mutual Information¶

def plot_mutual_info(data, normalized=False, save=True, variable='VOD', drought='both'):
    """Plot the MI between `variable` and every other variable against temporal dimensions."""
    fig, ax = plt.subplots(figsize=(10, 7))

    # Select the years to plot; 'both' keeps all years and styles lines by the drought flag
    if drought == 'on':
        drought = 'drought'
        data = data[data['year'].isin([2012, 2014, 2015])]
        style = None
    elif drought == 'off':
        drought = 'nondrought'
        data = data[data['year'].isin([2010, 2011, 2013])]
        style = None
    elif drought == 'both':
        style = 'drought'
    else:
        raise ValueError(f'Unrecognized drought state: {drought}')

    # Select variable (pass a copy so the caller's DataFrame is left untouched)
    data = move_variables(data.copy(), variable)
    data = data[data['variable1'] == variable]

    # MI divided by the number of temporal dimensions, or the raw value
    y = 'mi_norm' if normalized else 'mi'
    sns.lineplot(
        x='temporal', y=y,
        hue='variable2',
        data=data,
        style=style,
        marker='o',
        ax=ax,
    )
    ax.set_xlabel('Temporal Dims')
    ax.set_ylabel('Mutual Information')
    # plt.legend(['NDVI', 'LST', 'SM', 'VOD'])
    plt.tight_layout()
    # Save before plt.show(), which may close the figure on some backends
    if save:
        prefix = 'MI_norm' if normalized else 'MI'
        # Include the reference variable in the file name so that plots for
        # different variables do not overwrite each other
        fig.savefig(f"{FIG_PATH}{prefix}_{variable}_{drought}.png", facecolor='none')
    plt.show()

def plot_pearson(data, normalized=False, save=True, variable='VOD', drought='both'):
    """Plot the Pearson coefficient between `variable` and every other variable."""
    fig, ax = plt.subplots(figsize=(10, 7))

    # Select the years to plot; 'both' keeps all years and styles lines by the drought flag
    if drought == 'on':
        drought = 'drought'
        data = data[data['year'].isin([2012, 2014, 2015])]
        style = None
    elif drought == 'off':
        drought = 'nondrought'
        data = data[data['year'].isin([2010, 2011, 2013])]
        style = None
    elif drought == 'both':
        style = 'drought'
    else:
        raise ValueError(f'Unrecognized drought state: {drought}')

    # Select variable (pass a copy so the caller's DataFrame is left untouched)
    data = move_variables(data.copy(), variable)
    data = data[data['variable1'] == variable]

    # The Pearson coefficient is plotted as-is; `normalized` only affects the file name
    y = 'pearson'
    sns.lineplot(
        x='temporal', y=y,
        hue='variable2',
        style=style,
        data=data,
        marker='o',
        ax=ax,
    )
    ax.set_xlabel('Temporal Dims')
    ax.set_ylabel('Pearson')
    # plt.legend(['NDVI', 'LST', 'SM', 'VOD'])
    plt.tight_layout()
    # Save before plt.show(), which may close the figure on some backends
    if save:
        prefix = 'Pear_norm' if normalized else 'Pear'
        # Include the reference variable in the file name so that plots for
        # different variables do not overwrite each other
        fig.savefig(f"{FIG_PATH}{prefix}_{variable}_{drought}.png", facecolor='none')
    plt.show()

VOD¶

Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='VOD', drought='on')
plot_pearson(data_group, normalized=False, save=True, variable='VOD', drought='on')

Non-Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='VOD', drought='off')
plot_pearson(data_group, normalized=False, save=True, variable='VOD', drought='off')

Drought and Non-Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='VOD', drought='both')
plot_pearson(data_group, normalized=False, save=True, variable='VOD', drought='both')

NDVI¶

Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='NDVI', drought='on')
plot_pearson(data_group, normalized=True, save=True, variable='NDVI', drought='on')

Non-Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='NDVI', drought='off')
plot_pearson(data_group, normalized=True, save=True, variable='NDVI', drought='off')

LST¶

Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='LST', drought='on')
plot_pearson(data_group, normalized=True, save=True, variable='LST', drought='on')

Non-Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='LST', drought='off')
plot_pearson(data_group, normalized=True, save=True, variable='LST', drought='off')

Drought and Non-Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='LST', drought='both')
plot_pearson(data_group, normalized=False, save=True, variable='LST', drought='both')

SM¶

Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='SM', drought='on')
plot_pearson(data_group, normalized=True, save=True, variable='SM', drought='on')

Non-Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='SM', drought='off')
plot_pearson(data_group, normalized=True, save=True, variable='SM', drought='off')

Drought and Non-Drought Years¶

plot_mutual_info(data_group, normalized=True, save=True, variable='SM', drought='both')
plot_pearson(data_group, normalized=True, save=True, variable='SM', drought='both')