Results - Round I¶
Recap¶
Recall that we are comparing different information-theoretic (IT) measures across drought years (2012, 2014, 2015) and non-drought years (2010, 2011, 2013). We vary the number of temporal features that we add; i.e., we increase the number of previous time steps available to each sample. We can divide the measures we use into two groups:
- Individual Measures - where we measure each variable independently.
  - Entropy - expected (average) amount of uncertainty.
  - Total Correlation - amount of redundant information within the features.
- Comparative Measures - where we compare multiple variables to one another.
  - Mutual Information - amount of shared information between two multivariate datasets.
  - Pearson's Correlation Coefficient - the amount of linear correlation between two datasets.
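To make the definitions above concrete, here is a minimal sketch of entropy, total correlation and mutual information under a Gaussian assumption, where all three have closed forms. This is not the RBIG estimator used in this notebook; the function names and the toy data are assumptions made only for illustration.

```python
import numpy as np


def gaussian_entropy(X):
    """Differential entropy (nats) of samples X under a Gaussian assumption."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    d = X.shape[1]
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


def gaussian_total_correlation(X):
    """TC = sum of the marginal entropies minus the joint entropy."""
    X = np.asarray(X, dtype=float)
    marginals = sum(gaussian_entropy(X[:, j]) for j in range(X.shape[1]))
    return marginals - gaussian_entropy(X)


def gaussian_mutual_info(X, Y):
    """MI = H(X) + H(Y) - H(X, Y)."""
    joint = np.hstack([X, Y])
    return gaussian_entropy(X) + gaussian_entropy(Y) - gaussian_entropy(joint)


rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))                # independent features
Y = X + 0.5 * rng.normal(size=(5000, 3))      # Y shares information with X
Z = rng.normal(size=(5000, 3))                # Z is independent of X

tc_x = gaussian_total_correlation(X)          # close to zero: no redundancy
mi_xy = gaussian_mutual_info(X, Y)            # clearly positive
mi_xz = gaussian_mutual_info(X, Z)            # close to zero
```

With real, non-Gaussian data these closed forms only capture linear dependence, which is one reason to prefer an estimator such as RBIG.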
The hypothesis would be:
- Individual Measures - we see some trend that gives us an intuition that there could be a 'sweet spot' for the number of temporal dimensions to use.
- Comparative Measures - the IT measures should exhibit a trend similar to the one we saw for the individual measures, but it should depend on the two variables being compared. For example, the MI between a closely related pair such as SM and VOD should be higher than the MI between a less related pair.
- Comparative Measures - the Pearson correlation won't be so helpful in this case because it is a linear measure, so it shouldn't do a good job of capturing the nonlinear variability/interactions that we expect.
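The 'temporal dimensions' in these hypotheses can be pictured as stacking each time step with its previous steps as extra features. The sketch below is hypothetical: the project's actual feature construction is not shown in this notebook, and `add_temporal_features` is a name introduced here only for illustration.

```python
import numpy as np


def add_temporal_features(series, n_lags):
    """Stack each time step with its n_lags previous steps.

    series : array of shape (n_time,)
    returns: array of shape (n_time - n_lags, n_lags + 1); column 0 holds the
             current step and column k holds the value k steps earlier.
    """
    series = np.asarray(series)
    n_samples = series.shape[0] - n_lags
    if n_samples <= 0:
        raise ValueError("n_lags must be smaller than the series length")
    return np.stack(
        [series[n_lags - k : n_lags - k + n_samples] for k in range(n_lags + 1)],
        axis=1,
    )


ts = np.arange(6)                               # toy series: 0, 1, 2, 3, 4, 5
lagged = add_temporal_features(ts, n_lags=2)    # rows: [2, 1, 0], [3, 2, 1], ...
```

Each additional lag adds one feature and removes one usable sample, so the dimensionality grows while the sample size shrinks.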
Concerns¶
- Calculating the Pearson coefficient
The Pearson coefficient isn't a multivariate measure (I don't think). So to compute it for multi-dimensional data, I simply 'unraveled' the arrays so that values are compared sample-to-sample and feature-to-feature. But I'm not sure if this is correct.
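The 'unraveling' described above can be sketched as follows. This is an assumption about what the computation looks like, not the project's actual code: both arrays are flattened so that each (sample, feature) entry becomes one paired observation, and `pearson_flattened` is a hypothetical name.

```python
import numpy as np


def pearson_flattened(X, Y):
    """Pearson correlation between two equally shaped arrays after flattening."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape != Y.shape:
        raise ValueError("X and Y must have the same shape")
    # ravel() pairs entry (i, j) of X with entry (i, j) of Y
    return np.corrcoef(X.ravel(), Y.ravel())[0, 1]


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
Y = 2.0 * X + 0.1 * rng.normal(size=(200, 4))   # nearly linear in X

r_linear = pearson_flattened(X, Y)
r_self = pearson_flattened(X, X)
```

Flattening pools all features into a single pair of vectors, so relationships between feature i of one dataset and feature j of the other are ignored; whether that is an acceptable multivariate summary is exactly the open question here.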
- Climatology
I removed the climatology, but I would like to see what happens when I leave it in. The results don't seem to differ much between the drought and non-drought years, so I'm wondering whether removing the climatology is affecting this.
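For reference, one common way to remove a climatology is to subtract the long-term mean of each calendar month. The sketch below assumes a monthly climatology on a single time series; the notebook does not state how the climatology was actually defined, and the synthetic data and `remove_climatology` name are only for illustration.

```python
import numpy as np
import pandas as pd


def remove_climatology(series):
    """Subtract the mean of each calendar month from a time-indexed series."""
    monthly_mean = series.groupby(series.index.month).transform("mean")
    return series - monthly_mean


# Synthetic daily series: a seasonal cycle plus noise (assumed data)
time = pd.date_range("2010-01-01", "2015-12-31", freq="D")
seasonal = 10 * np.sin(2 * np.pi * time.dayofyear.to_numpy() / 365.25)
rng = np.random.default_rng(0)
raw = pd.Series(seasonal + rng.normal(scale=1.0, size=len(time)), index=time)

anomalies = remove_climatology(raw)
```

Rerunning the experiments on `raw` instead of `anomalies` would be the comparison proposed above.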
- Hypothesis
Should there be a difference between the drought and non-drought years? And have we sufficiently captured these differences with just 3 years each (drought and non-drought)?
Code¶
import sys
sys.path.insert(0, '/home/emmanuel/projects/2019_rbig_ad/src')
sys.path.append('/home/emmanuel/code/py_esdc')
sys.path.append('/home/emmanuel/code/rbig')
# DataCube PreProcessing
from scipy.io import savemat, loadmat
import geopandas as geopd
from rasterio import features
# Main Libraries
import numpy as np
import scipy.io as scio
import xarray as xr
import pandas as pd
import seaborn as sns
from datetime import date
import time
# IT Algorithms
from rbig import RBIG, RBIGMI
# ML Preprocessing
from sklearn.preprocessing import normalize
from sklearn.model_selection import train_test_split
from scipy import signal
# Plotting
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use(['seaborn-poster'])
import tqdm
# Utilities
import warnings
warnings.simplefilter('ignore', category=FutureWarning)
# Notebook Specifics
%load_ext autoreload
%autoreload 2
plt.style.available
FIG_PATH = '/home/emmanuel/projects/2020_rbig_rs/reports/figures/drought/individual/'
DATA_PATH = '/home/emmanuel/projects/2020_rbig_rs/data/drought/results/'
datasets = [
'exp_ind_v2.csv',
'exp_group_v2.csv'
]
Experiment I - Individual Variables¶
data = pd.read_csv(DATA_PATH + datasets[0], index_col=[0])
data['drought'] = data['drought'] == 1  # convert the 0/1 flag to boolean
data.head()
Normalize¶
# normalize
data['h_norm'] = data['h'].div(data.temporal)
data['tc_norm'] = data['tc'].div(data.temporal)
Entropy¶
def plot_entropy(data, normalized=False, save=True, drought='both'):
    """Plot entropy against the number of temporal dimensions."""
    fig, ax = plt.subplots(figsize=(10, 7))

    # Select the years to plot; `drought` is reused as the file-name suffix
    if drought == 'on':
        drought = 'drought'
        data = data[data['year'].isin([2012, 2014, 2015])]
        style = None
    elif drought == 'off':
        drought = 'nondrought'
        data = data[data['year'].isin([2010, 2011, 2013])]
        style = None
    elif drought == 'both':
        drought = 'both'
        style = 'drought'  # separate line styles for drought / non-drought
    else:
        raise ValueError(f'Unrecognized drought state: {drought}')

    y = 'h_norm' if normalized else 'h'

    sns.lineplot(
        x='temporal', y=y,
        hue='variable',
        style=style,
        data=data,
        marker='o',
        ax=ax,
    )
    ax.set_xlabel('Temporal Dims')
    ax.set_ylabel('Entropy')
    # plt.legend(['NDVI', 'LST', 'SM', 'VOD'])
    plt.tight_layout()

    # Save before showing; `frameon` is no longer accepted by savefig in
    # recent matplotlib, so a transparent facecolor is used instead
    if save:
        prefix = 'H_norm' if normalized else 'H'
        fig.savefig(f"{FIG_PATH}{prefix}_individual_{drought}.png", facecolor='none')
    plt.show()
Drought Years vs Non-Drought Years¶
# plot_entropy(data, normalized=True, save=False, drought='on')
# plot_entropy(data, normalized=True, save=False, drought='off')
plot_entropy(data, normalized=True, save=False, drought='both')
Total Correlation¶
def plot_tc(data, normalized=False, save=True, drought='both'):
    """Plot total correlation against the number of temporal dimensions."""
    fig, ax = plt.subplots(figsize=(10, 7))

    # Select the years to plot; `drought` is reused as the file-name suffix
    if drought == 'on':
        drought = 'drought'
        data = data[data['year'].isin([2012, 2014, 2015])]
        style = None
    elif drought == 'off':
        drought = 'nondrought'
        data = data[data['year'].isin([2010, 2011, 2013])]
        style = None
    elif drought == 'both':
        drought = 'both'
        style = 'drought'  # separate line styles for drought / non-drought
    else:
        raise ValueError(f'Unrecognized drought state: {drought}')

    y = 'tc_norm' if normalized else 'tc'

    sns.lineplot(
        x='temporal', y=y,
        hue='variable',
        style=style,
        data=data,
        marker='o',
        ax=ax,
    )
    ax.set_xlabel('Temporal Dims')
    ax.set_ylabel('Total Correlation')
    # plt.legend(['NDVI', 'LST', 'SM', 'VOD'])
    plt.tight_layout()

    # Save before showing; `frameon` is no longer accepted by savefig in
    # recent matplotlib, so a transparent facecolor is used instead
    if save:
        prefix = 'TC_norm' if normalized else 'TC'
        fig.savefig(f"{FIG_PATH}{prefix}_individual_{drought}.png", facecolor='none')
    plt.show()
Drought Years vs Non-Drought Years¶
# plot_tc(data, normalized=True, save=False, drought='on')
# plot_tc(data, normalized=True, save=False, drought='off')
plot_tc(data, normalized=True, save=False, drought='both')
Experiment II - Comparing Variables¶
data_group = pd.read_csv(DATA_PATH + datasets[1], index_col=[0])
data_group['drought'] = data_group['drought'] == 1  # convert the 0/1 flag to boolean
data_group.head()
Normalize¶
# normalize
data_group['mi_norm'] = data_group['mi'].div(data_group.temporal)
# cond1 = data_group['variable1'] == 'NDVI'
# cond2 = data_group['variable2'] == 'NDVI'
# data_group.loc[cond1 & cond2, ['variable1', 'variable2']] = data_group.loc[cond1 & cond2, ['variable2', 'variable1']].values
def move_variables(df: pd.DataFrame, variable: str) -> pd.DataFrame:
    """Swap variable1/variable2 so that `variable` always sits in variable1."""
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    cond = df['variable2'] == variable
    df.loc[
        cond, ['variable2', 'variable1']
    ] = df.loc[
        cond, ['variable1', 'variable2']
    ].values
    return df
df_new = move_variables(data_group, 'NDVI')
df_new.head()
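A quick check of what the swap does, on a toy table. The values are made up, and the function is restated in a copy-safe form so the snippet runs on its own.

```python
import pandas as pd


def move_variables(df, variable):
    """Put `variable` in column variable1 wherever it appears in variable2."""
    df = df.copy()  # leave the input untouched
    cond = df["variable2"] == variable
    df.loc[cond, ["variable1", "variable2"]] = df.loc[
        cond, ["variable2", "variable1"]
    ].values
    return df


# Toy pairwise results (made-up numbers)
toy = pd.DataFrame({
    "variable1": ["VOD", "NDVI", "LST"],
    "variable2": ["NDVI", "SM", "NDVI"],
    "mi": [0.3, 0.2, 0.1],
})

swapped = move_variables(toy, "NDVI")
# Every row involving NDVI now has NDVI in variable1, so filtering on
# variable1 == "NDVI" returns all of its pairings.
```

Because MI and the Pearson coefficient are symmetric in their two arguments, swapping the labels does not change the values being plotted.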
Mutual Information¶
def plot_mutual_info(data, normalized=False, save=True, variable='VOD', drought='both'):
    """Plot the MI between `variable` and every other variable."""
    fig, ax = plt.subplots(figsize=(10, 7))

    # Select the years to plot; `drought` is reused as the file-name suffix
    if drought == 'on':
        drought = 'drought'
        data = data[data['year'].isin([2012, 2014, 2015])]
        style = None
    elif drought == 'off':
        drought = 'nondrought'
        data = data[data['year'].isin([2010, 2011, 2013])]
        style = None
    elif drought == 'both':
        drought = 'both'
        style = 'drought'  # separate line styles for drought / non-drought
    else:
        raise ValueError(f'Unrecognized drought state: {drought}')

    # Select variable (on a copy, so the caller's DataFrame is not modified)
    data = move_variables(data.copy(), variable)
    data = data[data['variable1'] == variable]

    y = 'mi_norm' if normalized else 'mi'

    sns.lineplot(
        x='temporal', y=y,
        hue='variable2',
        style=style,
        data=data,
        marker='o',
        ax=ax,
    )
    ax.set_xlabel('Temporal Dims')
    ax.set_ylabel('Mutual Information')
    # plt.legend(['NDVI', 'LST', 'SM', 'VOD'])
    plt.tight_layout()

    # Save before showing. The variable is part of the file name so plots for
    # different variables do not overwrite each other; `frameon` is replaced
    # by a transparent facecolor for recent matplotlib versions.
    if save:
        prefix = 'MI_norm' if normalized else 'MI'
        fig.savefig(f"{FIG_PATH}{prefix}_individual_{variable}_{drought}.png", facecolor='none')
    plt.show()
def plot_pearson(data, normalized=False, save=True, variable='VOD', drought='both'):
    """Plot the Pearson coefficient between `variable` and every other variable."""
    fig, ax = plt.subplots(figsize=(10, 7))

    # Select the years to plot; `drought` is reused as the file-name suffix
    if drought == 'on':
        drought = 'drought'
        data = data[data['year'].isin([2012, 2014, 2015])]
        style = None
    elif drought == 'off':
        drought = 'nondrought'
        data = data[data['year'].isin([2010, 2011, 2013])]
        style = None
    elif drought == 'both':
        drought = 'both'
        style = 'drought'  # separate line styles for drought / non-drought
    else:
        raise ValueError(f'Unrecognized drought state: {drought}')

    # Select variable (on a copy, so the caller's DataFrame is not modified)
    data = move_variables(data.copy(), variable)
    data = data[data['variable1'] == variable]

    # There is no normalized Pearson column; `normalized` only affects the file name
    y = 'pearson'

    sns.lineplot(
        x='temporal', y=y,
        hue='variable2',
        style=style,
        data=data,
        marker='o',
        ax=ax,
    )
    ax.set_xlabel('Temporal Dims')
    ax.set_ylabel('Pearson')
    # plt.legend(['NDVI', 'LST', 'SM', 'VOD'])
    plt.tight_layout()

    # Save before showing. The variable is part of the file name so plots for
    # different variables do not overwrite each other; `frameon` is replaced
    # by a transparent facecolor for recent matplotlib versions.
    if save:
        prefix = 'Pear_norm' if normalized else 'Pear'
        fig.savefig(f"{FIG_PATH}{prefix}_individual_{variable}_{drought}.png", facecolor='none')
    plt.show()
VOD¶
Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='VOD', drought='on')
plot_pearson(data_group, normalized=False, save=True, variable='VOD', drought='on')
Non-Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='VOD', drought='off')
plot_pearson(data_group, normalized=False, save=True, variable='VOD', drought='off')
Drought and Non-Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='VOD', drought='both')
plot_pearson(data_group, normalized=False, save=True, variable='VOD', drought='both')
NDVI¶
Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='NDVI', drought='on')
plot_pearson(data_group, normalized=True, save=True, variable='NDVI', drought='on')
Drought and Non-Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='NDVI', drought='both')
plot_pearson(data_group, normalized=True, save=True, variable='NDVI', drought='both')
LST¶
Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='LST', drought='on')
plot_pearson(data_group, normalized=True, save=True, variable='LST', drought='on')
Non-Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='LST', drought='off')
plot_pearson(data_group, normalized=True, save=True, variable='LST', drought='off')
Drought and Non-Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='LST', drought='both')
plot_pearson(data_group, normalized=False, save=True, variable='LST', drought='both')
SM¶
Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='SM', drought='on')
plot_pearson(data_group, normalized=True, save=True, variable='SM', drought='on')
Non-Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='SM', drought='off')
plot_pearson(data_group, normalized=True, save=True, variable='SM', drought='off')
Drought and Non-Drought Years¶
plot_mutual_info(data_group, normalized=True, save=True, variable='SM', drought='both')
plot_pearson(data_group, normalized=True, save=True, variable='SM', drought='both')