Droughts - Pre-Processing¶
In this notebook, I will go over the preprocessing steps needed before starting the experiments. It covers the following steps:
- Load the data
- Subset California
- Fill NaNs
- Smooth the VOD signal (Savitzky-Golay filter vs. rolling mean)
- Remove the climatology
- Select drought years and non-drought years
- Extract density cubes
Code¶
import sys, os
cwd = os.getcwd()
sys.path.insert(0, f'{cwd}/../../')
sys.path.insert(0, '/home/emmanuel/code/py_esdc')
import xarray as xr
import pandas as pd
import numpy as np
# drought tools
from src.data.drought.loader import DataLoader
from src.features.drought.build_features import (
get_cali_geometry,
mask_datacube,
smooth_vod_signal,
remove_climatology,
get_cali_emdata,
get_drought_years,
get_density_cubes,
get_common_elements,
normalize
)
from src.visualization.drought.analysis import plot_mean_time
# esdc tools
from esdc.subset import select_pixel
from esdc.shape import ShapeFileExtract, rasterize
from esdc.transform import DensityCubes
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
plt.style.use(['fivethirtyeight', 'seaborn-poster'])
%matplotlib inline
%load_ext autoreload
%autoreload 2
1. Load Data¶
region = 'conus'
sampling = '14D'
drought_cube = DataLoader().load_data(region, sampling)
pixel = (-121, 37)
drought_cube
plot_mean_time(
drought_cube.LST.sel(time=slice('June-2010', 'June-2010'))
)
2. Subset California¶
# get california polygon
cali_geoms = get_cali_geometry()
# get california cube subset
cali_cube = mask_datacube(drought_cube, cali_geoms)
plot_mean_time(
cali_cube.LST.sel(time=slice('June-2011', 'June-2011'))
)
3. Interpolate NaNs - Time Dimension¶
# interpolation arguments
interp_dim = 'time'
method = 'linear'
# do interpolation
cali_cube_interp = cali_cube.interpolate_na(
dim=interp_dim,
method=method
)
4. Smoothing the Signal (VOD)¶
In this section, we will try to smooth the signal with two methods:
- A simple rolling mean
- A Savitzky-Golay (savgol) filter
Initial parameters for the savgol filter:
- Window length = 5
- Polynomial order = 3
We will apply the smoothing along the time dimension only.
vod_data = cali_cube_interp.VOD
vod_data
4.1 - Savgol Filter¶
from scipy.signal import savgol_filter
# select example
vod_data_ex = select_pixel(vod_data, pixel)
# savgol filter params
window_length = 5
polyorder = 3
# apply savgol filter
vod_smooth_filter = savgol_filter(
vod_data_ex,
window_length=window_length,
polyorder=polyorder
)
fig, ax = plt.subplots(nrows=2, figsize=(10, 10))
ax[0].plot(vod_data_ex)
ax[0].set_title('Original Data')
ax[1].plot(vod_smooth_filter)
ax[1].set_title('After Savgol Filter')
plt.show()
4.2 - Rolling Window¶
# select example
vod_data_ex = select_pixel(vod_data, pixel)
# rolling window params
window_length = 2
# apply rolling mean
vod_smooth_roll = vod_data_ex.rolling(
time=window_length,
center=True
).mean()
fig, ax = plt.subplots(nrows=2, figsize=(10, 10))
ax[0].plot(vod_data_ex)
ax[0].set_title('Original Data')
ax[1].plot(vod_smooth_roll)
ax[1].set_title('After Rolling Mean')
plt.show()
4.3 - Difference¶
vod_smooth_diff = vod_smooth_filter - vod_smooth_roll
fig, ax = plt.subplots(nrows=4,figsize=(10,10))
ax[0].plot(vod_data_ex)
ax[0].set_title('Original')
ax[1].plot(vod_smooth_filter)
ax[1].set_title('Savgol Filter')
ax[2].plot(vod_smooth_roll)
ax[2].set_title('Rolling Mean')
ax[3].plot(vod_smooth_diff)
ax[3].set_title('Difference')
# Scale the difference y-limits: symmetric about 0, spanning the same range as the smoothed signals
ymax = np.max([vod_smooth_filter.max(), vod_smooth_roll.max()])
ymin = np.min([vod_smooth_filter.min(), vod_smooth_roll.min()])
half_range = 0.5 * (ymax - ymin)
ax[3].set_ylim([-half_range, half_range])
plt.tight_layout()
plt.show()
4.4 - Apply Rolling Mean to the Whole Dataset¶
cali_cube_interp = smooth_vod_signal(cali_cube_interp, window_length=2, center=True)
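For reference, a minimal sketch of what this rolling-mean smoothing might look like in plain xarray; the assumption that only the VOD variable is smoothed comes from the function name, and smooth_vod_signal may differ in detail.
def smooth_vod_sketch(ds, window_length=2, center=True):
    # rolling mean along time for the VOD variable only (assumed behaviour)
    ds = ds.copy()
    ds['VOD'] = ds['VOD'].rolling(time=window_length, center=center).mean()
    return ds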
5. Remove Climatology¶
By 'climatology' I mean the typical seasonal cycle; the anomalies are the differences between the observations and that cycle, so they should not contain the seasonal signal. I'll do a very simple removal: calculate the monthly mean with respect to time and subtract it from the corresponding month of the original datacube (a minimal sketch of this pattern follows the steps below).
Steps
- Climatology - monthly mean over the 6 years
- Remove climatology - subtract the climatology from each month
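This is roughly the standard xarray groupby pattern, assuming a regular 'time' coordinate; the project's remove_climatology may differ in detail.
def remove_climatology_sketch(ds):
    # monthly mean over all years = the climatology
    climatology = ds.groupby('time.month').mean('time')
    # subtract the climatology from each matching month = the anomalies
    anomalies = ds.groupby('time.month') - climatology
    return anomalies, climatology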
# remove climatology
cali_anomalies, cali_mean = remove_climatology(cali_cube_interp)
variables = ['LST', 'VOD', 'NDVI', 'SM']
for ivariable in variables:
    fig, ax = plt.subplots(nrows=3, figsize=(10, 10))
    # Original time series (before removing the climatology)
    select_pixel(cali_cube_interp[ivariable], pixel).plot(ax=ax[0])
    ax[0].set_title('Original Time Series')
    # Climatology (monthly mean)
    select_pixel(cali_mean[ivariable], pixel).plot(ax=ax[1])
    ax[1].set_title('Climatology')
    # Anomalies (after removing the climatology)
    select_pixel(cali_anomalies[ivariable], pixel).plot(ax=ax[2])
    ax[2].set_title('Anomalies (Climatology Removed)')
    plt.tight_layout()
    plt.show()
6. EMData¶
I extract the dates of the drought events for California. This allows me to separate the drought years from the non-drought years.
cali_droughts = get_cali_emdata()
cali_droughts
Drought Years
- 2012
- 2014
- 2015
Non-Drought Years
- 2010
- 2011
- 2013
Note: Even though the EM-Data records the 2012 drought as lasting only half the year, we will treat 2012 as a full drought year.
# Drought Years
cali_anomalies_drought = get_drought_years(
cali_anomalies, ['2012', '2014', '2015']
)
# Non-Drought Years
cali_anomalies_nondrought = get_drought_years(
cali_anomalies, ['2010', '2011', '2013']
)
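For reference, a minimal sketch of this kind of year selection using xarray's partial datetime-string indexing; the project's get_drought_years may behave differently.
def select_years_sketch(ds, years):
    # select each listed year by its string label and concatenate along time
    return xr.concat([ds.sel(time=year) for year in years], dim='time')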
7. Extract Density Cubes¶
In this step, we will construct 'density cubes': cubes in which each sample is augmented with features drawn from its spatial and/or temporal neighbourhood, so that a sample carries spatial and/or temporal context rather than a single value. In this experiment, we will only use temporal information. Our temporal resolution is 14 days and we want to cover a maximum of 6 months.
So:
$$\Bigg\lfloor \frac{6 \text{ months}}{\frac{14 \text{ days}}{30 \text{ days}} \times 1 \text{ month}} \Bigg\rfloor = 12 \text{ time stamps}$$
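A quick arithmetic check of this count (assuming 1 month ≈ 30 days), before confirming on the cube itself below:
import math
math.floor(6 / (14 / 30))  # 12 time stamps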
# confirm: 12 time steps span roughly 6 months
sub_ = cali_anomalies_drought.isel(time=slice(0,12))
sub_.time[0].data, sub_.time[-1].data
7.1 - Example Density Cube¶
# window sizes
spatial = 1
time = 12
vod_df, lst_df, ndvi_df, sm_df = get_density_cubes(cali_anomalies_drought, spatial, time)
vod_df.shape, lst_df.shape, ndvi_df.shape, sm_df.shape
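For intuition, a minimal sketch of how a purely temporal density cube could be built for one variable by stacking lagged time steps as features; the dimension names 'lat' and 'lon' are assumptions, and the project's get_density_cubes / DensityCubes may differ.
def temporal_density_cube_sketch(da, time_window=12):
    # column k holds the value at time t + k; shifting pads the series end with NaNs
    lagged = [da.shift(time=-k) for k in range(time_window)]
    stacked = xr.concat(lagged, dim='lag')
    # flatten the (lat, lon, time) samples into rows and the lags into columns
    df = (stacked
          .stack(samples=('lat', 'lon', 'time'))
          .transpose('samples', 'lag')
          .to_pandas()
          .dropna())
    return df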
8. Find Common Elements¶
Notice how the number of elements differs between the datasets; the VOD and SM datasets appear to have fewer elements. To make a fair comparison, I'll use only the elements common to the two density cubes.
Note: It is also a bit difficult for RBIG to calculate the mutual information between datasets whose domains are so different.
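A minimal sketch of this alignment, assuming the density cubes are pandas DataFrames indexed by sample identifiers; the project's get_common_elements may differ.
def common_elements_sketch(df_a, df_b):
    # keep only the rows whose index appears in both DataFrames
    common_idx = df_a.index.intersection(df_b.index)
    return df_a.loc[common_idx], df_b.loc[common_idx]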
vod_df, lst_df = get_common_elements(vod_df, lst_df)
vod_df.shape, lst_df.shape