TensorFlow 2.0¶
Author: J. Emmanuel Johnson Email: jemanjohnson34@gmail.com
These are notes that I took based on lectures 1 and 2 given by Francois Chollet.
Architecture¶
1. Engine Module¶
This is basically the model definition. It has the following parts (see the short sketch after this list):
- Layer
- Network - contains the DAG of Layers (internal component)
- Model - contains the network and is used to run the training and evaluation loops
- Sequential - wraps a list of layers
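A minimal sketch of how these pieces fit together (the layer sizes are arbitrary, chosen for illustration): Sequential wraps a list of Layer objects and, being a Model, exposes the training and evaluation methods.
import tensorflow as tf

# Sequential wraps a list of Layers...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1),
])

# ...and, as a Model, it owns the training/evaluation functionality
model.compile(optimizer='adam', loss='mse')
print(isinstance(model, tf.keras.layers.Layer))  # True - a Model is also a Layer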
2. Various Classes (and subclasses)¶
- Layers
- Metric
- Loss
- Callback
- Optimizer
- Regularizers, Constraints?
Layer Class¶
This is the core abstraction in the API. Everything is a Layer, or it at least interacts closely with the Layer class.
What can it do?¶
Computation
This manages the computation. It takes batch inputs and produces batch outputs.
- Assumes no interactions between samples
- Eager or Graph execution
- Training and Inference modes
- Masking (e.g. time series, missing features)
Manages State
This keeps track of what's trainable or not trainable.
class Linear(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.w = ...  # trainable variable
        self.b = ...  # non-trainable variable
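As a concrete sketch of state tracking (the call-counter layer below is my own example, not from the lecture): a layer can hold both trainable and non-trainable variables, and the base Layer class collects them automatically.
import tensorflow as tf

class LinearWithCounter(tf.keras.layers.Layer):
    """Tracks a trainable weight plus a non-trainable call counter."""
    def __init__(self, units=32, input_dim=32):
        super().__init__()
        self.w = tf.Variable(tf.random.normal((input_dim, units)), trainable=True)
        self.calls = tf.Variable(0, trainable=False)

    def call(self, inputs):
        self.calls.assign_add(1)  # update the non-trainable state
        return tf.matmul(inputs, self.w)

layer = LinearWithCounter(4, input_dim=2)
_ = layer(tf.ones((2, 2)))
print(len(layer.trainable_weights))      # 1 - w
print(len(layer.non_trainable_weights))  # 1 - calls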
Track Losses & Metrics
class Linear(tf.keras.layers.Layer):
    def call(self, x):
        # calculate kl divergence
        kl_loss = ...
        # add it to the layer's losses
        self.add_loss(kl_loss)
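Metrics can be tracked in the same way. A minimal sketch (my own example, using the add_metric API available in TF 2.x; the names are assumptions):
import tensorflow as tf

class ActivationMonitor(tf.keras.layers.Layer):
    """Pass-through layer that logs the mean activation as a metric."""
    def call(self, inputs):
        # track a scalar metric during the forward pass
        self.add_metric(tf.reduce_mean(inputs),
                        aggregation='mean',
                        name='mean_activation')
        return inputs

layer = ActivationMonitor()
_ = layer(tf.ones((2, 2)))
print([m.name for m in layer.metrics])  # ['mean_activation']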
- Type Checking
- Frozen or Unfrozen (fine-tuning, batch norm, GANs)
- Can build DAGs (Sequential form)
- Mixed Precision
What do they not do?¶
- Gradients
- Device Placement
- Distribution-specific logic
- Anything beyond batch-wise computation (layers only handle batch-wise computation)
Basic Layer¶
We are going to create a basic linear layer from scratch.
import tensorflow as tf

# create linear layer
class Linear(tf.keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super().__init__()
        # weights variable
        w_init = tf.random_normal_initializer()(shape=(input_dim, units))
        self.w = tf.Variable(
            initial_value=w_init,
            trainable=True
        )
        # bias variable
        b_init = tf.zeros_initializer()(shape=(units,))
        self.b = tf.Variable(
            initial_value=b_init,
            trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

# data
x_train = tf.ones((2, 2))

# initialize linear layer
linear_layer = Linear(4, input_dim=2)

# same thing as linear_layer.call(x_train), plus the layer's build/tracking bookkeeping
y = linear_layer(x_train)
Better Basic Layer¶
class Linear(tf.keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units
Notice how we didn't construct the weights when we initialized the class (constructor). This is nice because now we can construct our layer without having to know what the input dimension will be; we simply specify the units. Instead, we define a build method that creates the weights.
def build(self, input_shape):
# Weights variable
self.w = self.add_weight(
shape=(input_shape[-1], self.units),
initializer='random_normal',
trainable=True
)
# Bias variable
self.b = self.add_weight(
shape=(self.units,),
initializer='zeros',
trainable=True
)
def call(self, inputs):
return tf.matmul(inputs, self.w) + self.b
The rest doesn't change. We can initialize the linear layer with just the units. This is called 'lazy loading'.
linear_layer = Linear(32)
On the first call, it will call .build(x.shape) to get the input dimensions and create the weights.
y = linear_layer(x)
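A quick check that the weights really are created lazily (assuming the Linear layer above and a 2D input):
x = tf.ones((2, 2))

linear_layer = Linear(32)
print(len(linear_layer.weights))  # 0 - nothing has been built yet

y = linear_layer(x)               # build() runs here with x.shape
print(len(linear_layer.weights))  # 2 - w and b now exist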
Nested Layers¶
We can nest as many Layers as we want. For example:
Multi-Layer Perceptron¶
from tensorflow.keras.layers import Layer

class MLP(Layer):
    def __init__(self, units=32):
        super().__init__()
        self.linear = Linear(units)

    def call(self, inputs):
        x = self.linear(inputs)
        return x
MLP Block¶
class MLPBlock(Layer):
    def __init__(self):
        super().__init__()
        self.mlp_1 = MLP(32)
        self.mlp_2 = MLP(32)
        self.mlp_3 = MLP(1)

    def call(self, inputs):
        x = self.mlp_1(inputs)
        x = self.mlp_2(x)
        return self.mlp_3(x)
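A quick usage sketch of the block above (the input shape is arbitrary):
mlp_block = MLPBlock()

# weights of all nested Linear layers are created on the first call
y = mlp_block(tf.ones((2, 16)))
print(y.shape)                           # (2, 1)
print(len(mlp_block.trainable_weights))  # 6 - a w and b for each of the three Linear layers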
Basic Training¶
So, assuming that we have our linear layer, we can write a basic training procedure.
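The loop below iterates over a dataset of (x, y) batches. Here is one way such a dataset could be built with tf.data (the toy data and sizes are assumptions, not from the lecture):
# toy regression data: 64 samples with 32 features,
# and targets matching the 32 output units of Linear(32)
x_all = tf.random.normal((64, 32))
y_all = tf.random.normal((64, 32))

# batches of (x, y) pairs
dataset = tf.data.Dataset.from_tensor_slices((x_all, y_all)).batch(16)
With a dataset in hand, the training loop looks like this: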
# initialize model
lr_model = Linear(32)

# loss function
loss_fn = tf.keras.losses.MeanSquaredError()

# optimizer
optimizer = tf.keras.optimizers.Adam()

# Loop through dataset
for x, y in dataset:
    with tf.GradientTape() as tape:
        # predictions for minibatch
        preds = lr_model(x)
        # loss value for minibatch
        loss = loss_fn(y, preds)

    # find gradients
    grads = tape.gradient(loss, lr_model.trainable_weights)

    # apply optimization
    optimizer.apply_gradients(zip(grads, lr_model.trainable_weights))
Losses¶
We can add losses on the fly. For example, we can add a small activation regularizer in the call function for the MLP layer that we made above:
class MLP(Layer):
    def __init__(self, units=32, reg=1e-3):
        super().__init__()
        self.linear = Linear(units)
        self.reg = reg

    def call(self, inputs):
        x = self.linear(inputs)
        x = tf.nn.relu(x)
        # activation regularizer on the layer outputs
        self.add_loss(tf.reduce_sum(x ** 2) * self.reg)
        return x
Now when we call the layer, we get the activation loss.
mlp_layer = MLP(32)
y = mlp_layer(x)
Note that the tracked losses get reset every time we call the layer, so layer.losses only contains the losses from the most recent forward pass.
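A quick check of that behaviour (assuming the MLP layer above and the 2D input x from earlier):
mlp_layer = MLP(32)

_ = mlp_layer(x)
assert len(mlp_layer.losses) == 1  # the activation penalty from this forward pass

_ = mlp_layer(x)
assert len(mlp_layer.losses) == 1  # reset at the start of the call, not accumulated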
Modified Training Loop¶
mlp_model = MLP(32)                           # initialize model
loss_fn = tf.keras.losses.MeanSquaredError()  # loss function
optimizer = tf.keras.optimizers.Adam()        # optimizer

# Loop through dataset
for x, y in dataset:
    with tf.GradientTape() as tape:
        preds = mlp_model(x)            # predictions for minibatch
        loss = loss_fn(y, preds)        # loss value for minibatch
        loss += sum(mlp_model.losses)   # extra losses created during the forward pass

    # find gradients
    grads = tape.gradient(loss, mlp_model.trainable_weights)

    # apply optimization
    optimizer.apply_gradients(zip(grads, mlp_model.trainable_weights))
Useful for:
- KL-Divergence
- Weight Regularization
- Activation Regularization
Note: there is some scoping involved; the losses of inner (nested) layers are also reset when their parent layer is called.
Serialization¶
class Linear(tf.keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        ...

    def get_config(self):
        config = super().get_config()
        config.update({'units': self.units})
        return config
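With get_config defined, the layer can be rebuilt from its configuration; a quick sketch:
layer = Linear(64)
config = layer.get_config()

# re-create an identical (untrained) layer from the config
new_layer = Linear.from_config(config)
assert new_layer.units == 64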
Training Mode¶
Allows you to switch between training and inference mode. You simply need to add an extra training argument to the call() method.
...
    def call(self, x, training=True):
        if training:
            # do training stuff
            ...
        else:
            # do inference stuff
            ...
        return x
Some good examples:
- Batch Normalization
- Probabilistic Models (MC Variational Inference)
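As an illustration (my own sketch, not from the lecture), here is a dropout-style layer that only perturbs its inputs in training mode:
import tensorflow as tf

class NoisyLayer(tf.keras.layers.Layer):
    """Adds Gaussian noise during training, is a no-op at inference."""
    def __init__(self, stddev=0.1):
        super().__init__()
        self.stddev = stddev

    def call(self, inputs, training=False):
        if training:
            # training behaviour: inject noise (like dropout perturbs activations)
            return inputs + tf.random.normal(tf.shape(inputs), stddev=self.stddev)
        # inference behaviour: pass inputs through unchanged
        return inputs

layer = NoisyLayer()
x = tf.ones((2, 2))
print(layer(x, training=True))   # noisy
print(layer(x, training=False))  # identical to x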
Model Class¶
This handles top-level functionality. The Model class does everything the Layer class can do, i.e. it is the same except with more available methods. In the literature, we refer to this as a "model", e.g. a deep learning model or a machine learning model, or as a "network", e.g. a deep neural network.
In the literature, we refer to a Layer as a closed sequence of operations, for example a convolutional layer or a recurrent layer. Sometimes we also refer to layers composed of other layers as a block, for example a ResNet block or an Attention block.
So ultimately, you would use the Layer class for the inner computation blocks and the Model class for the outer object that you train and save.
Training functionality
- .compile()
- .fit()
- .evaluate()
- .predict()
Saving
We have the .save() method, which includes:
- configuration (topology)
- state (weights)
- optimizer
Summarization & Visualization
- .summary()
- plot_model()
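For example (a generic Sequential model; plot_model additionally requires pydot and graphviz to be installed):
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(16,)),
    tf.keras.layers.Dense(1),
])

model.summary()                                        # layers, output shapes, parameter counts
tf.keras.utils.plot_model(model, to_file='model.png')  # writes a diagram of the model graph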
Compile¶
This configures the model for training:
- Optimizer
- Loss
When you have the Model class and you run .compile(), the model runs in graph execution mode, i.e. you are basically compiling the graph. If we want to run it eagerly, we need to set the parameter run_eagerly to True.
mlp = MLP()  # here MLP subclasses tf.keras.Model instead of Layer
mlp.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.MeanSquaredError(),
    run_eagerly=True
)
Fit¶
This defines how the data will be fit, i.e. the training procedure:
- Callbacks
- Data
- Epochs
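A short usage sketch, continuing with the compiled mlp model above (the array names, batch size, and callback choice are assumptions):
# x_train: (num_samples, num_features), y_train: (num_samples, num_targets)
history = mlp.fit(
    x_train, y_train,
    epochs=10,
    batch_size=32,
    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss', patience=2)],
)
print(history.history['loss'])  # per-epoch training loss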