For example, let’s say I want to estimate temperature given humidity.
I assume there is some model that can do this for me.
$$
\text{Temperature} = \text{Model}\left(\text{Humidity}, \text{Parameters}\right)
$$

The model could be a statistical model or a physics-based model.
However, I don’t really know the model or its parameters. So I need some observations of temperature and humidity:
$$
\text{Data} = \left\{ \text{Temperature}, \text{Humidity}\right\}
$$

## I - Learning Problem

### Data

Let’s say that I want to model the joint distribution of Temperature and Humidity.
First, I need to collect some observations of temperature and humidity:
$$
\begin{aligned}
\text{Data}: && &&
\mathcal{D} &= \{ x_n, y_n \}_{n=1}^N
\end{aligned}
$$

```python
# get data
x: Array["N Dx"] = get_covariates(...)
y: Array["N"] = get_observations(...)
```
### Model

Now, I assume a model. Let’s assume that I can perfectly model my temperature observations via a Gaussian distribution.
$$
\begin{aligned}
\text{Model}: && &&
y &= \boldsymbol{f}(x;\boldsymbol{\theta}) + \varepsilon, &&
\varepsilon \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
$$

Now, to translate this into a probabilistic interpretation, we can write it as a likelihood.
$$
\begin{aligned}
\text{Data Likelihood}: && &&
y &\sim p(y|\theta) = \mathcal{N}(y|\mu,\sigma^2)
\end{aligned}
$$

```python
def observation_model(x: Array["N Dx"], f: Callable, params: PyTree) -> Model:
    # calculate the mean via the function
    mu = f(x, params)
    # extract the noise parameter
    sigma = params["sigma"]
    # initialize Gaussian
    model = Gaussian(mu, sigma)
    return model
```
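The `Gaussian` container (which serves as the `Model` above) and the `log_probability` function used later are left abstract in this post. A minimal sketch of what they could look like, assuming an independent Gaussian per observation:

```python
from typing import NamedTuple
import jax.numpy as jnp

class Gaussian(NamedTuple):
    """A simple Gaussian model parameterized by a mean and a standard deviation."""
    mu: jnp.ndarray
    sigma: jnp.ndarray

def log_probability(model: Gaussian, y: jnp.ndarray) -> jnp.ndarray:
    # Gaussian log-density per observation, summed over the dataset
    log_pdf = (
        -0.5 * jnp.log(2.0 * jnp.pi * model.sigma**2)
        - 0.5 * ((y - model.mu) / model.sigma) ** 2
    )
    return jnp.sum(log_pdf)
```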
### Function

Now, we have this arbitrary function, $f$, which we have not yet defined. We are free to pick whatever we want, and this is the “fun” part of data-driven modeling. For example, we could choose:
$$
\begin{aligned}
\text{Linear}: && &&
\boldsymbol{f}(x;\boldsymbol{\theta}) &= \mathbf{w}\mathbf{x} + \mathbf{b} \\
\text{Basis}: && &&
\boldsymbol{f}(x;\boldsymbol{\theta}) &= \mathbf{w}\boldsymbol{\phi}(\mathbf{x};\alpha) + \mathbf{b} \\
\text{Analytic}: && &&
\boldsymbol{f}(x;\boldsymbol{\theta}) &= \text{Analytic}(\mathbf{x};\boldsymbol{\theta}) \\
\text{Neural Network}: && &&
\boldsymbol{f}(x;\boldsymbol{\theta}) &= \text{NN}(\mathbf{x};\boldsymbol{\theta})
\end{aligned}
$$

```python
def linear_model(x: Array["N Dx"], params: PyTree) -> Array["N"]:
    # extract parameters
    weights = params["weights"]
    biases = params["biases"]
    # calculate the mean, mu = xw + b
    mu = x @ weights + biases
    return mu
```
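The basis-function choice works the same way, only on transformed inputs. As a sketch, assuming simple polynomial features for $\boldsymbol{\phi}$ (any feature map would do):

```python
import jax.numpy as jnp

def polynomial_features(x: Array["N Dx"], degree: int = 3) -> Array["N Dphi"]:
    # phi(x; alpha): stack powers of x as basis functions
    return jnp.concatenate([x**d for d in range(1, degree + 1)], axis=-1)

def basis_model(x: Array["N Dx"], params: PyTree) -> Array["N"]:
    # transform the inputs into basis features
    phi = polynomial_features(x)
    # linear model on the features, mu = w phi(x) + b
    mu = phi @ params["weights"] + params["biases"]
    return mu
```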
So, in this case, we see that our parameters are the mean and standard deviation:

$$
\theta = \left\{\mu, \sigma \right\}
$$

Now, we can also put a prior on the parameters:
$$
\theta \sim \text{Uniform}[-\infty,\infty]
$$

$$
\begin{aligned}
\text{Joint Distribution}: && &&
p(y,\theta) &= p(y|\theta)p(\theta) \\
\text{Posterior}: && &&
p(\theta|\mathcal{D}) &\propto p(y|\theta)p(\theta)
\end{aligned}
$$
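In code, the joint distribution simply adds the log-prior to the log-likelihood. With the improper uniform prior above, the log-prior is a constant; a sketch, reusing the hypothetical helpers from earlier:

```python
def log_prior(params: PyTree) -> Scalar:
    # uniform (improper) prior: the log-density is constant, take it as zero
    return 0.0

def log_joint(params: PyTree, x: Array["N Dx"], y: Array["N"]) -> Scalar:
    # log p(y, theta) = log p(y | theta) + log p(theta)
    model = observation_model(x, linear_model, params)
    return log_probability(model, y) + log_prior(params)
```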
### Criteria

To get a criterion, there is a general form that one could use. However, we will be Bayesian about it.
We are interested in the posterior, i.e., we want the best parameters given our data.
$$
\begin{aligned}
\text{Posterior}: && &&
p(\theta|\mathcal{D}) &\propto p(y|x,\theta)p(\theta) = \exp(-L(\theta;y,x))
\end{aligned}
$$

Because we are in Bayesian territory, we can use the MAP (maximum a posteriori) estimate:
$$
\begin{aligned}
\text{Objective Function}: && &&
-L(\theta;y,x) &= \log p(y|x,\theta) + \log p(\theta) = \log p(\theta|\mathcal{D}) + \text{const.}
\end{aligned}
$$

```python
def objective_fn(params: PyTree, x: Array["N Dx"], y: Array["N"]) -> Scalar:
    # initialize model
    model = observation_model(x, linear_model, params)
    # negative log probability of the observations (we will minimize this)
    loss = -log_probability(model, y)
    return loss
```

### Inference Method

Now we can minimize our objective:
$$
\begin{aligned}
\text{Objective}: && &&
\theta^* &= \underset{\theta}{\text{argmin}} \hspace{2mm}
L(\theta;y,x)
\end{aligned}
$$

```python
# initialize parameters
params_init: PyTree = ...
num_iterations: int = 1_000
# optimize parameters
params = minimize_objective(
    objective_fn,
    params_init, x, y,
    num_iterations,
)
```
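`minimize_objective` is also left abstract. A minimal sketch using plain gradient descent with `jax.grad`; the learning rate is an assumption:

```python
import jax

def minimize_objective(objective_fn, params_init, x, y, num_iterations, learning_rate=1e-3):
    # gradient of the scalar objective with respect to the parameters
    grad_fn = jax.grad(objective_fn)
    params = params_init
    for _ in range(num_iterations):
        grads = grad_fn(params, x, y)
        # gradient-descent step on every leaf of the parameter PyTree
        params = jax.tree_util.tree_map(
            lambda p, g: p - learning_rate * g, params, grads
        )
    return params
```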
## II - Estimation Problem

### Data

Now, let’s say we get some new observations of temperature:

$$
\begin{aligned}
\text{New Data}: && &&
\mathcal{D}' &= \left\{y_n'\right\}_{n=1}^{N_{\text{test}}}
\end{aligned}
$$

### Model

So in this case, I believe that the new parameters are some combination of the older parameters.
So I’m effectively looking for the change in parameters.
$$
u \sim p(u|\theta)
$$
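The form of $p(u|\theta)$ is not pinned down here; one concrete, assumed choice is a Gaussian random walk, where the new parameters are expected to stay close to the learned ones:

```python
import jax
import jax.numpy as jnp

def log_transition_prior(u: PyTree, params_old: PyTree, sigma_u: float = 0.1) -> Scalar:
    # log p(u | theta): Gaussian random walk centered at the old parameters
    # (up to an additive constant)
    sq_devs = jax.tree_util.tree_map(
        lambda u_leaf, p_leaf: jnp.sum(((u_leaf - p_leaf) / sigma_u) ** 2),
        u, params_old,
    )
    return -0.5 * sum(jax.tree_util.tree_leaves(sq_devs))
```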
### Criteria

Now, we are interested in estimating

$$
\begin{aligned}
\text{Objective}: && &&
\theta^* &= \underset{\theta}{\text{argmin}} \hspace{2mm}
J(\theta;y)\\
\text{Objective Function}: && &&
J(\theta;y) &= -\log p(y|\theta) - \log p(\theta|\mathcal{D})
\end{aligned}
$$

where the posterior from the learning problem, $p(\theta|\mathcal{D})$, now plays the role of the prior.

### Inference Method

To keep things simple, I will use an optimization method that simply minimizes the objective function.
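Concretely, with the Gaussian random-walk prior sketched above standing in for $p(\theta|\mathcal{D})$, the objective is the negative log-likelihood of the new observations plus a penalty for drifting away from the learned parameters. A sketch, with hypothetical names:

```python
def make_new_objective(params_learned: PyTree) -> Callable:
    def new_objective_fn(params: PyTree, x: Array["N Dx"], y: Array["N"]) -> Scalar:
        # -log p(y | theta): negative log-likelihood of the new observations
        model = observation_model(x, linear_model, params)
        nll = -log_probability(model, y)
        # -log p(theta | D): stay close to the previously learned parameters
        penalty = -log_transition_prior(params, params_learned)
        return nll + penalty
    return new_objective_fn
```

This has the same signature as `objective_fn`, so it could be dropped into the same `minimize_objective` call.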
```python
# initialize parameters with the previously learned values (warm start)
params_init: PyTree = params
num_iterations: int = 1_000
# optimize parameters on the new observations
params = minimize_objective(
    objective_fn,
    params_init,
    num_iterations,
)
```