
Jensen's Inequality

This theorem is one of those sleeper theorems that come up in a big way in many machine learning problems.

Jensen's inequality states that, for a convex function f,

\mathbb{E} [f(x)] \geq f(\mathbb{E}[x])
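As a quick sanity check, here is what the inequality looks like numerically for the convex function f(x) = x^2 (a Monte Carlo sketch of my own, not tied to any particular library beyond numpy):

```python
# Monte Carlo check of Jensen's inequality E[f(X)] >= f(E[X])
# for the convex function f(x) = x**2.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=100_000)  # draws of X

f = lambda v: v ** 2                 # a convex function

lhs = np.mean(f(samples))            # E[f(X)], roughly Var(X) + E[X]^2 = 5
rhs = f(np.mean(samples))            # f(E[X]), roughly 1
print(lhs, rhs, lhs >= rhs)          # the convex direction of Jensen holds
```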

A convex (or concave up) function is one where, if we take any two points on the graph and draw the line segment between them, that segment lies on or above the graph. Flip the inequality and we get a concave function. We care about the convex case because for a convex function any local minimum is a global minimum, which is exactly what makes minimization strategies work. Recall from Calculus 101: let's look at the function f(x)=\log x.
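In formula form, the chord condition says that for any two points x_1, x_2 and any \lambda \in [0, 1],

f(\lambda x_1 + (1 - \lambda) x_2) \leq \lambda f(x_1) + (1 - \lambda) f(x_2)

with the inequality flipped for a concave function.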

We can use the second derivative test to find out whether a function is convex or not: if f''(x) \geq 0 everywhere on the domain, then the function is concave up (convex). I'll map out the derivatives below:

f'(x) = \frac{1}{x}, \qquad f''(x) = -\frac{1}{x^2}

You'll see that -\frac{1}{x^2} < 0 for x \in (0, \infty). This means that \log x is a concave function. So, if we want a convex function, the fix is to take the negative \log instead (which gives some intuition as to why we typically minimize the negative log likelihood rather than maximize the likelihood directly).
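If you'd rather not do the derivatives by hand, a quick symbolic check (a small sketch using sympy, not part of the original derivation) confirms the signs:

```python
# Second-derivative test for log(x) and -log(x) on (0, inf).
import sympy as sp

x = sp.symbols("x", positive=True)

log_second = sp.diff(sp.log(x), x, 2)       # equals -1/x**2: negative, so log(x) is concave
neg_log_second = sp.diff(-sp.log(x), x, 2)  # equals 1/x**2: positive, so -log(x) is convex

print(log_second, neg_log_second)
```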

Variational Inference

Typically in the VI literature, Jensen's inequality is invoked to come up with the Evidence Lower Bound (ELBO). For a long time I never understood how that worked, because I couldn't tell whether the convex or the concave case was the one being used. Think of the log likelihood \mathcal{L}(\theta) and the ELBO \mathcal{F}(q, \theta), and take a look at the figure below.

Figure: Showing the log likelihood \mathcal{L}(\theta) and the ELBO \mathcal{F}(q, \theta).

Typically it is the \log function that Jensen's inequality gets applied to here. Since \log is concave, the inequality flips to \log \mathbb{E}[x] \geq \mathbb{E}[\log x], and that flipped direction is exactly what turns the log evidence into a lower bound.
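To make that concrete, here is a sketch of the derivation in my notation, assuming a latent variable z and any distribution q(z) over it:

\mathcal{L}(\theta) = \log p(x \mid \theta) = \log \mathbb{E}_{q(z)}\left[ \frac{p(x, z \mid \theta)}{q(z)} \right] \geq \mathbb{E}_{q(z)}\left[ \log \frac{p(x, z \mid \theta)}{q(z)} \right] = \mathcal{F}(q, \theta)

The only inequality in the chain is Jensen's, applied to the concave \log, which is why the ELBO \mathcal{F}(q, \theta) always sits at or below \mathcal{L}(\theta).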

Resources

* Computational Statistics
* Blog
* Sleeper Theorems
* DIT Package
* Ox Educ - Intuition | Proof
* MIT OpenCourseWare - Intro Prob. | Inequalities, Convergence and Weak Law of Large Numbers