For these methods, we use a quasi-Newton optimizer for solving unconstrained problems.
In particular, we use the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm (Fletcher, 2000).
BFGS maintains an iteratively updated approximation to the Hessian matrix of the loss function, which in this case is the negative log-likelihood loss.
It is chosen because it offers fast convergence on smooth nonlinear optimization problems where first-order gradient information is available.
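The setup above can be sketched as follows. This is a minimal illustration, not the document's actual model: it assumes a normal likelihood with parameters μ and σ, synthetic data, and SciPy's BFGS implementation; the names `neg_log_likelihood`, `mu_hat`, and `sigma_hat` are ours.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic data for illustration (assumed, not from the original document)
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_log_likelihood(params, x):
    """Negative log-likelihood of a normal model; the quantity BFGS minimizes."""
    mu, sigma = params
    if sigma <= 0:
        # Guard against the unconstrained optimizer probing invalid scales
        return np.inf
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

# BFGS estimates gradients numerically when none are supplied
result = minimize(neg_log_likelihood, x0=[0.0, 1.0],
                  args=(data,), method="BFGS")
mu_hat, sigma_hat = result.x
```

The recovered `mu_hat` and `sigma_hat` should lie close to the generating values (2.0 and 1.5) up to sampling noise.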
We place constraints on the parameters: the mean and shape parameters are left completely free, while the scale parameter is constrained to be positive.
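One common way to reconcile a positivity constraint with an unconstrained optimizer such as BFGS is to optimize the log of the scale. The sketch below assumes this reparameterization and a normal likelihood; the source does not state which mechanism it uses, so this is illustrative only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic data for illustration (assumed)
rng = np.random.default_rng(1)
data = rng.normal(loc=-1.0, scale=0.5, size=400)

def nll_reparam(params, x):
    """NLL with the scale parameterized as exp(log_sigma),
    so positivity holds by construction for any real log_sigma."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

res = minimize(nll_reparam, x0=[0.0, 0.0], args=(data,), method="BFGS")
mu_hat = res.x[0]
sigma_hat = np.exp(res.x[1])  # map back to the constrained scale
```

With this transform the optimizer never evaluates an invalid (non-positive) scale, which tends to be more robust than penalizing infeasible points.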
The initial estimates for μ and the scale are obtained directly from the data by computing the sample mean and standard deviation.
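A minimal sketch of this moment-based initialization, assuming the log-scale parameterization discussed above (the variable names `mu0`, `sigma0`, and `x0` are ours):

```python
import numpy as np

# Synthetic data for illustration (assumed)
rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=2.0, size=1000)

mu0 = np.mean(data)            # sample mean initializes the location
sigma0 = np.std(data, ddof=1)  # sample standard deviation initializes the scale

# Starting point handed to the optimizer (log-scale keeps it unconstrained)
x0 = [mu0, np.log(sigma0)]
```

Starting the optimizer near the sample moments typically cuts the number of iterations and avoids divergence from a poor initial guess.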
We use the same estimated parameter