The Bayesian Lasso

Trevor Park, George Casella – 2008

Abstract

The Lasso estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters have independent Laplace (i.e., double-exponential) priors. Gibbs sampling from this posterior is possible using an expanded hierarchy with conjugate normal priors for the regression parameters and independent exponential priors on their variances. A connection with the inverse Gaussian distribution provides tractable full conditional distributions. The Bayesian Lasso provides interval estimates (Bayesian credible intervals) that can guide variable selection. Moreover, the structure of the hierarchical model provides both Bayesian and likelihood methods for selecting the Lasso parameter. Slight modifications lead to Bayesian versions of other Lasso-related estimation methods, including bridge regression and a robust variant.

Summary

TK

The nice part about a fully Bayesian approach to the Lasso is that you don’t have to worry about doing cross-validation. You just set your priors and you’re ready to go. Even the parameter λ in the Laplace coefficient prior can be set via marginal maximum likelihood.
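
(For the record, that maximum-likelihood route is an empirical Bayes / marginal likelihood one: the latent variances τ_j² are treated as missing data and a Monte Carlo EM algorithm updates the Lasso parameter. If I recall the derivation correctly, the update is

λ^(k) = √( 2p / Σ_j E_{λ^(k−1)}[ τ_j² | data ] ),

where the expectations are estimated from a Gibbs run at the previous value of λ.)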

Hierarchical priors

  • Prior on the coefficients, β
  • Prior on the variance of the coefficients, σ²
  • Prior on the Laplace parameter, λ²

Priors

Prior on coefficients, π(β | σ²) = ∏_{j=1..p} (λ / (2√σ²)) exp(−λ|β_j| / √σ²) (conditional Laplace distribution): here λ/√σ² plays the role of 1/b, where b is the scale parameter of the Laplace distribution. So in a way λ is an inverse scale / precision parameter: larger λ means a tighter prior on the coefficients and stronger shrinkage.
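
For reference (not spelled out above): a Laplace(0, b) distribution has variance 2b², so the implied prior variance of each coefficient is

Var(β_j | σ²) = 2b² = 2σ²/λ²,

which is why λ² (relative to σ²) behaves like a precision.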

Prior on variance of coefficients, σ² (non-informative scale-invariant): π(σ²) ∝ 1/σ². (Strictly, σ² is the noise variance, but since it also scales the Laplace prior on β it governs the coefficients' prior spread.)

Note that it’s presumed the data have been standardized, so the above is technically scale-invariant. Also note that in the expanded hierarchy used for sampling, each coefficient variance τ_j² gets its own prior of this kind: an exponential, i.e. a one-sided Laplace, or equivalently a Gamma distribution with shape parameter = 1, with density π(τ_j² | λ) = (λ²/2) exp(−λ² τ_j² / 2).

Prior on the Laplace parameter, λ² ~ Gamma(r, δ), i.e. π(λ²) ∝ (λ²)^(r−1) exp(−δλ²) (note the prior is on λ², not λ): r = 1 works for most use cases, with δ chosen small so the prior stays fairly flat. This is a Gamma distribution. If r = 1 it actually becomes an exponential distribution.
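
A detail worth noting (it comes from the paper's hierarchy rather than the summary above): the Gamma prior on λ² is conjugate given the latent variances τ_j², so its full conditional is again Gamma,

λ² | τ₁², …, τ_p² ~ Gamma( p + r, rate = Σ_j τ_j²/2 + δ ),

which is what makes updating λ² inside the Gibbs sweep trivial.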

Estimation

The authors propose a Gibbs sampling approach to estimating the Bayesian Lasso, which exploits the fact that the Laplace distribution can be represented as a scale mixture of normal distributions with an exponential mixing density.
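
Concretely, the representation (not written out in the note above, but it is the classic Andrews and Mallows identity the paper relies on) is

(a/2) exp(−a|z|) = ∫₀^∞ (1/√(2πs)) exp(−z²/(2s)) · (a²/2) exp(−a² s/2) ds,   a > 0,

so each β_j can be treated as a normal with its own latent variance τ_j², and that latent variance gets an exponential prior. With the hierarchy above, every full conditional is a standard distribution: multivariate normal for β, inverse gamma for σ², inverse Gaussian for 1/τ_j², and Gamma for λ².

Below is a minimal sketch of such a Gibbs sampler in Python. It is not code from the paper; it just follows the standard Bayesian Lasso full conditionals, and the function name bayesian_lasso_gibbs and the defaults (n_iter, delta, seed) are my own choices.

import numpy as np

def bayesian_lasso_gibbs(X, y, n_iter=5000, r=1.0, delta=0.1, seed=0):
    # Hierarchy: y ~ N(X beta, s2 I); beta_j | s2, t2_j ~ N(0, s2 * t2_j);
    # t2_j ~ Exp(rate = lam2 / 2); pi(s2) ~ 1/s2; lam2 ~ Gamma(r, rate = delta).
    rng = np.random.default_rng(seed)
    n, p = X.shape
    y = y - y.mean()                              # center y; the intercept drops out
    XtX, Xty = X.T @ X, X.T @ y

    beta, s2, tau2, lam2 = np.zeros(p), y.var(), np.ones(p), 1.0
    out = {"beta": np.empty((n_iter, p)), "sigma2": np.empty(n_iter), "lambda2": np.empty(n_iter)}

    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, s2 * A^{-1}), with A = X'X + diag(1/tau2)
        A = XtX + np.diag(1.0 / tau2)
        L = np.linalg.cholesky(A)                 # A = L L'
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(s2) * np.linalg.solve(L.T, rng.standard_normal(p))

        # s2 | rest ~ InvGamma((n - 1 + p)/2, ||y - X beta||^2 / 2 + beta' D^{-1} beta / 2)
        resid = y - X @ beta
        rate = 0.5 * (resid @ resid + beta @ (beta / tau2))
        s2 = 1.0 / rng.gamma(0.5 * (n - 1 + p), 1.0 / rate)

        # 1/tau2_j | rest ~ InverseGaussian(mean = sqrt(lam2 * s2 / beta_j^2), shape = lam2)
        mu_ig = np.sqrt(lam2 * s2 / np.maximum(beta**2, 1e-12))   # guard against beta_j ~ 0
        tau2 = 1.0 / rng.wald(mu_ig, lam2)        # numpy's Wald is the inverse Gaussian

        # lam2 | rest ~ Gamma(p + r, rate = sum(tau2)/2 + delta)
        lam2 = rng.gamma(p + r, 1.0 / (0.5 * tau2.sum() + delta))

        out["beta"][it], out["sigma2"][it], out["lambda2"][it] = beta, s2, lam2
    return out

On standardized predictors, posterior medians of out["beta"] play the role of the Lasso point estimate, and pointwise credible intervals from the same draws give the interval estimates the abstract mentions as a guide for variable selection.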


References

Park, T. and Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103(482), 681–686.