The Bayesian Lasso
Trevor Park, George Casella – 2008
Abstract
The Lasso estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters have independent Laplace (i.e., double-exponential) priors. Gibbs sampling from this posterior is possible using an expanded hierarchy with conjugate normal priors for the regression parameters and independent exponential priors on their variances. A connection with the inverse Gaussian distribution provides tractable full conditional distributions. The Bayesian Lasso provides interval estimates (Bayesian credible intervals) that can guide variable selection. Moreover, the structure of the hierarchical model provides both Bayesian and likelihood methods for selecting the Lasso parameter. Slight modifications lead to Bayesian versions of other Lasso-related estimation methods, including bridge regression and a robust variant.
Summary
TK
The nice part about a fully Bayesian approach to the Lasso is that you don’t have to worry about doing cross-validation. You just set your priors and you’re ready to go. Even the parameter $\lambda$ in the Laplace coefficient prior can be set via marginal maximum likelihood.
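A note on that last point: “maximum likelihood” here means marginal (empirical Bayes) maximum likelihood via Monte Carlo EM. As I read the paper, the update for $\lambda$ at iteration $k$ has the closed form

$$\lambda^{(k)} = \sqrt{\frac{2p}{\sum_{j=1}^{p} E_{\lambda^{(k-1)}}\!\left[\tau_j^2 \mid \tilde{y}\right]}},$$

where the $\tau_j^2$ are the coefficient-variance parameters from the hierarchy below, and the expectations are estimated from the previous iteration’s Gibbs samples.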
Hierarchical priors
- Prior on coefficients, $\pi(\beta \mid \sigma^2)$
- Prior on variance of coefficients, $\pi(\sigma^2)$
- Prior on the Laplace parameter, $\pi(\lambda^2)$
Priors
Prior on coefficients, $\pi(\beta \mid \sigma^2) = \prod_{j=1}^{p} \frac{\lambda}{2\sqrt{\sigma^2}} e^{-\lambda \lvert\beta_j\rvert / \sqrt{\sigma^2}}$ (conditional Laplace distribution): here $\lambda / \sqrt{\sigma^2}$ is equivalent to $1/b$, where $b$ is the scale parameter of the Laplace distribution. So in a way $\lambda$ is an inverse scale, and it acts like a precision: larger $\lambda$ means a smaller prior variance ($2b^2$ for a Laplace) and hence coefficients shrunk harder toward zero.
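This prior is also the quickest way to see the Lasso connection from the abstract. A one-line derivation (mine, under the paper’s setup with centered response $\tilde{y}$): combining this prior with the Gaussian likelihood, the log-posterior for fixed $\sigma^2$ is, up to constants,

$$-\frac{1}{2\sigma^2}\lVert \tilde{y} - X\beta \rVert_2^2 \;-\; \frac{\lambda}{\sqrt{\sigma^2}} \sum_{j=1}^{p} \lvert \beta_j \rvert,$$

so the posterior mode minimizes $\lVert \tilde{y} - X\beta \rVert_2^2 + 2\lambda\sqrt{\sigma^2} \sum_j \lvert \beta_j \rvert$, which is exactly the Lasso objective. Conditioning on $\sigma^2$ also keeps the posterior unimodal, which the paper points out matters for the Gibbs sampler to behave.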
Prior on variance of coefficients, $\pi(\sigma^2) \propto 1/\sigma^2$ (non-informative scale-invariant):
Note that it’s presumed the data has been standardized, so the above is technically scale-invariant. In the expanded hierarchy, each coefficient also gets its own variance $\tau_j^2$, with prior $\pi(\tau_j^2) = \frac{\lambda^2}{2} e^{-\lambda^2 \tau_j^2 / 2}$. That kind of prior is an exponential, i.e., a one-sided Laplace, or a Gamma distribution with shape parameter = 1; mixing the normal coefficient prior over it recovers the conditional Laplace prior above (see the quick numerical check below).
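Here’s a quick numerical check of that mixture fact (my own sketch, not from the paper; the value $\lambda = 1.5$ is arbitrary): drawing $\tau^2$ from the exponential prior above and then $\beta \mid \tau^2 \sim N(0, \tau^2)$, taking $\sigma^2 = 1$, should give marginal draws that are exactly Laplace with scale $1/\lambda$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam = 1.5          # Laplace parameter (arbitrary value, for illustration)
n = 200_000

# tau^2 ~ Exponential with rate lam^2 / 2 (numpy parametrizes by the mean)
tau2 = rng.exponential(scale=2 / lam**2, size=n)

# beta | tau^2 ~ N(0, tau^2), with sigma^2 = 1 for simplicity
beta = rng.normal(0.0, np.sqrt(tau2))

# The marginal of beta should be Laplace with scale 1/lam
ks = stats.kstest(beta, stats.laplace(loc=0, scale=1 / lam).cdf)
print(f"KS statistic vs Laplace(0, 1/lam): {ks.statistic:.4f}")  # small, i.e. consistent with Laplace
```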
Prior on the Laplace parameter, $\pi(\lambda^2) = \frac{\delta^r}{\Gamma(r)} (\lambda^2)^{r-1} e^{-\delta \lambda^2}$ (note $\lambda^2$, not $\lambda$): with $r = 1$ for most use cases and $\delta$ small. This is a Gamma distribution with shape $r$ and rate $\delta$. If $r = 1$ it actually becomes an exponential distribution.
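A nice bonus of this choice (this comes from the paper’s sampler, not something extra): the Gamma prior on $\lambda^2$ is conjugate given the $\tau_j^2$, with full conditional

$$\lambda^2 \mid \tau_1^2, \ldots, \tau_p^2 \;\sim\; \text{Gamma}\!\left(p + r,\; \delta + \tfrac{1}{2}\textstyle\sum_{j=1}^{p} \tau_j^2\right)$$

(shape–rate parameterization), so $\lambda$ can simply be updated as one more step inside the Gibbs sampler rather than fixed in advance.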
Estimation
The authors propose a Gibbs sampling approach for estimating the Bayesian Lasso, which exploits the fact that the Laplace distribution can be represented as a scale mixture of normal distributions with an exponential mixing density.
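Below is a minimal sketch of that sampler in Python: my own implementation of the full conditionals as I read them from the paper, assuming the response has been centered and the predictors standardized; all function and variable names are mine. It cycles through $\beta$ (multivariate normal), $\sigma^2$ (inverse gamma), $1/\tau_j^2$ (inverse Gaussian, the tractable connection mentioned in the abstract), and $\lambda^2$ (gamma, per the conjugate update above).

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, n_iter=5000, r=1.0, delta=0.1, seed=0):
    """Gibbs sampler for the Bayesian Lasso (sketch).

    Assumes y is centered and the columns of X are standardized.
    Full conditionals as I read them from Park & Casella (2008):
      beta      | . ~ N(A^{-1} X'y, sigma2 * A^{-1}),  A = X'X + D_tau^{-1}
      sigma2    | . ~ InvGamma((n-1)/2 + p/2, ||y - X beta||^2/2 + beta' D_tau^{-1} beta / 2)
      1/tau_j^2 | . ~ InverseGaussian(mean = sqrt(lam2 * sigma2 / beta_j^2), shape = lam2)
      lam2      | . ~ Gamma(p + r, rate = delta + sum(tau_j^2)/2)
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y

    # Initial values (arbitrary but reasonable)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.var(y - X @ beta)
    tau2 = np.ones(p)
    lam2 = 1.0

    samples = {"beta": [], "sigma2": [], "lam2": []}
    for _ in range(n_iter):
        # beta | rest : multivariate normal
        A_inv = np.linalg.inv(XtX + np.diag(1.0 / tau2))
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)

        # sigma2 | rest : inverse gamma, sampled as 1 / gamma
        resid = y - X @ beta
        shape = (n - 1) / 2 + p / 2
        rate = resid @ resid / 2 + beta @ (beta / tau2) / 2
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)

        # 1/tau_j^2 | rest : inverse Gaussian (numpy calls it Wald);
        # the floor on beta_j^2 guards against division by exact zeros
        mu = np.sqrt(lam2 * sigma2 / np.maximum(beta**2, 1e-24))
        tau2 = 1.0 / rng.wald(mu, lam2)

        # lam2 | rest : gamma, conjugate under the Gamma(r, delta) hyperprior
        lam2 = rng.gamma(p + r, 1.0 / (delta + tau2.sum() / 2))

        samples["beta"].append(beta)
        samples["sigma2"].append(sigma2)
        samples["lam2"].append(lam2)

    return {k: np.array(v) for k, v in samples.items()}
```

Posterior medians of `samples["beta"]` play the role of the Lasso point estimates, and pointwise credible intervals from the same draws give the interval estimates that the abstract suggests using to guide variable selection.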