# Model Selection for Local Projections Instrumental Variable Methods - Empirical Application to Government Spending Multipliers.

*Chrysoula Karapanagioti*

## Abstract

This thesis tackles the problem of model selection for a single-equation estimation method, Local Projections with Instrumental Variables (LP-IV). Regularization techniques that choose the model and estimate the parameters concurrently are used to estimate the impulse response functions, which is especially beneficial in vector autoregressive contexts. The main focus is on the Desparsified Lasso method. A simulation study as well as an empirical application to government expenditure multipliers illustrate the usefulness of the procedures in terms of forecasting and model development. The main results indicate that the Desparsified Lasso produces much closer estimated impulse responses in both the DGP simulations and the empirical analysis.

This paper explores how regularization methods can be used in concert with VARs and local projections to estimate impulse responses in high-dimensional contexts. Importantly, the author claims that these estimators incorporate the temporal dynamics and spatial dependence that VARs inherently provide. Thus, we’re not necessarily losing anything or estimating the wrong estimand by using these methods, which is a worry I’ve had.

Recent advances in the literature have produced strategies that use regularized estimation techniques to choose variables in high-dimensional VARs. Specifically, a selection of regularization models is used: Componentwise HLag and the Lag-Weighted Lasso developed by Nicholson et al. (2020), the Desparsified Lasso of van de Geer et al. (2014), the Lasso of Tibshirani (1996), and the Post-Double Lasso of Chernozhukov, Hansen, and Spindler (2015). These models incorporate relevant information (temporal dynamics and spatial dependence) that VARs inherently provide.

Alternatives for reducing the dimensionality of VAR models include:

- Bayesian VARs
- Principal component analysis
- Dynamic factor models

The author goes right out and acknowledges that VARs with the same number of lags for every variable might not be ideal, which is something I’ve always wondered about. @rameyMacroeconomicShocksTheir2016 makes a similar point when she notes that control variables don’t need to be the same for each horizon of a local projection. Maybe that’s not exactly the same point, but there’s a similar line of thinking running through both:

Transclude of @rameyMacroeconomicShocksTheir2016#d3cf71

## Desparsified Lasso

The point of desparsifying lasso estimates (as in Local Projection Inference in High Dimensions) is to obtain “uniformly reliable inference,” which in practice means confidence intervals and tests that remain valid no matter which (sparse) coefficients are truly nonzero, not just unbiased point estimates. Desparsifying effectively undoes the coefficient shrinkage that makes standard lasso estimates biased.

I don’t totally understand the point of using Lasso in the first place if you then desparsify it. It must be the case that not all variables are unshrunk, or rather that they are not all *fully* unshrunk. I guess the idea is that the standard Lasso emphasizes predictive performance rather than unbiasedness of the coefficients, so the coefficients can be far from their true values if that helps prediction. You might want to undo this effect. I would assume from the name that what happens in practice is that many formerly zeroed-out coefficients come back with small absolute values.

The thinking seems to go: we use Lasso because we have too many covariates, but we pay the cost of bias in the coefficients. So we undo the sparsification. (Unlike post-lasso, the correction applies to every coefficient, including the ones the lasso shrank all the way to zero.)

Judging from the desparsifying equation, it seems like you take the lasso betas and add back something that roughly tracks the degree to which a predictor is correlated with the residuals, which is possible because we aren’t using OLS (under OLS the residuals are exactly orthogonal to the predictors, so the correction would be zero). This correlation is then scaled down by how variable the residualized predictor is.
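A minimal sketch of that recipe on synthetic data (the DGP, the `alpha` values, and the use of scikit-learn's `Lasso` for both the main fit and the nodewise regressions are my own illustrative assumptions, not the thesis's exact setup):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.25]
y = X @ beta_true + rng.standard_normal(n)

# Step 1: initial lasso fit -- sparse but shrunken (biased) coefficients
lasso = Lasso(alpha=0.1).fit(X, y)
beta_hat = lasso.coef_
resid = y - lasso.predict(X)

# Step 2: desparsify coordinate by coordinate
beta_debiased = beta_hat.copy()
for j in range(p):
    others = np.delete(np.arange(p), j)
    # Nodewise lasso: residualize x_j on the remaining predictors
    nw = Lasso(alpha=0.1).fit(X[:, others], X[:, j])
    z_j = X[:, j] - nw.predict(X[:, others])
    # Correction: correlation of the residualized predictor with the
    # lasso residuals, scaled by z_j' x_j (its effective variability)
    beta_debiased[j] += z_j @ resid / (z_j @ X[:, j])
```

Each coordinate's correction is proportional to how correlated its residualized version `z_j` is with the lasso residuals, normalized by `z_j @ x_j`, which matches the pattern described above. Note the loop corrects every coordinate, including ones the lasso zeroed out.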

## Lag-Weighted Lasso

Standard lasso is totally unstructured in the sense that all predictors are penalized equally. One could imagine wanting to treat different coefficients differently based on prior beliefs. For example, if the regression includes lags, you might think that recent lags are more important than distant lags, so the distant lags should be penalized more heavily.
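One way to get this behavior, sketched here under my own illustrative assumptions (an AR(1) DGP, a power-law weight schedule `w_l = (l+1)**gamma`, and an arbitrary `alpha`), is to rescale each lag's column by its penalty weight, fit a plain lasso, and scale the coefficients back:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
T, L = 300, 6  # sample size and number of lags (illustrative)
eps = rng.standard_normal(T + L)
# AR process where only the first lag actually matters
y = np.zeros(T + L)
for t in range(1, T + L):
    y[t] = 0.6 * y[t - 1] + eps[t]

# Lag matrix: column l holds y lagged (l + 1) periods
X = np.column_stack([y[L - l - 1 : T + L - l - 1] for l in range(L)])
target = y[L:]

# Lag-weighted penalty: weight w_l grows with lag distance, so distant
# lags are penalized more heavily (gamma is a tuning choice)
gamma = 1.0
w = (np.arange(L) + 1.0) ** gamma

# Weighted lasso via reparameterization: fit on X / w, rescale back
fit = Lasso(alpha=0.05).fit(X / w, target)
beta = fit.coef_ / w
```

Dividing column `l` by `w[l]` before fitting and dividing the coefficient back afterwards is equivalent to a lasso whose L1 penalty on `beta[l]` is multiplied by `w[l]`.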

## Group Lasso

Really interesting note: if you take a ridge-style L2 penalty but group the variables in some way and penalize the sum of the groups’ *unsquared* L2-norms, the estimator will actually pick or drop entire groups, even though ridge regression (which penalizes the squared L2-norm) never reduces coefficients exactly to zero. By penalizing the sum of L2-norms, you achieve groupwise variable selection.
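A toy sketch of why whole groups drop out, using proximal gradient descent with block soft-thresholding (the synthetic data, `lam`, and iteration count are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 12
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:4] = [1.0, -1.0, 0.5, -0.5]  # only the first group is active
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Proximal gradient for the group lasso:
#   min_b (1/2n)||y - Xb||^2 + lam * sum_g ||b_g||_2
lam = 0.3
step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # 1 / Lipschitz const.
beta = np.zeros(p)
for _ in range(500):
    b = beta - step * (X.T @ (X @ beta - y) / n)  # gradient step
    for g in groups:
        # Block soft-thresholding: shrink each group's L2 norm; any
        # group whose norm falls below step * lam is zeroed out whole
        norm = np.linalg.norm(b[g])
        shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        b[g] = shrink * b[g]
    beta = b
```

The block soft-threshold is exactly the mechanism in the note: the unsquared group norm makes the penalty non-differentiable at zero for the whole block, so inactive groups are set to exactly zero, while ridge's squared norm never does this.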

## Results

Desparsified Lasso performs the best in the simulated examples, but it’s pretty hard to tell this from the tables.

Its outperformance is clearer in the empirical example (from @rameyGovernmentSpendingMultipliers):

This guy’s write-up of the results is terrible.

The big takeaway from this paper is that **it’s probably worthwhile to improve my understanding of the desparsified / debiased lasso.** It’s doing something interesting and different from both the standard lasso and the post-(double) lasso estimator (High-Dimensional Metrics in R, @belloniInferenceHighDimensionalSparse2011a). It’s clearly trying to achieve something similar to post-lasso, namely unshrinking the coefficients, but the desparsified lasso unshrinks potentially all the coefficients while post-lasso only unshrinks the coefficients that were selected in the first place. Standard lasso fails for inference because it can’t deliver unbiased coefficients and variable selection at the same time.
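For the contrast, post-lasso itself is easy to sketch: refit plain OLS on the lasso-selected support, so only the surviving coefficients get unshrunk (the synthetic data and `alpha` are my own illustrative choices, not the estimator's canonical tuning):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.0, -0.7]
y = X @ beta_true + rng.standard_normal(n)

# Step 1: lasso picks the support (coefficients are shrunken)
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

# Step 2: OLS refit on the selected columns only -- unshrinks *only*
# the coefficients that survived; zeroed coefficients stay at zero
beta_post = np.zeros(p)
if selected.size:
    beta_post[selected] = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
```

Compare with the desparsified lasso sketch earlier: there, every coordinate (selected or not) received a correction; here, coordinates dropped in step 1 can never come back.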

These would be good places to start:

- Lasso Inference for High-Dimensional Time Series
- Local Projection Inference in High Dimensions
- On asymptotically optimal confidence regions and tests for high-dimensional models
- Confidence intervals for low dimensional parameters in high dimensional linear models

## Interesting tidbits

- Assuming that the variables are all already stationary, adding lags of the outcome to the RHS helps reduce standard errors.
- Invertibility in VARs means that the structural shocks can be recovered from current and past data; non-invertibility means they cannot. Whenever you hear “invertibility,” think “the structural shocks can be identified.”
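The first tidbit is easy to check by simulation. A quick sketch (the AR(1)-plus-exogenous-regressor DGP and the sample size are my own assumptions, and classical homoskedastic SEs are used purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    # Stationary outcome: persistence plus an exogenous regressor
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + rng.standard_normal()

def ols_se(X, y):
    """OLS fit returning classical (homoskedastic) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return np.sqrt(s2 * np.diag(XtX_inv)), b

# Static regression: y_t on a constant and x_t only
X_static = np.column_stack([np.ones(T - 1), x[1:]])
se_static, _ = ols_se(X_static, y[1:])

# Dynamic regression: also include the lagged outcome y_{t-1}
X_dyn = np.column_stack([np.ones(T - 1), x[1:], y[:-1]])
se_dyn, _ = ols_se(X_dyn, y[1:])
```

The lagged outcome soaks up the serial dependence, shrinking the residual variance and hence the standard error on `x`.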
